Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifcdhhz65r6czn2o5bn2aw3xacs3iafjxa6uybphmudzcucpykwr4",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3moh6hlim6yc2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreid3ihyiximo7to4alurm22cfysoufsxg3pni6nfrcdrckq7u5ltx4"
    },
    "mimeType": "image/webp",
    "size": 351820
  },
  "path": "/genildocs/the-data-refinery-how-json-quietly-became-the-language-ai-agents-speak-mc1",
  "publishedAt": "2026-06-17T00:57:10.000Z",
  "site": "https://dev.to",
  "tags": [
    "ai",
    "webdev",
    "programming",
    "productivity",
    "blense.fun/en"
  ],
  "textContent": "_Every tool call, every structured output, every agent decision travels as JSON. Here is the serialization knowledge that separates the amateur from the architect — now that the stakes have never been higher._\n\nA developer ships an AI agent on a Friday. In the demo it's flawless: the model reads a request, calls a tool, returns a clean answer the app renders perfectly.\n\nA week later, production dashboards are full of garbage. A date is showing up as raw text. A field that was definitely there is silently gone. Under one big payload, the whole server froze for two seconds. And here's the maddening part — **nothing threw an error.** The model returned JSON. The code parsed it. Everything \"worked.\"\n\nThe bug wasn't in the model, and it wasn't in the parser. It lived in the narrow gap between _text_ and _data_ — the place every JSON value has to cross twice. That gap is **serialization** , and in 2026 it has quietly become one of the most important things a JavaScript engineer can actually understand.\n\nWhy now? Because the most important conversations in modern software aren't between humans anymore. They're between models and machines — an LLM deciding which tool to call, a server answering, an agent chaining ten steps together. And every one of those conversations happens in the same format: JSON.\n\nSo let's open up the refinery and see how raw structure becomes a clean stream of bytes — and back again — without losing anything precious on the way.\n\n##  JSON is not a JavaScript object\n\nThis is the misunderstanding that creates most JSON bugs, so it's worth saying plainly: **JSON only looks like a JavaScript object. It isn't one.**\n\nJSON is a _transport format_ — flat, inert text meant to travel across a network or sit on a disk. A JavaScript object is a _live structure_ in memory that your application can read, mutate, and call methods on. 
They resemble each other the way a flat-packed cardboard box resembles assembled furniture: same thing in spirit, completely different states.\n\n\n\n    const user = { name: \"Joao\" };   // a live object in memory\n    typeof user;                     // \"object\"\n\n    const text = JSON.stringify(user); // '{\"name\":\"Joao\"}' — just characters\n    typeof text;                       // \"string\"\n\n\nThe V8 engine has to do active work to move between these two worlds. Until you parse it, `{\"name\":\"Joao\"}` is no more \"an object\" than the word _cake_ is something you can eat. Hold on to that mental model — everything below is just the two machines that cross the gap: one that packs, one that unpacks.\n\n##  Packing the container: `JSON.stringify` and the serialization funnel\n\n`JSON.stringify` walks a value's own enumerable properties and serializes them into a single JSON string for travel over the network or to disk. But it is not a neutral photocopier. Think of it as a funnel with three filters, and knowing what each filter does is what saves you at 2am.\n\n**Filter 1 — types that pass through cleanly:** strings, numbers, booleans, arrays, and plain objects survive untouched.\n\n**Filter 2 — types that get quietly transformed:** a `Date` is converted to an ISO 8601 string; `NaN` and `Infinity` are turned into `null`.\n\n**Filter 3 — types that are dropped entirely:** functions, `undefined`, and symbols simply vanish when they are object properties (inside an array they become `null` instead).\n\n\n\n    const data = {\n      name: \"Ana\",\n      createdAt: new Date(), // becomes an ISO string\n      balance: Infinity,     // becomes null\n      greet: () => \"hi\",     // dropped (function)\n      nickname: undefined    // dropped (undefined)\n    };\n\n    JSON.stringify(data);\n    // '{\"name\":\"Ana\",\"createdAt\":\"2026-06-16T...Z\",\"balance\":null}'\n\n\nRead that output again. Four of the five fields changed or disappeared, and the engine didn't say a word. 
That silence is the whole danger.\n\n###  The one rule JSON refuses to break: no cycles\n\nA JSON structure can nest as deeply as you like, but it must be strictly _acyclic_. The engine tracks the stack of objects it's walking; the moment it meets an object that is already one of its own ancestors on that stack, it aborts hard. (The same object referenced from two separate branches is fine; it is simply serialized twice.)\n\n\n\n    const a = {};\n    a.self = a;            // a points back at itself\n    JSON.stringify(a);\n    // TypeError: Converting circular structure to JSON\n\n\nThis is one of the rare cases where JSON fails _loudly_ instead of silently — and you should be grateful for it.\n\n###  The filtering agent: the `replacer`\n\nThe second argument to `JSON.stringify` is a `replacer` — a surgical interception that runs _during_ packing. It lets you substitute values or strip sensitive data before anything reaches the wire. The classic use is redacting secrets:\n\n\n\n    const user = { name: \"Joao\", password: \"123\", admin: true };\n\n    JSON.stringify(user, (key, value) =>\n      key === \"password\" ? undefined : value\n    );\n    // '{\"name\":\"Joao\",\"admin\":true}'\n\n\nReturn `undefined` from the replacer and the key is omitted from the payload; the original object is left untouched. It's the cleanest place to make sure a password never leaves the building.\n\n###  Formatting and delegation: `space` and `toJSON`\n\nTwo more levers are worth knowing. The third argument, `space`, injects whitespace — trading network efficiency for human readability when you're debugging. And any object can define a `toJSON()` method to dictate its own serialization; the engine _always_ delegates to it when present.\n\n\n\n    const account = {\n      id: 42,\n      secret: \"s3cr3t\",\n      toJSON() { return { id: this.id }; } // dictate your own shape\n    };\n\n    JSON.stringify(account); // '{\"id\":42}' — secret never serialized\n\n\n##  Unpacking the container: `JSON.parse` and rehydration\n\nOn the way back, `JSON.parse` reconstructs ECMAScript values from the text, rebuilding the hierarchy strictly from the syntax in the string. 
But remember Filter 2: serialization _erased types_. That `Date` you sent is now just a string, and parsing alone won't bring it back to life.\n\nThat's what the `reviver` — the second argument to `parse` — is for. It intercepts parsing node by node, letting you **rehydrate** flat strings back into rich instances.\n\n\n\n    const text = '{\"event\":\"deploy\",\"when\":\"2026-06-16T10:30:00Z\"}';\n\n    const obj = JSON.parse(text, (key, value) =>\n      key === \"when\" ? new Date(value) : value\n    );\n\n    obj.when instanceof Date; // true — revived\n\n\nSerialization is lossy by design; the reviver is how you choose what to restore on the other side.\n\n##  Two agents, one job: `replacer` vs. `reviver`\n\nThese two hooks are mirror images, and confusing them is a common source of bugs. Here's the clean comparison:\n\n| `replacer` | `reviver`\n---|---|---\n**Runs during** | Serialization (`stringify`) | After parsing (`parse`)\n**Receives** | The original in-memory value | The freshly parsed string/literal\n**Main use** | Omit secrets, filter payloads | Restore classes (e.g. `Date`)\n**Delete a value by** | Returning `undefined` | Returning `undefined`\n\n##  The modern twist: stop cloning with JSON\n\nHere's a trick almost every JavaScript developer has reached for: deep-cloning an object with `JSON.parse(JSON.stringify(obj))`. It's clever, it's one line — and it's a silent killer, because it runs your data through the entire funnel above.\n\n\n\n    const original = {\n      date: new Date(),\n      tags: new Set([\"a\", \"b\"]),\n      meta: { level: 42 }\n    };\n\n    // The \"classic\" hack — loses the Date, destroys the Set\n    const bad = JSON.parse(JSON.stringify(original));\n    bad.date;  // \"2026-...\" (a string!)\n    bad.tags;  // {} (empty object!)\n\n\nDates become strings, `undefined` disappears, `Map` and `Set` collapse into empty objects, functions are gone, and a circular reference throws. 
The fix has been native since 2022: **`structuredClone()`**, built on the same Structured Clone Algorithm the platform already uses internally for `postMessage` and IndexedDB.\n\n\n\n    const good = structuredClone(original);\n    good.date; // a real Date\n    good.tags; // Set(2) { \"a\", \"b\" }\n\n\n`structuredClone` preserves circular references, `Map`, `Set`, typed arrays, and `Date`; it keeps `undefined`; it's roughly 20–30% slower than the JSON round trip but trades that for data integrity; and it adds zero bytes to your bundle (goodbye, Lodash's `cloneDeep`). It throws a `DataCloneError` on functions and DOM nodes — which, honestly, is a feature. If you're cloning a function, your data model is trying to tell you something.\n\n##  JSON as the blueprint of the architecture\n\nStep back from the two functions and you'll notice something: JSON isn't just _data_ flowing through your app. In the Node ecosystem, it's the **declarative blueprint** the whole architecture is built on.\n\nOpen any `package.json` and you're reading a JSON object that controls everything: `main` is the entry point, `scripts` are your automation triggers (`start`, `test`, `build`), `dependencies` define the module tree npm assembles, and `private: true` is a safety lock against accidental publishing. Configuration follows the same instinct — critical values like passwords and URLs don't live in source code; the common pattern is to unify `process.env` into centralized config objects that switch behavior between development and production.\n\nAnd this is where a genuinely modern upgrade lands. For years, importing a JSON config meant a bundler or a `fetch()`. As of ES2025 (baseline across modern runtimes since April 2025), you can import JSON natively with an **import attribute**:\n\n\n\n    // Native JSON import — no bundler, no fetch\n    import config from \"./config.json\" with { type: \"json\" };\n\n    console.log(config.apiUrl);\n\n\nThat `with { type: \"json\" }` is not decoration — it's a **security contract**. 
It forces the runtime to verify the file is genuinely JSON (via its MIME type) before processing it, which prevents a server from sneaking executable JavaScript in through a file that merely _looks_ like data. JSON modules can't run code; they're pure data, and only ever expose a default export. The platform turned a workaround into a guarantee.\n\n##  The HTTP frontier: where naïve parsing breaks the event loop\n\nNow the hard part. Real-time applications don't receive tidy, complete JSON documents — they receive data **flowing in streams** over HTTP, arriving in fragments. Call the native `JSON.parse` naïvely on a half-arrived network buffer and you get one of two bad outcomes: a syntax error on incomplete data, or — worse — a blocked single-threaded event loop while a huge payload is parsed synchronously, freezing the entire server for every other user.\n\nThe architecture demands a specialized intermediary. In Express, that's the `express.json()` middleware — the inspection conveyor on the assembly line. 
It buffers the incoming stream safely, checks the `Content-Type: application/json` header, parses the result, and hands your route a ready-to-use `req.body`.\n\n\n\n    const express = require(\"express\");\n    const app = express();\n\n    app.use(express.json()); // the inspection conveyor\n\n    app.post(\"/api/users\", (req, res) => {\n      // req.body is already an object: stream buffered, Content-Type checked, parsed\n      console.log(req.body.name);\n      res.status(201).json({ ok: true });\n    });\n\n\nThe distinction between the native function and the middleware is the distinction between a script and a system:\n\n| `JSON.parse()` | `express.json()`\n---|---|---\n**Execution context** | Synchronous memory (data already in V8) | HTTP network layer (buffers/streams)\n**Invalid data** | Throws `SyntaxError` that you must catch yourself | Returns a clean HTTP 400, keeps running\n**Scalability** | Low — blocks the event loop on huge payloads | Bounded: rejects bodies over its `limit` (100kb by default) before parsing\n\n##  The payoff: why all of this now runs the AI era\n\nEverything above used to be \"good Node hygiene.\" In 2026 it's something bigger, because of one structural fact: **LLMs are text generators, and your systems need data structures.** JSON is the bridge between them — and, as we've seen, the bridge is exactly where bugs live.\n\nThat gap is now formalized into three levels of reliability, and knowing which one you're on is the difference between a demo and production:\n\n  * **Level 1 — Prompt engineering.** \"Return JSON with these fields.\" Works 80–95% of the time, fails silently on edge cases, gives you zero type guarantees.\n  * **Level 2 — Function / tool calling.** The model \"calls\" a function whose schema you defined. 
Works 95–99% of the time, but the schema is a _hint_, not a constraint — you can still get valid types with invalid values.\n  * **Level 3 — Native structured output.** Constrained decoding against a JSON Schema, using a finite-state machine to mask invalid tokens at generation time. Schema-valid by construction whenever generation runs to completion: types _and_ any values the schema constrains (enums, required fields) are enforced as the text is produced.\n\n\n\nThis isn't fringe tooling. Native structured output now ships across OpenAI (since August 2024), Google Gemini (2024, expanded through 2026), Anthropic (beta in late 2025, GA in early 2026), Cohere, and xAI's Grok — plus local runtimes like Ollama, vLLM, and SGLang. The schema has become the **contract between the model and the rest of your system**, and the advice from teams running this in production is blunt: design the schema first, the same way you'd design a database schema before writing application code. Tools like Pydantic and Zod exist to make that contract executable, and the real prize is _testability_ — once output is typed and schema-valid, you can write unit tests and regression suites against it and catch the day a model update quietly changes its behavior.\n\nGo one layer deeper, to the wire itself, and JSON is there too. The **Model Context Protocol** — introduced by Anthropic in November 2024 and now supported across Claude, Cursor, Gemini, and the major clouds — runs on **JSON-RPC 2.0**. Every tool an agent invokes, every resource it reads, is a JSON-RPC message:\n\n\n\n    {\n      \"jsonrpc\": \"2.0\",\n      \"id\": 7,\n      \"method\": \"tools/call\",\n      \"params\": {\n        \"name\": \"get_order\",\n        \"arguments\": { \"orderId\": \"A-1042\" }\n      }\n    }\n\n\nJSON Schema tells the model what arguments a tool accepts _before_ it calls; one-way notifications carry progress updates; and because every response carries the `id` of its request, an agent can keep several tool calls in flight at once. 
MCP exists to solve the N×M problem — connecting N models to M tools without writing N×M custom adapters — and it solves it by making JSON the universal language every agent and every tool already speaks.\n\nNow connect the two halves of this article. Every serialization gotcha we covered — the silently dropped field, the `Date` flattened into a string, the circular reference, the event loop frozen by a fat payload — now happens _inside agent pipelines_ , where a non-deterministic model's output becomes your system's input. The silent bug was always dangerous. With a model on one end of the pipe, it's more dangerous than ever. Understanding the refinery stopped being optional the moment your software started talking to itself.\n\n##  The refinery is operational\n\nFrom rigorous lexical validation in the ECMAScript spec, to stream orchestration at scale in Node, to the contract language of autonomous agents — JSON has quietly become the connective tissue of the entire stack. It is one of the simplest formats ever designed, and that simplicity is exactly why it won.\n\nMastering the transformation agents — `replacer`, `reviver`, `structuredClone`, the schema — and the network traffic that carries them is what separates the programmer who _uses_ JSON from the architect who _commands_ it. A technical article, after all, isn't made of words alone; it's made of the small, exact decisions that survive contact with production.\n\nSo the next time an agent calls a tool and an answer comes back clean, you'll know what really happened in that fraction of a second. The Data Refinery is operational — and now you know how to run it.\n\n**Follow me on Dev.to** for practical content about software engineering, AI, architecture, frontend, and backend development.\n\nFor complete articles, developer cheat sheets, and access to CIEL, my AI-powered learning guide, visit: **blense.fun/en**\n\nNo hype. Just clear and practical tech content. 🚀\n\n_Written in June 2026. 
The platform features referenced — import attributes (`with { type: \"json\" }`), `structuredClone`, native LLM structured output via constrained decoding, and the Model Context Protocol over JSON-RPC 2.0 — reflect the state of JavaScript and the AI tooling ecosystem at that date._",
  "title": "The Data Refinery: How JSON Quietly Became the Language AI Agents Speak"
}