{
"$type": "site.standard.document",
"path": "/devlog/012",
"publishedAt": "2026-06-01T16:48:45Z",
"site": "at://did:plc:mkqt76xvfgxuemlwlx6ruc3w/site.standard.publication/3khuwc44c2256",
"textContent": "# copying the boring parts\n\nthis one starts with a benchmark and ends, happily, with less code to be clever about.\n\nzat's MST implementation had grown up around repo verification. it could load partial trees from commit CARs, invert firehose operations against a post-state tree, and prove the previous root. that machinery mattered. it was introduced in `fff8599` for sync 1.1: missing child blocks became stubs, `putReturn` and `deleteReturn` made operation inversion possible, and `verifyCommitDiff` could reject a commit if the inverted tree did not hash back to `prevData`.\n\nthat is real semantic work. but it does not follow that every hot path should pay for the representation that made it possible.\n\nthe comparison target was [Atmos](https://github.com/jcalabro/atmos), Jim's Go implementation. that is not just another library on the internet; it is written by the person operating almost all production PDS infrastructure. if Atmos is boring in some place, that boringness has earned respect.\n\nthe rule for this pass was simple: match semantics first, benchmark second. if Zat and Atmos do not produce the same root bytes, the benchmark is noise. if they produce the same root bytes while doing different practical work, the benchmark is still muddy.\n\n## what changed\n\nthe old hot shape used a tagged child representation. that made partial trees explicit: a child was either absent, loaded, or a CID stub. Atmos uses something plainer. children are nullable pointers, and an unloaded subtree is a node that carries a CID until `ensureLoaded` mutates it into the loaded node.\n\nZat now follows that shape. `Node.left` and `Entry.right` are nullable `*Node`; unloaded nodes carry cached CIDs; lazy loading mutates the selected node in place. partial-tree behavior still exists, but it is no longer a tag on every child edge.\n\nthe other changes were in the same spirit:\n\n- clean nodes cache their CIDs, so `rootCid` does not re-hash the whole tree\n- MST nodes serialize directly to DAG-CBOR bytes instead of building generic `cbor.Value` maps\n- insertion can borrow caller-owned keys for benchmark/import paths\n- ATProto key comparison uses chunked big-endian reads instead of scalar byte comparison\n- lookup walks entries linearly and follows the ordered tree by key, like Atmos\n\nthat last one took the longest to deserve.\n\n## the tempting mistake\n\nMST node entries are sorted, so binary search looks attractive. it was also wrong for this workload.\n\nthe benchmark tree had:\n\n| shape | value |\n|---|---:|\n| entries | 50,000 |\n| nodes | 13,443 |\n| max entries per node | 32 |\n| average lookup depth | 7.66 nodes |\n\nwith nodes this small, binary search mostly bought unpredictable midpoint branches and worse locality. Atmos scans each node's entries in order. once Zat did the same, lookup stopped looking like a mystery.\n\nthere was one more mismatch. Zat was still using key height and layer arithmetic to decide how lookup should descend. Atmos does not do that. it compares the target key against entries in order, chooses `left` or the previous entry's `right`, and loads that child only if it is the selected path. height still belongs in insertion, deletion, splitting, and normalization. plain lookup does not need it.\n\nso `getWithHeight` remains as an API compatibility surface, but lookup ignores the height. that is a little funny, and also the right kind of boring.\n\n## the numbers\n\nthe publishable benchmark is the official `atproto-bench` MST run: one Zat path, one Atmos path, same corpus, same work, same root.\n\nmedian of three `just bench-mst` runs, each with one warmup pass and five measured passes over 50k deterministic records and 500k lookups per pass:\n\n\n\n| implementation | insert + root | lookup |\n|---|---:|---:|\n| Zat | 5,728,470 records/sec | 6,540,333 lookups/sec |\n| Atmos | 4,147,900 records/sec | 5,512,120 lookups/sec |\n| Zat / Atmos | 1.38x | 1.19x |\n\nboth implementations produced the same root bytes:\n\n`01711220d59a82ffb8968ab6ff46354b382a18072f382240bc447c70e8cbc579221c8c2e`\n\nand the same lookup checksum.\n\nthere were diagnostic benches too, but those stay out of the official table. they were useful for seeing where time went: height calculation, direct CID access, alternate comparators, miss paths, recursive versus iterative lookup. the useful result was not a magic Zig trick. it was deleting unnecessary cleverness until the hot path had the same problem shape as Atmos: pointer children, fast comparison, tiny-node linear scan.\n\n## why insert moved\n\ninsert plus root computation got faster for more direct reasons.\n\nfirst, dirty CID caching stopped recomputing roots for clean subtrees. then direct node serialization removed a pile of generic CBOR construction from the root path. after chunked key comparison, insertion was spending less time re-reading the same long ATProto key prefixes byte by byte.\n\nAtmos is still excellent code. Zat's advantage here is mostly that the current path is arena-friendly, allocation-light, and direct about the bytes it needs to hash.\n\n## the missing middle\n\nthe representation cleanup also exposed an API gap. ZDS was reaching through Zat internals to do two jobs:\n\n- collect MST DAG-CBOR blocks for a commit CAR\n- walk repo records in MST key order\n\nthat was fine while the internal shape was stable, and exactly wrong once the internal shape needed to change.\n\nZat now exposes `Mst.collectBlocks` and `Mst.walk`. `collectBlocks` emits newly materialized MST blocks and skips clean unresolved stubs, matching the old ZDS commit-CAR behavior where existing repo blocks do not need to be re-exported on every write. `walk` resolves lazy nodes when a block reader is available and returns `PartialTree` if the caller asks to walk through an unresolved stub.\n\nZDS adopted both APIs first. record writes now use `collectBlocks`, and repo import uses `walk`, so the new MST layer is exercised by a real PDS rather than only by benchmarks. ZDS was deployed with Zat commit `485f1d485a9b8e7b703e8627a6b6a8c3e3c36a0e` before this release was prepared.\n\n## release shape\n\nthis is `v0.3.5`.\n\nthe public API change is additive: `Mst.collectBlocks` and `Mst.walk`. the rest is internal representation and hot-path behavior. existing callers should not need to change, and the root parity checks are the important proof that the MST semantics did not drift.\n\nthe release checks were:\n\n| check | result |\n|---|---:|\n| Zat tests | 439/439 passed |\n| `zig zen` | ran |\n| ZDS with local Zat fork | passed |\n| atproto-bench with local Zat path | passed |\n| find-bufo with local Zat fork | passed |\n| pollz with local Zat fork | passed |\n| typeahead with local Zat fork | passed |\n\nsome older sibling apps currently fail before they reach this change because of their own dependency graphs: duplicate websocket modules in a few apps, missing local llama/ggml dylibs in Ken, and an older build API in relay-eval. those are release hygiene items for those projects, not MST correctness findings.\n\nthe lesson, if there is one, is not \"Zig beat Go.\" the lesson is more useful than that: when a production implementation is doing something plain, copy the plain thing before inventing a smarter one.\n",
"title": "copying the boring parts"
}