{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/shards/2026/04/dynamo/",
"description": "Key takeaways from Amazon's 2007 Dynamo paper.",
"path": "/shards/2026/04/dynamo/",
"publishedAt": "2026-04-11T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Distributed Systems",
"Databases"
],
"textContent": "Finally got around to reading the original [Dynamo paper] from 2007. It's the one that\nkicked off Cassandra, Riak, Voldemort, and a whole generation of eventually consistent\nstores.\n\nAmazon had services like the shopping cart where consistency wasn't worth the availability\ncost. If a node is unreachable in a consistent system, writes block or fail. A stale cart is\nfine; an unavailable cart loses money. So the idea with Dynamo is to never reject a write.\nIf a network partition causes two nodes to accept conflicting writes, both versions survive\nand the application sorts it out on the next read. For the shopping cart that means taking\nthe union of conflicting versions - a deleted item might reappear, but nothing gets lost.\n\nData is partitioned on a [consistent hash] ring with [virtual nodes]. Each key replicates to\nN nodes (typically 3), any of which can accept writes. There are no leaders or followers -\nevery node is identical. [Quorum] parameters R and W are tunable, and (N=3, R=2, W=2) is the\ncommon setup where you get overlap between reads and writes.\n\nIf one of the N replicas is down, [sloppy quorum] lets the write land on the next healthy\nnode on the ring instead. That node holds the data and forwards it via [hinted handoff] once\nthe original recovers. [Merkle trees] handle background [anti-entropy] to sync divergent\nreplicas, and [gossip] handles failure detection with no central membership service\nanywhere.\n\n[Vector clocks] track causal ordering so the system knows when two writes genuinely conflict\nrather than one superseding the other. When that happens, Dynamo keeps both versions and\nhands them back on the next read - the application has to reconcile them. They also had to\ntruncate old clock entries to keep the metadata bounded, which can lose ordering info and\ncreate even more conflicts for the app to sort out.\n\nTable 1 from the paper summarizes the architecture:\n\n| Problem | Technique | Advantage |\n| ---------------------------------- | -------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |\n| Partitioning | [Consistent hash]ing | Incremental scalability |\n| High availability for writes | [Vector clocks] with reconciliation during reads | Version size is decoupled from update rates |\n| Handling temporary failures | [Sloppy quorum] and [hinted handoff] | Provides high availability and durability guarantee when some of the replicas are not available |\n| Recovering from permanent failures | [Anti-entropy] using [Merkle trees] | Synchronizes divergent replicas in the background |\n| Membership and failure detection | [Gossip]-based membership protocol and failure detection | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information |\n\nAWS's own DynamoDB service eventually dropped vector clocks and moved to leader-based\nreplication with Multi-Paxos per partition. The [2022 DynamoDB paper] covers the shift.\nTurns out, pushing conflict resolution onto application developers didn't scale as a product\ndecision even if it scaled as an architecture.\n\n\n\n\n[Dynamo paper]:\n https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf\n\n[consistent hash]:\n https://arpitbhayani.me/blogs/consistent-hashing/\n\n[virtual nodes]:\n https://arpitbhayani.me/blogs/consistent-hashing/\n\n[quorum]:\n https://web.mit.edu/6.033/2005/wwwdocs/quorum_note.html\n\n[sloppy quorum]:\n https://distributed-computing-musings.com/2022/05/sloppy-quorum-and-hinted-handoff-quorum-in-the-times-of-failure/\n\n[hinted handoff]:\n https://systemdesign.one/hinted-handoff/\n\n[vector clocks]:\n https://sookocheff.com/post/time/vector-clocks/\n\n[Merkle trees]:\n https://brilliant.org/wiki/merkle-tree/\n\n[anti-entropy]:\n https://www.influxdata.com/blog/eventual-consistency-anti-entropy/\n\n[gossip]:\n https://highscalability.com/gossip-protocol-explained/\n\n[2022 DynamoDB paper]:\n https://www.usenix.org/conference/atc22/presentation/elhemali",
"title": "Dynamo: Amazon's highly available key-value store"
}