{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreibyjnxps5c3iyoo3uli2xzxztlidzq2tuxn65ixqkgbaal3l35mpa",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3moiu6tushno2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreiatttc47l6ubom7bckt6ezkgkfxm7mfsvl3gcg5agtzguqdk2u53a"
},
"mimeType": "image/webp",
"size": 72342
},
"path": "/matt_rose_9d0fe88d3533a4f/when-no-answer-beats-a-wrong-answer-designing-precision-first-systems-3hh5",
"publishedAt": "2026-06-17T17:35:53.000Z",
"site": "https://dev.to",
"tags": [
"architecture",
"softwareengineering",
"backend",
"systemdesign"
],
"textContent": "Most systems optimize for getting an answer. Some have to optimize for never getting the wrong one. Here's how building for asymmetric error costs changes everything about your architecture.\n\nA note before we start: this is an architecture essay, not a product teardown. There's no proprietary anything in here — just a design philosophy I've had to live inside for the last couple of years, and that I think more engineers should be deliberate about.\n\n## Two kinds of \"wrong\"\n\nMost of the systems we build are quietly optimized around a comfortable assumption: that a missed answer and a wrong answer cost about the same. A search result you didn't surface and a search result that's slightly off are both just \"not great.\" You tune for accuracy, you ship, you move on.\n\nThen occasionally you build something where that assumption is not just wrong — it's dangerous.\n\nI've spent the last couple of years building a system where the cost of the two errors is wildly asymmetric. Failing to produce an answer is mildly disappointing. Producing the **wrong** answer is unrecoverable — it doesn't just degrade the experience, it damages trust in a way you can't apologize your way out of.\n\nOnce you internalize that asymmetry, almost every default in modern system design starts to look subtly miscalibrated. This is a tour of what changes.\n\n## Accuracy is the wrong headline metric\n\nThe first thing that has to go is \"accuracy.\"\n\nAccuracy blends two very different failures into one number. A model that's 95% accurate might be making its 5% of mistakes by staying quiet — or by confidently asserting falsehoods. Those are not the same system. 
One is cautious; the other is a liability.\n\nThe metrics that actually matter when errors are asymmetric are **precision** and the shape of your failure distribution:\n\n* **Precision**: of the answers you _did_ commit to, how many were right?\n* **Abstention rate**: how often did you correctly decline to answer?\n* **False commit rate**: how often did you assert something wrong? (This is the one you're really managing. It should be the metric on the wall.)\n\nRecall — how many answerable cases you actually answered — becomes something you _sacrifice on purpose_. That feels deeply uncomfortable the first time you do it, because we're trained to think coverage is the goal. It isn't. In a precision-first system, coverage is a dial you're allowed to turn _down_ to protect correctness.\n\n## \"I don't know\" is a first-class result\n\nIn most codebases, \"I couldn't determine an answer\" is an afterthought — a `null`, an empty array, a fallthrough `else`. It's treated as the absence of a result rather than a result in its own right.\n\nIn a precision-first system, abstention is a designed, named, fully supported outcome. It has its own code path, its own logging, its own downstream handling, its own success criteria. \"We looked and chose not to commit\" is a _correct_ behavior, and your system should be able to say it as clearly and confidently as it says anything else.\n\nPractically, that means your core decision function doesn't return `Answer | null`. It returns something closer to:\n\n```\nDecision =\n  | Committed(value, confidence, supporting_evidence)\n  | Abstained(reason)\n```\n\nBoth branches are first-class. Both are tested. Both are observable in production. 
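\n\nA minimal Python sketch of that type, assuming frozen dataclasses stand in for the two branches: the class names mirror the pseudocode above, while `describe` and `metrics` are hypothetical helpers, and `metrics` further assumes you have the known-correct value for each case (for example, from a labeled sample). Because abstention is a real type rather than a `None`, the precision, abstention rate, and false commit rate from the accuracy section fall straight out of the decisions:\n\n```python\nfrom dataclasses import dataclass\nfrom typing import Any, Union\n\n\n@dataclass(frozen=True)\nclass Committed:\n    value: Any\n    confidence: float\n    supporting_evidence: tuple = ()\n\n\n@dataclass(frozen=True)\nclass Abstained:\n    reason: str\n\n\n# A Decision is exactly one of the two branches; there is no null case.\nDecision = Union[Committed, Abstained]\n\n\ndef describe(decision: Decision) -> str:\n    # Both branches are handled explicitly; anything else is a bug.\n    if isinstance(decision, Committed):\n        return f'committed {decision.value!r} at {decision.confidence:.2f}'\n    if isinstance(decision, Abstained):\n        return f'abstained: {decision.reason}'\n    raise TypeError(f'not a Decision: {decision!r}')\n\n\ndef metrics(decisions, truths):\n    # truths[i] is the known-correct value for decisions[i] (hypothetical labels).\n    n = len(decisions)\n    commits = [(d, t) for d, t in zip(decisions, truths) if isinstance(d, Committed)]\n    wrong = sum(1 for d, t in commits if d.value != t)\n    return {\n        'precision': (len(commits) - wrong) / len(commits) if commits else None,\n        'abstention_rate': (n - len(commits)) / n if n else None,\n        'false_commit_rate': wrong / n if n else None,\n    }\n\n\nprint(describe(Committed('merchant-42', 0.97, ('ledger', 'device'))))\n# committed 'merchant-42' at 0.97\nprint(describe(Abstained('below confidence floor')))\n# abstained: below confidence floor\n```\n\nThe denominators are a choice this sketch makes rather than one the essay specifies: precision is computed over commits only, while abstention rate and false commit rate are computed over all cases.\n\n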
The moment \"I don't know\" becomes a real return type instead of a missing value, the rest of the design gets much easier to reason about — because you've stopped pretending every input deserves an output.\n\n## A confidence floor you do not cross\n\nThe heart of the thing is a gate, and the gate has a non-negotiable floor.\n\nBelow a certain confidence threshold, the system does not commit — full stop. Not \"commits with a warning.\" Not \"commits but flags for review later.\" It abstains. The floor is a hard architectural boundary, not a soft suggestion, and it is the same for everyone. No special case, no VIP path, no \"just this once because the demo is tomorrow\" gets to lower it.\n\nThe reason this has to be structural rather than cultural is that confidence floors are exactly the thing that erodes under pressure. Someone will always have a very reasonable-sounding argument for why _this_ case should squeak through. The floor only means something if it's enforced by the system, not by the discipline of whoever is on call that week.\n\n```python\n# evaluate, corroborated, resolve, FLOOR, Committed and Abstained are\n# assumed to be defined elsewhere; this shows only the shape of the gate.\ndef decide(signals):\n    score = evaluate(signals)\n    if score < FLOOR:\n        # First way to abstain: the combined score is below the floor.\n        return Abstained(\"below confidence floor\")\n    if not corroborated(signals):\n        # Second way: the score clears the floor, but independent support is missing.\n        return Abstained(\"insufficient independent support\")\n    return Committed(resolve(signals), score, signals)\n```\n\nNotice there are _two_ ways to abstain there, which brings us to the second principle.\n\n## One strong signal is not enough\n\nA single source being very, very sure is not the same as being right. Confident-but-wrong is the entire failure mode you're trying to eliminate, and a lone high-confidence signal is precisely how it sneaks in.\n\nSo the gate asks for more than a high score — it asks for **independent corroboration**. Two signals that don't share a failure mode, both pointing the same way, are worth far more than one signal shouting. 
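\n\nThe essay doesn't pin down how `corroborated` works, so the following is only one possible version under stated assumptions: each input is a hypothetical `Signal` carrying the claim it supports, its own score, and a `source` label naming the upstream dependency it would fail together with. Agreement is counted per distinct source rather than per signal, and a strong rival claim counts against corroboration:\n\n```python\nfrom collections import defaultdict\nfrom collections.abc import Hashable\nfrom dataclasses import dataclass\n\n\n@dataclass(frozen=True)\nclass Signal:\n    claim: Hashable  # what this signal says the answer is\n    score: float     # how sure this one signal is\n    source: str      # hypothetical label for its upstream failure domain\n\n\ndef corroborated(signals, min_score=0.8, min_sources=2):\n    # Only signals that are individually strong take part.\n    strong = [s for s in signals if s.score >= min_score]\n    # Group by claim, keeping distinct sources: two signals derived from\n    # the same source are one vote, not two.\n    sources_by_claim = defaultdict(set)\n    for s in strong:\n        sources_by_claim[s.claim].add(s.source)\n    # No strong claim at all, or strong rival claims, is not corroboration.\n    if len(sources_by_claim) != 1:\n        return False\n    (sources,) = sources_by_claim.values()\n    return len(sources) >= min_sources\n\n\nledger = Signal('fraud', 0.95, 'ledger')\nledger_echo = Signal('fraud', 0.93, 'ledger')  # derived from the same source\ndevice = Signal('fraud', 0.90, 'device')\nprint(corroborated([ledger, ledger_echo]))  # False: one source, echoed\nprint(corroborated([ledger, device]))  # True: two independent sources\n```\n\nThe thresholds `min_score` and `min_sources` are illustrative parameters. Because `min_sources` is fixed rather than scaled to whatever happens to be available, losing a source makes the check fail instead of quietly relaxing it.\n\n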
The key word is _independent_: two measurements derived from the same underlying source aren't corroboration, they're an echo. Designing for genuine independence — making sure your \"second opinion\" can't fail in the same way as your first — is most of the real work.\n\nThis is an old idea in disguise. It's quorum. It's defense in depth. It's why critical systems use multiple sensors that fail differently. The novelty isn't the pattern; it's the discipline to apply it to _decisions_, not just to availability.\n\n## Graceful degradation, not graceful guessing\n\nWhen part of the system is unavailable — a dependency is down, a signal is missing, something times out — there's a strong temptation to \"do your best with what you have.\" In a precision-first system, that temptation is the enemy.\n\nDegradation should make the system **more cautious**, not more creative. Fewer signals available means a _higher_ bar to commit, not a lower one, because you have less ability to corroborate. The correct behavior under partial failure is to abstain more often, not to fill in the gaps with optimism. A system that gets _bolder_ as it gets blinder is a system that will eventually hurt someone.\n\n## The hardest part isn't technical\n\nHere's the thing nobody warns you about: the engineering is the easy half. The hard half is that a precision-first system will, by design, do _nothing_ in a large number of cases — and \"did nothing, correctly\" is a genuinely difficult thing for an organization to celebrate.\n\nStakeholders see the abstentions and read them as missed opportunities. There's relentless gravity toward \"can't we just lower the bar a little?\" Every conversation, every dashboard, every incentive nudges toward more coverage. And every one of those nudges is asking you to trade away the exact property that makes the system trustworthy in the first place.\n\nSo part of the architecture lives outside the code. 
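\n\nThe code side of that can be small. As a hedged sketch, price both failure modes in one unit so they always appear in the same report; the per-event costs below are placeholder assumptions rather than figures from this essay, and `cost_report` is a hypothetical helper:\n\n```python\ndef cost_report(false_commits, abstentions, cost_false_commit, cost_abstention):\n    # Both failure modes are priced in the same (hypothetical) unit.\n    fc = false_commits * cost_false_commit\n    ab = abstentions * cost_abstention\n    return {'false_commit_cost': fc, 'abstention_cost': ab, 'total': fc + ab}\n\n\n# Placeholder numbers: a lower floor trades abstentions for false commits.\ncurrent = cost_report(false_commits=2, abstentions=400, cost_false_commit=10_000, cost_abstention=5)\nlowered = cost_report(false_commits=9, abstentions=150, cost_false_commit=10_000, cost_abstention=5)\nprint(current['total'], lowered['total'])  # 22000 90750\n```\n\nWith placeholder numbers like these, the lowered floor wins on coverage (150 abstentions instead of 400) and loses badly on total cost.\n\n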
You have to make the asymmetry _legible_: show people the cost of a false commit in the same frame as the cost of an abstention, so that \"we chose not to answer\" reads as the system working, not the system failing. The floor survives only if everyone understands why it's there.\n\n## The takeaway\n\nIf you're building something where being wrong is worse than being silent — fraud decisions, safety interlocks, anything that touches a real person's trust — consider designing around these explicitly:\n\n1. **Measure precision and false-commit rate, not accuracy.** Put the scary number on the wall.\n2. **Make abstention a first-class, designed outcome** — a real return type, not a `null`.\n3. **Enforce a hard confidence floor in the system,** not in the discipline of your on-call engineer.\n4. **Require independent corroboration,** and do the hard work of making \"independent\" actually true.\n5. **Degrade toward caution.** Less information should raise the bar, never lower it.\n6. **Make the asymmetry visible to the org,** so \"correctly did nothing\" can be recognized as a win.\n\nRecall is a dial. Trust is a ratchet — it only turns one way, and it turns slowly. Build like the wrong answer is the only one you can't take back, because usually, it is.\n\n_If you've built systems like this, I'd love to compare notes on how you keep the confidence floor from eroding over time — that's the failure mode I find myself defending against most._",
"title": "When No Answer Beats a Wrong Answer: Designing Precision-First Systems"
}