{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreibqddorphb6bjblpoxqjpc33y6iqeu2ogus2riec3gmafhru7tzw4",
"uri": "at://did:plc:oojf6mo4xi5eyf5yypmkwi5j/app.bsky.feed.post/3mnw62c2myst2"
},
"description": "An automated tracker booked the pre-discount figure as revenue. Not garbage data, plausible data, which is why it survived. On why judgement, not speed, is the real leverage.",
"path": "/blog/the-ai-handed-me-a-number-wrong-in-the-most-convincing-way/",
"publishedAt": "2026-06-10T07:15:00.000Z",
"site": "https://www.livain.com",
"tags": [
"verification in large language models",
"poor data quality costs organisations an average of $12.9 million a year",
"handing it two years of receipts",
"dashboard that lies by omission",
"deep domain expertise being the real leverage",
"Data quality: why it matters and how to achieve it",
"Chain-of-Verification reduces hallucination in large language models",
"I let AI clear two years of receipts. The leverage wasn't speed.",
"Your analytics dashboard is lying to you by leaving things out",
"The death of generic AI: why deep domain expertise is the only real leverage left"
],
"textContent": "My bookkeeper caught something in a revenue overview I'd shared with her. A couple of invoices were showing more income than I'd actually billed. Not a rounding wobble — the figure was inflated by a real amount, and it had been sitting in my numbers looking perfectly legitimate.\n\nThe cause was small and almost elegant in how wrong it was. My revenue tracker pulls invoice data automatically from my time-tracking tool. When an invoice has a discount, that tool exposes several money fields: the subtotal before the discount, the discount amount, the tax, and the final total. My script had grabbed the subtotal — the figure _before_ the rebate — and recorded it as revenue. So every discounted invoice quietly reported what I _could_ have charged rather than what I actually did.\n\n### It was wrong in the most convincing way possible\n\nThis is the part worth sitting with. The number wasn't garbage. It was a real value, pulled from the right invoice, formatted correctly, landing in the right column. It was off by exactly the discount — the one slice of the total a glance would never question. If it had been wildly wrong, I'd have spotted it. Because it was plausibly wrong, it survived.\n\nThat's the failure mode of automated systems and AI tools generally, and there's a growing body of research on it. The work on verification in large language models describes exactly this: when these systems are wrong, they don't produce obvious nonsense, they produce a \"plausible-looking alternative\" that's hard to catch precisely because the rest of the output is fine. A single wrong figure hides comfortably inside a page of correct ones.\n\n> The dangerous error isn't the one that looks wrong. It's the one that looks exactly right.\n\nAnd it's not a cheap problem at scale. Gartner's much-cited estimate is that poor data quality costs organisations an average of $12.9 million a year. Most of that isn't dramatic corruption. 
It's thousands of small, confident, slightly-off numbers feeding decisions nobody thought to re-check.\n\n### Two things made the difference, and neither was the software\n\nThe first was a human who knew what revenue means. To my bookkeeper, \"you can only book what you actually invoiced\" isn't a clever insight — it's the floor. She didn't need to see the code. She read the output against twenty years of knowing how the number is supposed to behave, and the discounted rows simply looked wrong to her. That's domain expertise doing the one thing automation can't: holding the result up against what reality requires.\n\nThe second was fixing the _system_, not just the cell. I corrected the two affected rows by hand, but the real repair was teaching the script to compute revenue as subtotal minus discount, and writing the rule — \"record only what was actually invoiced\" — into the tool's instructions so it can't drift back. A spreadsheet you patch is a chore. A rule you encode is a fix.\n\n### The leverage is in the checking, not the speed\n\nI'm all in on letting AI and automation do the mechanical work — I've written about handing it two years of receipts, and the leverage there was never the speed. It was that I knew what the output was supposed to look like, so I could catch it when it drifted. This invoice bug is the same story from the other side: the automation will confidently hand you a number, and someone has to own whether that number is true.\n\nIt's the quieter cousin of the dashboard that lies by omission. There, the report leaves something out. Here, the report includes something it shouldn't. Both look complete. Both pass the glance test. Both need a person who knows the territory to say, \"that's not right,\" and to know _why_ — which is the whole argument for deep domain expertise being the real leverage rather than the tool.\n\nThe tool will give you an answer in milliseconds. 
Whether it's the right answer is still, stubbornly, your job.\n\n### Sources & further reading\n\n**External**\nGartner — Data quality: why it matters and how to achieve it\narXiv — Chain-of-Verification reduces hallucination in large language models\n\n**Related posts**\nI let AI clear two years of receipts. The leverage wasn't speed.\nYour analytics dashboard is lying to you by leaving things out\nThe death of generic AI: why deep domain expertise is the only real leverage left",
"title": "The AI handed me a number that was wrong in the most convincing way",
"updatedAt": "2026-06-10T07:15:00.632Z"
}