Justin-Stanley.com

I built a snark detector and pointed it at myself

Justin Stanley, June 22, 2026

> Difficile est saturam non scribere. — Juvenal
>
> ("It is difficult not to write satire.")

I keep a social account that's pure relief valve — dry, sarcastic — so I pointed some tooling at myself: can a local model measure my sarcasm over time? Not vibes — a number I can argue with.

This isn't my first sarcasm detector. I've built them since grad school, back when "language model" meant feature engineering and a lot of praying over a corpus. Sarcasm was the hard problem — the words betray the meaning, and the old tools only read words. So this wasn't about the snark; it was a test of how much easier that's gotten, and where it hasn't.

The metric: words vs. meaning

Off-the-shelf sentiment analysis reads the literal words. Point a classic lexicon scorer at "Thanks Biden." and it cheerfully calls it positive. Run it across a few thousand of my posts and I come out sunny and well-adjusted, which anyone who's read them knows is wrong.

The signal I cared about lives in the gap between what the words say and what I mean. So I had the model score both:

- vlit — face value, sarcasm ignored
- v — intended emotion, sarcasm flipped

The difference, vlit − v, is the Snark Index: how much sunnier I read than I mean. That one number turned out to be the whole project.

{{< figure src="/img/snark-index.png" alt="The Snark Index charted over time" caption="The Snark Index over time — how much sunnier the words read than I mean them." >}}

The build

It all runs on my lab network. The Mac is the lab machine — it has the GPU, so it runs the model (a local qwen3 32B; no API bills, nothing leaves the house). The NAS is the storage and compute layer — Postgres for the data, Grafana for the charts. The split fell out of that: the scorer connects to Postgres across the lab network and writes directly — Postgres handles the concurrent access, so there's no relay in between.
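For the curious, here's roughly what the arithmetic looks like. This is a minimal sketch, not the actual scorer: it assumes the model answers each post with a small JSON object carrying a sarcasm flag plus the two valences, and the `Score` struct, `ParseScore`, the JSON field names, and the [-1, 1] valence range are all hypothetical, chosen for illustration. The Snark Index is just the mean of vlit − v over a batch of posts; the sarcasm rate is the share of posts flagged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Score is a hypothetical shape for the model's reply about one post.
// Field names and the [-1, 1] valence range are assumptions for this sketch.
type Score struct {
	Sarcastic bool    `json:"sarcastic"`
	VLit      float64 `json:"vlit"` // face value, sarcasm ignored
	V         float64 `json:"v"`    // intended emotion, sarcasm flipped
}

// ParseScore decodes one JSON reply from the model (hypothetical format).
func ParseScore(raw string) (Score, error) {
	var s Score
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

// SnarkIndex is the mean of vlit − v over a batch: how much sunnier
// the words read than the intent. An empty batch scores 0.
func SnarkIndex(scores []Score) float64 {
	if len(scores) == 0 {
		return 0
	}
	var sum float64
	for _, s := range scores {
		sum += s.VLit - s.V
	}
	return sum / float64(len(scores))
}

// SarcasmRate is the fraction of posts the model flagged as sarcastic.
// An empty batch scores 0.
func SarcasmRate(scores []Score) float64 {
	if len(scores) == 0 {
		return 0
	}
	n := 0
	for _, s := range scores {
		if s.Sarcastic {
			n++
		}
	}
	return float64(n) / float64(len(scores))
}

func main() {
	// Made-up replies for illustration: one sarcastic post, one sincere one.
	replies := []string{
		`{"sarcastic": true, "vlit": 0.8, "v": -0.6}`,
		`{"sarcastic": false, "vlit": 0.2, "v": 0.2}`,
	}
	var scores []Score
	for _, r := range replies {
		s, err := ParseScore(r)
		if err != nil {
			// Skip replies that aren't valid JSON rather than failing the batch.
			fmt.Println("skipping unparseable reply:", err)
			continue
		}
		scores = append(scores, s)
	}
	fmt.Printf("snark index %.2f, sarcasm rate %.2f\n", SnarkIndex(scores), SarcasmRate(scores))
	// prints: snark index 0.70, sarcasm rate 0.50
}
```

Because the mean of the differences equals the difference of the means, the same index falls out whether you subtract per post or subtract the two averages, which is handy for daily roll-ups. Averaging daily index values, though, weights every day equally regardless of post volume; weighting each day by its post count keeps the result equal to the per-post figure.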
It's all Go, two small binaries — one that pulls new posts and rolls up daily stats, one that scores — plus a Grafana dashboard. Parameterized by handle, runs nightly without me.

{{< figure src="/img/snark-dashboard.png" alt="The Grafana dashboard" caption="The dashboard: Snark Index, intended-vs-literal valence, sarcasm rate, and topic." >}}

What it found

Across about 3,700 posts and three years: roughly one in four is sarcastic, and the Snark Index lands solidly positive, around +0.3. I read sunnier than I mean — my actual intent runs a touch negative (deadpan does that), while the words on their own scan cheerier. The model gives me more credit for warmth than my delivery earns. Make of that what you will; I haven't decided what I make of it.

The part that's still hard

The "much easier" story has a catch. Getting from detecting sarcasm to scoring what I meant took three tries, and the smartest one lost.

Try one flagged the irony correctly — it knew "thank Big Brother for raising the chocolate ration" was sarcastic — and then logged it as cheerful anyway. Right flag, wrong feeling. I only caught it because the "positive" pile was suspiciously full of sarcasm.

Try two asked for two numbers, face value and intent, and that surfaced the genuinely hard case: irony with a target. "Thanks Biden" isn't a sentiment, it's a format. Sometimes it's blame. Sometimes it's a sincere thank-you wearing the blame-meme as a costume — me actually crediting the tax credit that paid down the solar panels, while mocking the people who say it straight. Same three words, opposite meaning, and the only tell is what it's replying to.

Try three was me being clever: I taught the prompt about that nuance, expecting sharper results. It got worse. Spelling out "sometimes sarcasm is sincere" made the model gun-shy — its sarcasm detection collapsed from a quarter of posts to about seven percent, and it started calling things like "truly, a stable genius. that was sarcasm." positive.
It read the words "that was sarcasm" and shrugged.

The blunt version — flag aggressively, then flip — beat the nuanced one outright.

Easier, not solved

That's the real finding, and it's a better one than any number on the dashboard. The intelligence got cheap: a decade ago, getting a model to spot irony at all was a thesis; today it's a prompt and a local GPU, and nearly all my effort went into boring solved things — moving data, a schema, a dashboard.

But the last mile didn't move. Detecting that something is ironic is easy now. Reading who the irony is aimed at — that's still judgment, still context, still the most human thing in the pile. And it's humbling that you can make a model dumber by explaining the hard part to it.

So the dashboard has opinions about me — mostly neutral, occasionally warm, reliably grim about the news, and sunnier on the surface than underneath. Whether it's right is its own argument, and one I haven't settled. But that it can venture a read at all, sarcasm and all, is the thing. A decade ago that would have been the whole project. Today it's the part that comes cheap — and the hard part is exactly where it always was.
