{
  "$type": "site.standard.document",
  "canonicalUrl": "https://justin-stanley.com/posts/snark-pipeline",
  "path": "/posts/snark-pipeline",
  "publishedAt": "2026-06-22T00:00:00.000Z",
  "site": "at://did:plc:ohutz6x5acjmpuulp3x7wxxc/site.standard.publication/3moxid42uzk2k",
  "tags": [
    "llm",
    "homelab",
    "go",
    "postgres"
  ],
  "textContent": "> Difficile est saturam non scribere. — Juvenal\n> (\"It is difficult not to write satire.\")\n\nI keep a social account that's a pure relief valve — dry, sarcastic — so I pointed\nsome tooling at myself: can a local model measure my sarcasm over time? Not\nvibes — a number I can argue with.\n\nThis isn't my first sarcasm detector. I've been building them since grad school, back when\n\"language model\" meant feature engineering and a lot of praying over a corpus.\nSarcasm was the hard problem — the words betray the meaning, and the old tools\nonly read words. So this wasn't about the snark; it was a test of how much easier\nthat's gotten, and where it hasn't.\n\nThe metric: words vs. meaning\n\nOff-the-shelf sentiment analysis reads the literal words. Point a classic lexicon\nscorer at \"Thanks Biden.\" and it cheerfully calls it positive. Run it across a few\nthousand of my posts and I come out sunny and well-adjusted, which anyone who's\nread them knows is wrong.\n\nThe signal I cared about lives in the gap between what the words say and what I\nmean. So I had the model score both:\n\n- vlit — face-value valence, sarcasm ignored\n- v — intended valence, sarcasm flipped\n\nThe difference, vlit − v, is the Snark Index: how much sunnier I read than I\nmean. That one number turned out to be the whole project.\n\n{{< figure src=\"/img/snark-index.png\" alt=\"The Snark Index charted over time\" caption=\"The Snark Index over time — how much sunnier the words read than I mean them.\" >}}\n\nThe build\n\nIt all runs on my lab network. The Mac is the lab machine — it has the GPU, so it\nruns the model (a local Qwen3 32B; no API bills, nothing leaves the house). The\nNAS is the storage and compute layer — Postgres for the data, Grafana for the\ncharts. The split fell out of that: the scorer connects to Postgres across the\nlab network and writes directly —\nPostgres handles the concurrent access, so there's no relay in between. It's all\nGo, two small binaries — one that pulls new posts and rolls up daily stats, one\nthat scores — plus a Grafana dashboard. Parameterized by handle, runs nightly\nwithout me.\n\n{{< figure src=\"/img/snark-dashboard.png\" alt=\"The Grafana dashboard\" caption=\"The dashboard: Snark Index, intended-vs-literal valence, sarcasm rate, and topic.\" >}}\n\nWhat it found\n\nAcross about 3,700 posts and three years: roughly one in four is sarcastic,\nand the Snark Index lands solidly positive, around +0.3. I read sunnier than\nI mean — my actual intent runs a touch negative (deadpan does that), while the\nwords on their own scan cheerier. The model gives me more credit for warmth than\nmy delivery earns. Make of that what you will; I haven't decided what I make of it.\n\nThe part that's still hard\n\nThe \"much easier\" story has a catch. Getting from detecting sarcasm\nto scoring what I meant took three tries, and the smartest one lost.\n\nTry one flagged the irony correctly — it knew \"thank Big Brother for raising\nthe chocolate ration\" was sarcastic — and then logged it as cheerful anyway.\nRight flag, wrong feeling. I only caught it because the \"positive\" pile was\nsuspiciously full of sarcasm.\n\nTry two asked for two numbers, face value and intent, and that surfaced the\ngenuinely hard case: irony with a target. \"Thanks Biden\" isn't a sentiment, it's a\nformat. Sometimes it's blame. Sometimes it's a sincere thank-you wearing the\nblame-meme as a costume — me actually crediting the tax credit that paid down the\nsolar panels, while mocking the people who say it straight. Same two words,\nopposite meaning, and the only tell is what it's replying to.\n\nTry three was me being clever: I taught the prompt about that nuance, expecting\nsharper results. It got worse. Spelling out \"sometimes sarcasm is sincere\" made\nthe model gun-shy — its sarcasm detection collapsed from a quarter of posts to\nabout seven percent, and it started calling things like \"truly, a stable genius.\nthat was sarcasm.\" positive. It read the words \"that was sarcasm\" and shrugged.\nThe blunt version — flag aggressively, then flip — beat the nuanced one outright.\n\nEasier, not solved\n\nThat's the real finding, and it's a better one than any number on the dashboard.\nThe intelligence got cheap: a decade ago, getting a model to spot irony at all\nwas a thesis; today it's a prompt and a local GPU, and nearly all my effort went\ninto boring solved things — moving data, a schema, a dashboard. But the last mile\ndidn't move. Detecting that something is ironic is easy now. Reading who the\nirony is aimed at — that's still judgment, still context, still the most human\nthing in the pile. And it's humbling that you can make a model dumber by\nexplaining the hard part to it.\n\nSo the dashboard has opinions about me — mostly neutral, occasionally warm,\nreliably grim about the news, and sunnier on the surface than underneath. Whether\nit's right is its own argument, and one I haven't settled. But that it can\nventure a read at all, sarcasm and all, is the thing. A decade ago that would have\nbeen the whole project. Today it's the part that comes cheap — and the hard part\nis exactly where it always was.",
  "title": "I built a snark detector and pointed it at myself"
}