Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifdg5q5a3lquwlnldcumq2iit2ymmh43rotzcspzumnyz7ctptsoq",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3moz7rn6mr3p2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreign2vszaus6q4obgkgdaybnzudunikqr7et3rluecxdzx6q7m27cq"
    },
    "mimeType": "image/webp",
    "size": 68834
  },
  "path": "/riversea/cutting-claude-api-costs-in-half-with-a-3-tier-routing-system-haikusonnetopus-2967",
  "publishedAt": "2026-06-24T05:11:01.000Z",
  "site": "https://dev.to",
  "tags": [
    "ai",
    "aiagents",
    "mcp",
    "cloudflare",
    "Full post →"
  ],
  "textContent": "Adding more Claude subagents made my pipeline _slower_ past 6 — but the real problem wasn't concurrency at all.\n\nWhen I finally looked at the cost logs for my ad analytics SaaS, every task was hitting Sonnet: renaming files, formatting Slack messages, parsing JSON, and interpreting 12-campaign performance reports. All the same model. Sonnet 4.5 runs $3/M input and $15/M output tokens. Haiku 3.5 is $0.80/$4. Same tokens, 3-4x cost difference based purely on model choice.\n\nI split tasks into three tiers — Haiku for format/parse/extract work with no judgment needed, Sonnet for pattern recognition and multi-step tool use, Opus for architectural decisions (currently one worker out of twelve, run manually). The routing decision itself is made by Haiku classifying the incoming task in ~100 tokens, which costs roughly $0.00008 per call — noise compared to the savings from avoiding a wrong-model assignment.\n\nThe counter-intuitive finding: **task complexity mattered less than context length.** I expected complex tasks to need Sonnet. What I actually found was that Haiku handled surprisingly hard work just fine when context was compressed under 2,000 tokens — and fell apart on simple tasks when context ballooned past 5,000. So context length is now the _first_ branch in my router, not task type.\n\n\n\n    const modelMap: Record<Tier, string> = {\n      1: \"claude-haiku-3-5\",\n      2: \"claude-sonnet-4-5\",\n      3: \"claude-opus-4\",\n    };\n\n\nAfter six months in production: API spend dropped from $180-200/month to $95-110. Not a clean 50% cut — Haiku retries (about 8% of calls fall back to Sonnet) eat into it. But even counting retry costs, the routing system pays for itself many times over. Trying to get retry rate to 0% by defaulting everything to Sonnet would cost more than tolerating the 8%.\n\nI also hit a D1 `too many variables` error three days after deploy — batching 100 routing log rows at 7 columns each blew past SQLite's 999-variable limit. Dropping batch size to 30 fixed it. Not a routing problem, just a logging assumption that didn't survive contact with reality.\n\nThe full breakdown — including the rule-based pre-filter I'm testing to skip the LLM routing call entirely for 90% of tasks, and the open question of when Opus actually justifies pipeline inclusion — is over on riversealab.\n\nFull post →",
  "title": "Cutting Claude API Costs in Half with a 3-Tier Routing System (Haiku/Sonnet/Opus)"
}