{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreifdg5q5a3lquwlnldcumq2iit2ymmh43rotzcspzumnyz7ctptsoq",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3moz7rn6mr3p2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreign2vszaus6q4obgkgdaybnzudunikqr7et3rluecxdzx6q7m27cq"
},
"mimeType": "image/webp",
"size": 68834
},
"path": "/riversea/cutting-claude-api-costs-in-half-with-a-3-tier-routing-system-haikusonnetopus-2967",
"publishedAt": "2026-06-24T05:11:01.000Z",
"site": "https://dev.to",
"tags": [
"ai",
"aiagents",
"mcp",
"cloudflare"
],
"textContent": "Adding more Claude subagents made my pipeline _slower_ once I went past 6, but the real problem wasn't concurrency at all.\n\nWhen I finally looked at the cost logs for my ad analytics SaaS, every task was hitting Sonnet: renaming files, formatting Slack messages, parsing JSON, and interpreting 12-campaign performance reports. All the same model. Sonnet 4.5 runs $3/M input and $15/M output tokens. Haiku 3.5 is $0.80/$4. Same tokens, a 3.75x cost difference based purely on model choice.\n\nI split tasks into three tiers: Haiku for format/parse/extract work with no judgment needed, Sonnet for pattern recognition and multi-step tool use, and Opus for architectural decisions (currently one worker out of twelve, run manually). The routing decision itself is made by Haiku classifying the incoming task in ~100 tokens, which costs roughly $0.00008 per call at Haiku's input rate. That is noise compared to the savings from avoiding a wrong-model assignment.\n\nThe counter-intuitive finding: **task complexity mattered less than context length.** I expected complex tasks to need Sonnet. What I actually found was that Haiku handled surprisingly hard work just fine when context was compressed under 2,000 tokens, and fell apart on simple tasks when context ballooned past 5,000. So context length is now the _first_ branch in my router, not task type.\n\n    // Routing tier -> Anthropic model ID\n    const modelMap: Record<Tier, string> = {\n      1: \"claude-3-5-haiku-latest\", // format / parse / extract\n      2: \"claude-sonnet-4-5\", // pattern recognition, multi-step tool use\n      3: \"claude-opus-4-0\", // architectural decisions\n    };\n\nAfter six months in production, API spend dropped from $180-200/month to $95-110/month. Not a clean 50% cut: Haiku retries (about 8% of calls fall back to Sonnet) eat into it. But even counting retry costs, the routing system pays for itself many times over. Trying to get the retry rate to 0% by defaulting everything to Sonnet would cost more than tolerating the 8%.\n\nI also hit a D1 `too many variables` error three days after deploy: batching 100 routing log rows at 7 columns each (700 bound variables in one statement) blew past D1's limit on bound variables per statement. Dropping batch size to 30 fixed it. Not a routing problem, just a logging assumption that didn't survive contact with reality.\n\nThe full breakdown, including the rule-based pre-filter I'm testing to skip the LLM routing call entirely for 90% of tasks and the open question of when Opus actually justifies pipeline inclusion, is over on riversealab.",
"title": "Cutting Claude API Costs in Half with a 3-Tier Routing System (Haiku/Sonnet/Opus)"
}