{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiem4nidwu4kxpjtkkfyg3nmhplj267yzaqnxqzsegj3b3eotfebpy",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mp7ptdy6lhc2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreif3ditzc7ait3qh2xz6ye7aiwgecxyqcjuizlly3u4ltluzqklvne"
},
"mimeType": "image/webp",
"size": 72244
},
"path": "/arjunkshah/supercompress-cut-llm-costs-by-65-without-losing-answers-2c8n",
"publishedAt": "2026-06-26T19:23:33.000Z",
"site": "https://dev.to",
"tags": [
"ai",
"llm",
"opensource",
"showdev"
],
"textContent": "## Tweet 1\n\nEvery LLM call burns GPU cycles on tokens that never needed to run.\n\nPadding. Boilerplate. Irrelevant context.\n\nI built SuperCompress: a tiny CPU policy that cuts 65% of tokens before inference.\n\nOpen source. MIT. Free tier.\n\nsupercompress.vercel.app\n\n## Tweet 2\n\nThe problem is worse than most people realize.\n\nAt ~50M agent turns/day:\n\n→ 100B tokens wasted daily\n\n→ 24K GPU hours\n\n→ 1,526 tons CO₂\n\n→ 6.5M L cooling water\n\nWe're burning through resources on tokens that don't matter.\n\n## Tweet 3\n\nHow it works:\n\n1️⃣ Context + question → CPU policy (5K params)\n\n2️⃣ Every line scored for relevance to the question\n\n3️⃣ Low-scoring lines evicted\n\n4️⃣ Only essential tokens reach the GPU\n\nCPU first. GPU for what matters.\n\n## Tweet 4\n\nThe numbers at a 35% token budget:\n\n• 65% KV cache saved\n\n• 100% oracle recall (vs 25% for truncation)\n\n• ~60ms CPU latency\n\nSame answers. ⅓ the compute.\n\n## Tweet 5\n\nPer 1 million compressions:\n\n→ 800M tokens avoided\n\n→ 29 kWh saved\n\n→ 12 kg CO₂ avoided\n\n→ 52 L cooling water saved\n\nScale that across the industry and it's enormous.\n\n## Tweet 6\n\nSuperCompress is:\n\n✅ Open source (MIT)\n\n✅ Free API tier\n\n✅ Python library\n\n✅ Browser demo (no install)\n\n✅ Integration guides for OpenAI/LangChain\n\nTry it: supercompress.vercel.app\n\nGitHub: github.com/arjunkshah/supercompress\n\n## Tweet 7\n\nBuilt this because I believe we can't scale AI by burning through what we have left.\n\nSmarter compute means more AI for everyone, without the environmental cost.\n\nWould love feedback from the community 🙏\n\n#LLM #AI #OpenSource #MachineLearning\n\n**Links:** GitHub | Live Demo | Interactive Tool",
"title": "SuperCompress: Cut LLM Costs by 65% Without Losing Answers"
}