Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiem4nidwu4kxpjtkkfyg3nmhplj267yzaqnxqzsegj3b3eotfebpy",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mp7ptdy6lhc2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreif3ditzc7ait3qh2xz6ye7aiwgecxyqcjuizlly3u4ltluzqklvne"
    },
    "mimeType": "image/webp",
    "size": 72244
  },
  "path": "/arjunkshah/supercompress-cut-llm-costs-by-65-without-losing-answers-2c8n",
  "publishedAt": "2026-06-26T19:23:33.000Z",
  "site": "https://dev.to",
  "tags": [
    "ai",
    "llm",
    "opensource",
    "showdev",
    "GitHub",
    "Live Demo",
    "Interactive Tool"
  ],
  "textContent": "##  Tweet 1\n\nEvery LLM call burns GPU cycles on tokens that never needed to run.\n\nPadding. Boilerplate. Irrelevant context.\n\nI built SuperCompress — a tiny CPU policy that cuts 65% of tokens before inference.\n\nOpen source. MIT. Free tier.\n\nsupercompress.vercel.app\n\n##  Tweet 2\n\nThe problem is worse than most people realize.\n\nAt ~50M agent turns/day:\n\n→ 100B tokens wasted daily\n\n→ 24K GPU hours\n\n→ 1,526 tons CO₂\n\n→ 6.5M L cooling water\n\nWe're burning through resources on tokens that don't matter.\n\n##  Tweet 3\n\nHow it works:\n\n1️⃣ Context + question → CPU policy (5K params)\n\n2️⃣ Every line scored for relevance to the question\n\n3️⃣ Low-scoring lines evicted\n\n4️⃣ Only essential tokens reach the GPU\n\nCPU first. GPU for what matters.\n\n##  Tweet 4\n\nThe numbers at 35% budget:\n\n• 65% KV cache saved\n\n• 100% oracle recall (vs 25% for truncation)\n\n• ~60ms CPU latency\n\nSame answers. ⅓ the compute.\n\n##  Tweet 5\n\nPer 1 million compressions:\n\n→ 800M tokens avoided\n\n→ 29 kWh saved\n\n→ 12 kg CO₂ avoided\n\n→ 52 L cooling water saved\n\nScale that across the industry and it's enormous.\n\n##  Tweet 6\n\nSuperCompress is:\n\n✅ Open source (MIT)\n\n✅ Free API tier\n\n✅ Python library\n\n✅ Browser demo (no install)\n\n✅ Integration guides for OpenAI/LangChain\n\nTry it: supercompress.vercel.app\n\nGitHub: github.com/arjunkshah/supercompress\n\n##  Tweet 7\n\nBuilt this because I believe we can't scale AI by burning through what we have left.\n\nSmarter compute means more AI for everyone — without the environmental cost.\n\nWould love feedback from the community 🙏\n\n#  LLM #AI #OpenSource #MachineLearning\n\n**Links:** GitHub | Live Demo | Interactive Tool",
  "title": "SuperCompress: Cut LLM Costs by 65% Without Losing Answers"
}