{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreic3nrvmecqyvft4fm74ykvgjuy4fvdxdlxkdb3pffw3wsxgenbd2a",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkia23fnhqr2"
  },
  "path": "/t/replacing-claude-code-with-a-local-llm-for-20-devs-has-anyone-actually-pulled-this-off/175590#post_1",
  "publishedAt": "2026-04-27T12:34:51.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "For our 20 developers, we’re looking at hardware with 2x RTX PRO 6000 running the MiniMax M2.5 LLM. Can this actually serve a real software engineering team?\n\nWe’re scoping a self-hosted setup at a manufacturing tech company to replace ~€440k/year in Claude Code spend across 20 software engineers. The plan is one Threadripper workstation (24 cores) with 2x RTX PRO 6000 Blackwell Max-Q (192GB VRAM total), running MiniMax M2.5 INT4 AWQ via vLLM, with LiteLLM routing the hard requests to the Claude Opus 4.7 API.\n\nTarget: match Opus 4.6 / GPT-5.3-Codex quality (~80% SWE-bench) on routine work, fine-tune on our codebase for the Viscon-specific stuff, and keep a cloud fallback for the genuinely hard problems.\n\nBefore we commit ~€40k all-in on hardware: **has anyone here actually run a local coding stack for 15-20+ concurrent developers in production?** Specifically interested in:\n\n  * Real concurrency numbers on PRO 6000 Blackwell with MiniMax M2.5 (not single-stream benchmarks)\n  * Whether developers actually adopted it or quietly went back to cloud\n  * KV cache / context length tradeoffs at peak load\n  * Routing logic that worked vs. what fell apart in practice\n  * What broke that you didn’t see coming\n\nWar stories welcome, including the ones where it failed. Would rather hear that now than after buying.",
  "title": "Replacing Claude Code with a local LLM for 20 devs — has anyone actually pulled this off?"
}