Replacing Claude Code with a local LLM for 20 devs — has anyone actually pulled this off?

Hugging Face Forums [Unofficial] April 27, 2026

For our 20 developers, we’re looking at hardware with 2x RTX PRO 6000 running MiniMax M2.5. Can this actually serve a real software engineering team?

We’re scoping a self-hosted setup at a manufacturing tech company to replace ~€440k/year in Claude Code spend across 20 software engineers. The plan is a single Threadripper workstation (24 cores) with 2x RTX PRO 6000 Blackwell Max-Q (96GB each, 192GB VRAM per box), running MiniMax M2.5 INT4 AWQ via vLLM, with LiteLLM routing the hard requests to the Claude Opus 4.7 API.
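For context on the money side, here is a rough break-even sketch. It uses the ~€440k/year spend and the ~€40k hardware budget from this post; everything else is an assumption: the `offload_fraction` parameter (share of current cloud spend the local box actually replaces), the `monthly_opex_eur` parameter (power, maintenance, ops time), and the simplification that cloud cost scales linearly with offloaded traffic.

```python
# Break-even sketch. Spend and hardware figures are from the post;
# offload_fraction and monthly_opex_eur are hypothetical inputs.
CLOUD_SPEND_EUR_PER_YEAR = 440_000
HARDWARE_COST_EUR = 40_000


def breakeven_months(offload_fraction: float, monthly_opex_eur: float = 0.0) -> float:
    """Months until the hardware pays for itself.

    Assumes cloud spend drops linearly with the fraction of traffic
    served locally, which is a simplification.
    """
    if not 0 < offload_fraction <= 1:
        raise ValueError("offload_fraction must be in (0, 1]")
    monthly_savings = CLOUD_SPEND_EUR_PER_YEAR / 12 * offload_fraction - monthly_opex_eur
    if monthly_savings <= 0:
        return float("inf")  # never pays back under these inputs
    return HARDWARE_COST_EUR / monthly_savings


if __name__ == "__main__":
    for fraction in (0.25, 0.5, 0.8):
        print(f"offload {fraction:.0%}: {breakeven_months(fraction):.1f} months")
```

The point of the sketch is that payback depends almost entirely on how much traffic really stays local, which is why the adoption question below matters more to us than the hardware price.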

Target: match Opus 4.6 / GPT-5.3-Codex quality (~80% SWE-bench) on the routine work, fine-tune on our codebase for the Viscon-specific stuff, keep cloud fallback for the genuinely hard problems.
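To make the routing idea concrete, this is roughly the kind of decision rule we have in mind: a minimal sketch, not a working LiteLLM config. The backend aliases, the task labels, the `local_failed` flag, and the 32k-token threshold are all hypothetical placeholders, not measured values.

```python
from dataclasses import dataclass

# Hypothetical backend aliases; the real names would come from the router config.
LOCAL = "local-minimax"
CLOUD = "cloud-opus"

# Hypothetical policy knobs, to be tuned against real traffic.
HARD_TASKS = {"architecture", "cross-repo-refactor"}
MAX_LOCAL_PROMPT_TOKENS = 32_000


@dataclass
class Request:
    prompt_tokens: int
    task: str                   # e.g. "completion", "unit-test", "architecture"
    local_failed: bool = False  # set when a previous local attempt was rejected


def choose_backend(req: Request) -> str:
    """Send routine work to the local model, escalate the rest to cloud."""
    if req.local_failed:
        return CLOUD            # retry on cloud after a failed local attempt
    if req.task in HARD_TASKS:
        return CLOUD            # task type flagged as genuinely hard
    if req.prompt_tokens > MAX_LOCAL_PROMPT_TOKENS:
        return CLOUD            # protect local KV cache under load
    return LOCAL


if __name__ == "__main__":
    print(choose_backend(Request(prompt_tokens=4_000, task="completion")))
    print(choose_backend(Request(prompt_tokens=4_000, task="architecture")))
```

What we do not know is how well static rules like these hold up, or whether "hard" can only be detected after the local model has already failed.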

Before we commit ~€40k all-in on hardware: has anyone here actually run a local coding stack for 15-20+ concurrent developers in production? Specifically interested in:

  • Real concurrency numbers on PRO 6000 Blackwell with MiniMax M2.5 (not single-stream benchmarks)

  • Whether developers actually adopted it or quietly went back to cloud

  • KV cache / context length tradeoffs at peak load

  • Routing logic that worked vs fell apart in practice

  • What broke that you didn’t see coming
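On the KV cache question, this is the back-of-envelope arithmetic we are using, assuming standard grouped-query attention where every layer caches K and V per token. Only the 192GB total and the 20 developers come from our plan. The layer count, KV head count, head dimension, weight footprint, and overhead are placeholder values, not MiniMax M2.5's actual architecture, so swap in the real numbers from the model config.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache one token occupies (K and V, all layers)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(total_vram_gb: float, weights_gb: float,
                      overhead_gb: float, per_token_bytes: int) -> int:
    """Tokens that fit in the VRAM left after weights and runtime overhead."""
    free_bytes = (total_vram_gb - weights_gb - overhead_gb) * 1024**3
    return int(free_bytes // per_token_bytes)


if __name__ == "__main__":
    # PLACEHOLDER architecture and footprint values, not real model specs.
    per_token = kv_bytes_per_token(layers=60, kv_heads=8, head_dim=128)
    budget = max_cached_tokens(total_vram_gb=192, weights_gb=120,
                               overhead_gb=12, per_token_bytes=per_token)
    developers = 20
    print(f"KV bytes per token: {per_token}")
    print(f"total cached tokens: {budget}")
    print(f"tokens per developer at full concurrency: {budget // developers}")
```

If numbers of this shape are anywhere near right, per-developer context at peak load gets tight fast, which is exactly the tradeoff we would like real-world data on.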

War stories welcome, including the ones where it failed. Would rather hear that now than after buying.
