{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifgcwgbintur5ootvuaiwrmravqi7n3bmezeoesypne7ed5n5dbn4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mlvwwj2vwnp2"
  },
  "path": "/t/need-english-only-or-minimal-multilingual-2b-4b-llm-for-agentic-ai-on-gtx-1660-super-6gb-vram-quantization-friendly/176044#post_1",
  "publishedAt": "2026-05-15T17:46:07.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I’m building an Agentic AI application with very limited hardware: **GTX 1660 Super (Turing, 6GB VRAM)**. I plan to run a single LLM per agent (not multiple models simultaneously) to stay within VRAM limits.\n\n**What I’ve tried so far:**\n\n  * `llama-3.2-3b-instruct` (4-bit) → poor results\n\n  * `SmolLM3-3B` (no quantization) → good results but saturates 6GB VRAM, nothing left for computation\n\n  * `SmolLM3-3B` (4-bit) → better than Llama, but still not good enough for my needs\n\n  * Planning to test `Qwen3-4B-Thinking` and `Phi-3-mini-128k-instruct` next\n\n\n\n\n**My problem:** All these models are multilingual. That’s overkill for my use case. I suspect those extra language capabilities waste parameter capacity and VRAM that could otherwise improve English performance or reduce model size.\n\n**My request:** Can you recommend a **2B–4B parameter LLM that is English-only (or max 2–3 languages)** and works well with 4-bit or 8-bit quantization on 6GB VRAM? I’m looking for something that prioritizes English instruction-following, reasoning, and agentic tasks (tool use, planning, memory) over multilingual coverage.\n\n**Bonus points if:**\n\n  * The model is known to be quantization-friendly (GPTQ, AWQ, or llama.cpp compatible)\n\n  * There are quantized versions available on HF already\n\n  * It has good benchmark scores (MMLU, GSM8K) compared to SmolLM3 or Llama-3.2-3B\n\n\n\n\n**What I don’t need:**\n\n  * Translation capabilities\n\n  * Support for non-Latin scripts\n\n  * Massive vocabulary covering rare Unicode characters\n\n\n\n\nThank you!",
  "title": "Need English-only (or minimal multilingual) 2B-4B LLM for Agentic AI on GTX 1660 Super (6GB VRAM) – quantization friendly"
}