Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreigsh7m2bamgi4pxrfyaitdav5v4txnopw5m6pz7pwihmxt6bycopm",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjpakam7vz32"
  },
  "path": "/t/token-size-if-planning-to-use-llm-while-running-a-game/173433#post_4",
  "publishedAt": "2026-04-17T14:03:10.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I’d dial it back a bit, yeah. Even if your NVIDIA GeForce RTX 4080 can handle a 14–22B model on paper, once you’re running a game at the same time, the model and the game are sharing VRAM, and things can get unstable or start stuttering.\n\nFrom my experience, something like Mistral 7B is a much safer starting point. It’s lighter, faster, and more than enough for a chatbot running in the background. You can always try scaling up later if you see you still have headroom; starting small is way easier than fighting performance issues mid-game.",
  "title": "Token size if planning to use LLM while running a game?"
}