Token size if planning to use LLM while running a game?
Hugging Face Forums [Unofficial]
April 17, 2026
I’d dial it back a bit, yeah. Even if your NVIDIA GeForce RTX 4080 can handle a 14–22B model on paper, once you’re running a game at the same time, the game and the model are sharing the card’s VRAM, and things can get unstable or start stuttering.
From my experience, something like Mistral 7B is a much safer starting point. It’s lighter, faster, and more than enough for a chatbot running in the background. You can always try scaling up later if you see you still have headroom, and starting small is way easier than fighting performance issues mid-game.
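If you want to sanity-check the headroom before downloading anything, here's a rough back-of-the-envelope sketch. It's an estimate, not a measurement, and it leans on several assumptions of mine: 4-bit quantized weights, an fp16 KV cache using Mistral 7B's layer shape (32 layers, 8 KV heads, head dim 128) as the default for every model size, about 1 GiB of runtime overhead, and a game that takes about 8 GiB of the 4080's 16 GB. The function names are made up for illustration, so plug in your own numbers.

```python
GIB = 1024 ** 3

def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB

def kv_cache_gib(context_tokens: int, n_layers: int = 32, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB for a given context length.

    Defaults are Mistral 7B's shape with fp16 values (an assumption here);
    larger models have different shapes and usually need more per token.
    """
    # The leading 2 accounts for storing both keys and values.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens / GIB

def fits(params_billion: float, bits_per_weight: float, context_tokens: int,
         total_vram_gib: float = 16.0, game_vram_gib: float = 8.0,
         overhead_gib: float = 1.0):
    """Return (fits?, GiB needed, GiB left after the game)."""
    # game_vram_gib and overhead_gib are guesses; measure your own setup.
    needed = (weights_gib(params_billion, bits_per_weight)
              + kv_cache_gib(context_tokens) + overhead_gib)
    free = total_vram_gib - game_vram_gib
    return needed <= free, needed, free

if __name__ == "__main__":
    for size in (7, 14, 22):
        ok, needed, free = fits(size, bits_per_weight=4, context_tokens=8192)
        print(f"{size}B @ 4-bit, 8k context: need ~{needed:.1f} GiB, "
              f"free ~{free:.1f} GiB, fits: {ok}")
```

Under those assumptions only the 7B model fits next to the game at an 8k context. Context length matters too: with this layer shape the KV cache grows by about 128 KiB per token, so a shorter context buys back some VRAM.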