
Token size if planning to use LLM while running a game?

Hugging Face Forums [Unofficial] April 17, 2026

I’d dial it back a bit, yeah. Even if your NVIDIA GeForce RTX 4080 can handle a 14–22B model on paper, once a game is running at the same time the two are competing for the same VRAM, and that’s when you get stuttering or instability. From my experience, something like Mistral 7B is a much safer starting point. It’s lighter, faster, and more than enough for a chatbot running in the background. You can always try scaling up later if you see you still have headroom; that’s way easier than fighting performance issues mid-game.
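If you want a rough feel for the numbers, here is a back-of-the-envelope sketch, not a benchmark. It assumes 4-bit quantized weights, the 16 GB of VRAM on an RTX 4080, and two made-up figures you should replace with your own: 8 GiB held by the game and 1.5 GiB of overhead for the KV cache and runtime. The function names are just for illustration.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_alongside_game(
    params_billion: float,
    bits_per_weight: float = 4.0,   # assumption: 4-bit quantization
    total_vram_gib: float = 16.0,   # RTX 4080
    game_vram_gib: float = 8.0,     # assumption: measure your own game
    overhead_gib: float = 1.5,      # assumption: KV cache + runtime buffers
) -> bool:
    """True if weights plus overhead fit in the VRAM the game leaves free."""
    needed = weights_gib(params_billion, bits_per_weight) + overhead_gib
    return needed <= total_vram_gib - game_vram_gib


if __name__ == "__main__":
    for size in (7, 14, 22):
        gib = weights_gib(size, 4.0)
        print(f"{size}B at 4-bit: ~{gib:.1f} GiB weights, "
              f"fits next to the game: {fits_alongside_game(size)}")
```

Under those assumptions a 7B model leaves comfortable room, while the larger sizes sit at or over the limit. Check your game's real VRAM usage (for example with `nvidia-smi` while it is running) before trusting any of this.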
