
Inference Requirements

Hugging Face Forums [Unofficial] May 2, 2026

I don’t think Hugging Face has one central directory exactly like that. The closest thing I know of on HF is the model memory estimator / model-memory-usage Space. It can give you a decent first estimate of whether a model will fit in VRAM, but it is not the same thing as a curated LLM requirements table.

The annoying part is that “requirements” are not one number. A 7B model can mean very different things depending on precision and format (fp16, int8, 4-bit, GGUF), runtime (vLLM, Transformers, llama.cpp), and workload (context length, batch size, KV cache, CPU offload, etc.).

So for now I usually treat HF model cards as the source for model details, then use a memory estimator or do the rough math myself. For LLMs, fp16/bf16 is roughly 2 GB per billion parameters just for weights (2 bytes per parameter), 8-bit is around half of that, and 4-bit around a quarter, plus overhead for the runtime and the context. A 7B model is therefore about 14 GB of weights in fp16, about 7 GB in 8-bit, and about 3.5 GB in 4-bit, before any KV cache.

It would be nice if HF had this as a first-class field on model pages, even if it was only approximate. Right now it is scattered between model cards, discussions, Spaces, and external tools.
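If you want to do the rough math yourself, here is a minimal sketch of it in Python. It is only a back-of-the-envelope estimate, not what the HF estimator Space actually computes. The function names, the 20% runtime overhead, and the example model shape (32 layers, 32 KV heads, head dimension 128) are my own assumptions for illustration; read the real values from the model's `config.json`. The KV cache term uses the usual "keys plus values, per layer, per token" count and assumes a 16-bit cache.

```python
# Rough memory estimate for LLM inference. 1 GB is taken as 1e9 bytes.
# All names and defaults here are illustrative assumptions, not an official API.

# Approximate bytes per parameter for the weights alone.
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gb(params_billions, dtype="fp16"):
    """Weights only: billions of parameters times bytes per parameter."""
    return params_billions * BYTES_PER_PARAM[dtype]


def kv_cache_gb(num_layers, num_kv_heads, head_dim, context_len,
                batch_size=1, bytes_per_elem=2):
    """KV cache size: one key and one value vector per layer, per token."""
    total_bytes = (2 * num_layers * num_kv_heads * head_dim
                   * context_len * batch_size * bytes_per_elem)
    return total_bytes / 1e9


def estimate_total_gb(params_billions, dtype="fp16", kv_gb=0.0,
                      overhead_fraction=0.2):
    """Weights plus an assumed runtime overhead, plus the KV cache."""
    weights = weight_memory_gb(params_billions, dtype)
    return weights * (1 + overhead_fraction) + kv_gb


if __name__ == "__main__":
    # Hypothetical 7B-class shape; check the model's config.json for real values.
    kv = kv_cache_gb(num_layers=32, num_kv_heads=32, head_dim=128,
                     context_len=4096)
    for dtype in ("fp16", "int8", "4bit"):
        total = estimate_total_gb(7, dtype, kv_gb=kv)
        print(f"{dtype}: weights {weight_memory_gb(7, dtype):.1f} GB, "
              f"KV cache {kv:.2f} GB, total about {total:.1f} GB")
```

Treat the output as a "will it probably fit" check. Models with grouped-query attention have fewer KV heads than attention heads, so their cache is smaller, and quantized formats like GGUF carry some extra metadata, so real numbers will drift from this.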
