{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreic24tig5bwzxpzieypnfzgl7yuck4e4i5abjykug5h36jck7pwufe",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkuw22d35xo2"
},
"path": "/t/inference-requirements/175635#post_2",
"publishedAt": "2026-05-02T14:02:57.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "I don’t think Hugging Face has one central directory exactly like that.\n\nThe closest thing I know of on HF is the model memory estimator / model-memory-usage Space. It can give you a decent first estimate for whether a model will fit in VRAM, but it is not really the same as a curated LLM requirements table.\n\nThe annoying part is that “requirements” are not one number. A 7B model can mean very different things depending on fp16, int8, 4-bit, GGUF, vLLM, Transformers, llama.cpp, context length, batch size, KV cache, CPU offload, etc.\n\nSo for now I usually treat HF model cards as the source for model details, then use a memory estimator or do the rough math myself. For LLMs, fp16/bf16 is roughly 2 GB per billion parameters just for weights, 8-bit around half of that, 4-bit around a quarter, plus overhead for runtime and context.\n\nIt would be nice if HF had this as a first-class field on model pages, even if it was only approximate. Right now it is scattered between model cards, discussions, Spaces, and external tools.",
"title": "Inference Requirements"
}