{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreic24tig5bwzxpzieypnfzgl7yuck4e4i5abjykug5h36jck7pwufe",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkuw22d35xo2"
  },
  "path": "/t/inference-requirements/175635#post_2",
  "publishedAt": "2026-05-02T14:02:57.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I don’t think Hugging Face has one central directory exactly like that.\n\nThe closest thing I know of on HF is the model memory estimator / model-memory-usage Space. It can give you a decent first estimate for whether a model will fit in VRAM, but it is not really the same as a curated LLM requirements table.\n\nThe annoying part is that “requirements” are not one number. A 7B model can mean very different things depending on fp16, int8, 4-bit, GGUF, vLLM, Transformers, llama.cpp, context length, batch size, KV cache, CPU offload, etc.\n\nSo for now I usually treat HF model cards as the source for model details, then use a memory estimator or do the rough math myself. For LLMs, fp16/bf16 is roughly 2 GB per billion parameters just for weights, 8-bit around half of that, 4-bit around a quarter, plus overhead for runtime and context.\n\nIt would be nice if HF had this as a first-class field on model pages, even if it was only approximate. Right now it is scattered between model cards, discussions, Spaces, and external tools.",
  "title": "Inference Requirements"
}