Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihouwu7vghiimx3zauqsyblcc7pjrphresz5i4al4kzwoffydl5o4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mh6kh7lst3i2"
  },
  "path": "/t/transformers-js-retrieving-the-size-of-models-in-mb-gb-before-running/174303#post_2",
  "publishedAt": "2026-03-16T09:42:51.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Hugging Face",
    "MDN Web Docs",
    "GitHub",
    "@huggingface"
  ],
  "textContent": "Oh… it looks like there isn’t a dedicated API for that yet.\n\n* * *\n\nNo single official Transformers.js API currently tells you, before `pipeline()` starts, “this run needs X MB/GB.” The current documented hook is `progress_callback`, and it is for updates **during model construction**. There is also an open Transformers.js feature request asking for file-size information **before** download starts because the current callbacks do not provide enough information for a single total progress bar across all files. (Hugging Face)\n\n## What the problem really is\n\nBefore a pipeline runs, you need to answer two separate questions:\n\n  1. **How many bytes will this pipeline download and cache?**\n  2. **How much storage headroom does this browser origin have right now?**\n\n\n\nThose are different numbers. The first comes from the model repo and your load options. The second comes from the browser’s Storage API. MDN is explicit that `navigator.storage.estimate()` returns approximate `usage` and `quota` for the current **origin** , not exact raw disk free space. Browser storage is managed per origin, and eviction rules vary by browser. (MDN Web Docs)\n\n## Why one fixed model-size number is usually wrong\n\nTransformers.js does not always fetch one file. The file set depends on options such as:\n\n  * `revision`, which can be a branch, tag, or commit id\n  * `subfolder`, which defaults to `onnx`\n  * `device`\n  * `dtype`\n  * `use_external_data_format`, which the docs say is used for models `>= 2GB` (Hugging Face)\n\n\n\nThe default dtype can also change by backend. Transformers.js documents typical choices such as `fp32`, `fp16`, `q8`, and `q4`, and notes that `fp32` is the default for WebGPU while `q8` is the default for WASM. That means the same repo can require different storage depending on how you load it. (Hugging Face)\n\n## What you should do instead\n\nUse a **preflight step** before `pipeline()`:\n\n  1. 
Pin the exact load configuration you will use:\n\n     * repo id\n     * `revision`\n     * `subfolder`\n     * `device`\n     * `dtype`\n  2. Fetch the model repo metadata from the Hub with file metadata enabled.\n\n  3. Sum the sizes of the files your configuration will actually need.\n\n  4. Compare that total against `quota - usage` from `navigator.storage.estimate()`.\n\n  5. Add a safety margin because the browser values are estimates. (Hugging Face)\n\n\n\n\nThat is the correct architecture today.\n\n## Where the file sizes come from\n\nThe Hub API already exposes the size metadata you need. The official Hub docs say:\n\n  * `model_info(..., files_metadata=True)` can retrieve metadata for files in the repository, including size and LFS metadata\n  * `RepoSibling.size` is the file size in bytes when file metadata is requested\n  * `RepoFile.size` is the file size in bytes (Hugging Face)\n\n\n\nSo the missing piece is not “file sizes do not exist.” The missing piece is that Transformers.js does not yet wrap that into a built-in “preflight total bytes” API. 
(GitHub)\n\n## What counts toward required space\n\nFor a browser-first Transformers.js app, the practical first-run footprint can include:\n\n  * model config and tokenizer or processor files\n  * ONNX model files in the configured subfolder\n  * external `.onnx_data` files for very large models when external data format is used\n  * cached ONNX Runtime WASM binaries, because Transformers.js documents `useBrowserCache` as `true` by default if available, and `useWasmCache` as `true` by default when cache is available (Hugging Face)\n\n\n\nSo the question is not just “how big is `model.onnx`?” It is “how big is the full set of artifacts that this load path will cache?” (Hugging Face)\n\n## The formula\n\nA practical estimate is:\n\n\n    required_bytes ≈ sum(selected_repo_files) + safety_buffer\n    available_bytes ≈ quota - usage\n    ok_to_start = available_bytes >= required_bytes\n\n\nUse a buffer such as 10% to 25% because browser storage numbers are approximate and because you may also cache runtime assets such as WASM binaries. (MDN Web Docs)\n\n## Minimal browser-side implementation\n\nThis version uses the Hub metadata endpoint directly. It is simple and works well in a browser app.\n\n\n    function formatBytes(bytes) {\n      const units = [\"B\", \"KB\", \"MB\", \"GB\", \"TB\"];\n      let n = bytes;\n      let i = 0;\n      while (n >= 1024 && i < units.length - 1) {\n        n /= 1024;\n        i++;\n      }\n      return `${n.toFixed(n >= 10 || i === 0 ? 0 : 1)} ${units[i]}`;\n    }\n\n    async function getModelInfoWithSizes(repoId, revision = \"main\") {\n      const url =\n        revision === \"main\"\n          ? 
`https://huggingface.co/api/models/${repoId}?blobs=true`\n          : `https://huggingface.co/api/models/${repoId}/revision/${encodeURIComponent(revision)}?blobs=true`;\n\n      const res = await fetch(url);\n      if (!res.ok) {\n        throw new Error(`Failed to fetch model metadata: ${res.status} ${res.statusText}`);\n      }\n      return res.json();\n    }\n\n    function getPath(file) {\n      return file.rfilename ?? file.path ?? \"\";\n    }\n\n    function getSize(file) {\n      return file.size ?? file.lfs?.size ?? 0;\n    }\n\n    function pickLikelyTransformersJsFiles(siblings, { subfolder = \"onnx\" } = {}) {\n      const sidecars = new Set([\n        \"config.json\",\n        \"tokenizer.json\",\n        \"tokenizer_config.json\",\n        \"special_tokens_map.json\",\n        \"added_tokens.json\",\n        \"vocab.json\",\n        \"vocab.txt\",\n        \"merges.txt\",\n        \"spiece.model\",\n        \"preprocessor_config.json\",\n        \"processor_config.json\",\n        \"feature_extractor.json\",\n        \"generation_config.json\",\n      ]);\n\n      return siblings.filter((file) => {\n        const path = getPath(file);\n        if (!path) return false;\n        if (sidecars.has(path)) return true;\n        if (path.startsWith(`${subfolder}/`)) return true;\n        return false;\n      });\n    }\n\n    async function estimateOriginStorage() {\n      if (!navigator.storage?.estimate) {\n        return { supported: false, quota: null, usage: null, free: null };\n      }\n      const { quota = 0, usage = 0 } = await navigator.storage.estimate();\n      return {\n        supported: true,\n        quota,\n        usage,\n        free: Math.max(0, quota - usage),\n      };\n    }\n\n    async function estimatePipelineSpace(repoId, {\n      revision = \"main\",\n      subfolder = \"onnx\",\n      safetyFactor = 1.2,\n    } = {}) {\n      const info = await getModelInfoWithSizes(repoId, revision);\n      const siblings = info.siblings ?? 
[];\n      const files = pickLikelyTransformersJsFiles(siblings, { subfolder });\n\n      const modelBytes = files.reduce((sum, file) => sum + getSize(file), 0);\n      const requiredBytes = Math.ceil(modelBytes * safetyFactor);\n\n      const storage = await estimateOriginStorage();\n\n      return {\n        repoId,\n        revision,\n        files: files.map((f) => ({ path: getPath(f), size: getSize(f) })),\n        modelBytes,\n        modelHuman: formatBytes(modelBytes),\n        requiredBytes,\n        requiredHuman: formatBytes(requiredBytes),\n        storage,\n        enoughSpace:\n          storage.supported && storage.free != null\n            ? storage.free >= requiredBytes\n            : null,\n      };\n    }\n\n\nUse it like this:\n\n\n    const report = await estimatePipelineSpace(\n      \"Xenova/distilbert-base-uncased-finetuned-sst-2-english\",\n      {\n        revision: \"main\",\n        subfolder: \"onnx\",\n      }\n    );\n\n    console.log(\"Estimated model footprint:\", report.modelHuman);\n    console.log(\"Recommended free space:\", report.requiredHuman);\n    console.log(\"Enough space?\", report.enoughSpace);\n    console.table(report.files);\n\n\n## How exact this can be\n\nThere are three levels of accuracy.\n\n### 1. Rough but safe\n\nCount all sidecars plus everything under `onnx/`.\nThis often overestimates, but it is simple and usually safe. The `onnx` default comes from the Transformers.js docs. (Hugging Face)\n\n### 2. Better\n\nPin `revision`, `dtype`, `device`, and `subfolder`, then only count the files that match that exact configuration. This is better because those options directly affect what gets loaded. (Hugging Face)\n\n### 3. Best\n\nMaintain your own manifest of exact filenames and byte totals for each supported model configuration. That is the cleanest production design. The open feature request is effectively asking Transformers.js to expose something like this natively. 
(GitHub)\n\n## Browser versus Node\n\nIn a browser, the storage question is always about origin quota. In Node, you can reason about the filesystem and cache directory more directly. The `@huggingface/hub` package does provide `scanCacheDir`, `downloadFileToCacheDir`, and `snapshotDownload`, but its README explicitly says those cache helpers do **not** work in the browser. (GitHub)\n\n## Practical conclusion\n\nThe current answer is:\n\n  * **No**, Transformers.js does not currently give you one built-in API that returns required free space before `pipeline()` starts. (GitHub)\n\n  * **Yes**, you can determine it reliably enough by combining:\n\n    * Hub file metadata for required bytes\n    * your chosen Transformers.js load options\n    * `navigator.storage.estimate()` for available origin quota headroom (Hugging Face)\n\n\n\nThe most accurate mental model is:\n\n> **required space = exact repo files this load path will cache**\n> **available space = estimated origin quota headroom** (MDN Web Docs)\n\nA robust implementation pins `revision`, sets `dtype` explicitly, includes `.onnx_data` when applicable, adds a safety margin, and treats the result as a preflight gate before `pipeline()`. (Hugging Face)",
  "title": "Transformers.js: Retrieving the size of models in MB/GB before running"
}