Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreif2xwegtmjobvybyv7drmxutmtzaj6yzuvcwsj4jhdiygviqwzzca",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mj2hvzjmob22"
  },
  "path": "/t/is-hf-download-really-auto-resume/175105#post_2",
  "publishedAt": "2026-04-09T05:54:56.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "GitHub",
    "Hugging Face"
  ],
  "textContent": "In a nutshell, “it’s highly likely that the progress bar’s behavior is simply misleading.”\n\n* * *\n\nYes. It is **most likely resuming** , not restarting from zero.\n\n## Why the first line matters more\n\nThe important signal is the line that says the `.incomplete` file is being resumed from `49645653524/49907246508`. That means Hugging Face already sees **49,645,653,524 bytes** of the target file on disk. That is about **99.48%** of the file, so only about **249.5 MiB** remained. The downloader source also explicitly short-circuits when the already-present size equals the expected size, which confirms that partial-size tracking is a real part of the logic, not just a cosmetic message. (GitHub)\n\n## Why the second line still says `0%`\n\nThat `0%` does **not** necessarily mean “started over.” Two things can make it look that way.\n\nFirst, **rounding** : `9.38M / 49.9G` is only about **0.019%** of the full file, so a progress bar that shows whole percentages will still print `0%`. Second, the current Hugging Face source treats progress differently depending on the download path: the regular HTTP downloader initializes the bar with `initial=resume_size`, but the Xet downloader initializes it with `initial=0`. So a resumed download can still _look_ like it restarted, even when prior bytes were recognized and reused. (GitHub)\n\n## The background behind this\n\nThis is easier to understand once you know how current Hugging Face downloads work. In `huggingface_hub` v1.0, the old `resume_download` parameter was removed, because resume is supposed to happen automatically when possible. Hugging Face also says all repositories on the Hub are Xet-enabled and `hf_xet` is now the default transfer path. Xet is chunk-based rather than just “one plain HTTP stream from byte 0 to byte N,” and Hugging Face keeps both a file cache and a chunk cache locally. (Hugging Face)\n\n## So, is it resumed or not?\n\nMy answer is: **yes, probably resumed**.\n\nMore precisely:\n\n  * the first line is strong evidence that partial data was found and is being used,\n  * the second line is weak evidence because the progress display can under-report resumed state,\n  * and the early `0%` is fully compatible with a resumed session when only a tiny fraction of the total file has been transferred in that visible session so far. (GitHub)\n\n\n\n## The catch\n\n“Resumed” does not mean “guaranteed perfect.”\n\nThere are current bug reports around large-file downloads where resume behavior is unreliable, Xet transfers error out on slow links, or the final file may still need verification. One 2025 bug report shows `hf_xet` failures and low speeds on a slow residential connection. A current 2026 issue argues that large-file downloads still lack robust partial-corruption recovery and chunk-level validation. So your log pattern is most likely benign, but the general area is not completely free of real bugs. (GitHub)\n\n## What I would do\n\n### 1. Let it continue if it is still moving\n\nIf the byte count keeps increasing, I would **not** delete the partial file just because the bar says `0%`. The cache is designed specifically to avoid re-downloading data unnecessarily. (Hugging Face)\n\n### 2. Increase the timeout\n\nThe default `HF_HUB_DOWNLOAD_TIMEOUT` is **10 seconds**. On slow or unstable links, that is easy to trip. Hugging Face explicitly says increasing it helps on slow connections. (Hugging Face)\n\n\n    export HF_HUB_DOWNLOAD_TIMEOUT=60\n    hf download google/gemma-4-26b-a4b-it --token <token>\n\n\n### 3. Verify the result after it finishes\n\nThis is the cleanest way to settle the question “did the resumed download produce a good file?” Hugging Face now documents `hf cache verify`, which checks local files against Hub checksums. (Hugging Face)\n\n\n    hf cache verify google/gemma-4-26b-a4b-it\n\n\nIf you downloaded into a custom folder:\n\n\n    hf cache verify google/gemma-4-26b-a4b-it --local-dir /path/to/download\n\n\n### 4. Inspect the environment if it keeps acting strange\n\n`hf env` is the command Hugging Face recommends for issue reports because it prints the machine setup and relevant downloader configuration. (Hugging Face)\n\n\n    hf env\n\n\n### 5. If the Xet path seems to be the problem, disable it once as a diagnostic\n\nHugging Face documents `HF_HUB_DISABLE_XET=1` to force-disable `hf-xet`. That is a reasonable troubleshooting step if repeated resumed downloads still stall or behave oddly. Also, the source says basic HTTP is blocked only for files **over 50GB** , and your logged file size is just under that threshold, so trying one non-Xet run is feasible here. (Hugging Face)\n\n\n    export HF_HUB_DISABLE_XET=1\n    hf download google/gemma-4-26b-a4b-it --token <token>\n\n\n### 6. If your cache is on an HDD or awkward storage, adjust for that\n\nHugging Face says `hf-xet` is designed for SSD/NVMe-style parallel writes. If you are on a spinning disk, `HF_XET_RECONSTRUCT_WRITE_SEQUENTIALLY=1` can help by switching to sequential writes. (Hugging Face)\n\n\n    export HF_XET_RECONSTRUCT_WRITE_SEQUENTIALLY=1\n\n\n## The clean mental model\n\nUse this rule:\n\n  * **“resume from X/Y”** tells you whether prior bytes were detected.\n  * **The progress bar** tells you how the current session is being visualized.\n  * **`hf cache verify`** tells you whether the finished file is trustworthy. (GitHub)\n\n\n\nSo for your log, the best answer is:\n\n**Yes, it is very likely resumed.**\nThe `0%` line is most likely just a misleading early progress display, helped by rounding and by how the current Xet progress bar is initialized. (GitHub)",
  "title": "Is hf download really auto resume"
}