{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreie6y2ce6csienbl5mbjvookpujtajnxrnhbx3qoktdxl4rncm7e24",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mh5pkwpvl6a2"
},
"path": "/t/purpose-of-commit-hash-in-pretrainedmodel-from-pretrained/174304#post_1",
"publishedAt": "2026-03-16T05:09:11.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "I’ve been digging into the source code of the `transformers` library and stumbled upon a detail regarding how files are fetched and cached that I’m hoping someone can clarify.\n\nSpecifically, I am trying to understand the exact role of the `commit_hash` argument within the `PreTrainedModel.from_pretrained` method, and how it differs from `revision`.\n\nMy initial research led me to believe that `commit_hash` is used to pin consecutive file downloads to a specific state. This prevents a race condition where a branch (like `main`) is updated halfway through downloading a multi-file model, which would result in mismatched files.\n\nLooking at the code, it seems to support this. First, it tries to obtain the `commit_hash` of the current revision early on by resolving the config file:\n\n\n if commit_hash is None:\n if not isinstance(config, PretrainedConfig):\n # We make a call to the config file first (which may be absent) to get the commit hash as soon as possible\n resolved_config_file = cached_file(\n pretrained_model_name_or_path,\n CONFIG_NAME,\n # ... [other args omitted for brevity] ...\n revision=revision,\n )\n commit_hash = extract_commit_hash(resolved_config_file, commit_hash)\n else:\n commit_hash = getattr(config, \"_commit_hash\", None)\n\n\n\nThis `commit_hash` is then passed down into `cached_file_kwargs` for subsequent loading code (like fetching the actual model weights):\n\n\n cached_file_kwargs = {\n # ... [other args] ...\n \"revision\": revision,\n \"_commit_hash\": commit_hash,\n }\n resolved_archive_file = cached_file(pretrained_model_name_or_path, filename, **cached_file_kwargs)\n\n\n\n**Here is my confusion:** When I look inside the `cached_file` method itself, I noticed that the `_commit_hash` appears to only be used for _local cache checks_. If a download from the Hub is actually triggered, it seems to still rely on the `revision` argument. 
Other loading functions also don’t seem to strictly use the identified `commit_hash` for the remote fetch.\n\n**My questions:**\n\n 1. If `commit_hash` is primarily used just for local cache resolution, couldn’t the `revision` argument handle that on its own?\n\n 2. Does the underlying `huggingface_hub` download logic actually use this `commit_hash` to lock the remote fetch to that specific commit, or is my assumption about preventing mid-download revision changes incorrect?\n\n 3. Does `commit_hash` serve any other purposes?\n\nThank you in advance.",
"title": "Purpose of commit_hash in PreTrainedModel.from_pretrained"
}