Purpose of commit_hash in PreTrainedModel.from_pretrained

Hugging Face Forums [Unofficial] March 16, 2026

I’ve been digging into the source code of the transformers library and stumbled upon a detail regarding how files are fetched and cached that I’m hoping someone can clarify.

Specifically, I am trying to understand the exact role of the commit_hash argument within the PreTrainedModel.from_pretrained method, and how it differs from revision.

My initial research led me to believe that commit_hash is used to pin consecutive file downloads to a single commit. This would prevent a race condition in which a branch (like main) is updated halfway through downloading a multi-file model, leaving the config from one commit and the weights from another.
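
To make the suspected race concrete, here is a minimal in-memory sketch. It does not touch the Hub or any transformers code: FakeRepo, its methods, and the short commit ids are hypothetical names invented for illustration. It only shows why resolving a branch once and reusing the resulting hash keeps a multi-file fetch consistent.

```python
# Toy model of a Hub repo. Every name here is hypothetical,
# not a transformers or huggingface_hub API.
from dataclasses import dataclass, field


@dataclass
class FakeRepo:
    # commit id -> {filename: content}
    commits: dict = field(default_factory=dict)
    # branch name -> commit id
    refs: dict = field(default_factory=dict)

    def resolve(self, revision: str) -> str:
        """Map a branch name to its current commit; a commit id maps to itself."""
        return self.refs.get(revision, revision)

    def fetch(self, filename: str, revision: str) -> str:
        return self.commits[self.resolve(revision)][filename]


repo = FakeRepo(
    commits={
        "aaa111": {"config.json": "config-v1", "model.safetensors": "weights-v1"},
        "bbb222": {"config.json": "config-v2", "model.safetensors": "weights-v2"},
    },
    refs={"main": "aaa111"},
)

# Unpinned: both fetches use the branch name, and the branch moves in between.
config_unpinned = repo.fetch("config.json", "main")
repo.refs["main"] = "bbb222"  # someone pushes to main mid-download
weights_unpinned = repo.fetch("model.safetensors", "main")

# Pinned: resolve the branch once, then reuse the commit id for every file.
repo.refs["main"] = "aaa111"
pinned = repo.resolve("main")
config_pinned = repo.fetch("config.json", pinned)
repo.refs["main"] = "bbb222"  # same mid-download push as before
weights_pinned = repo.fetch("model.safetensors", pinned)

print(config_unpinned, weights_unpinned)  # config-v1 weights-v2 (mismatched)
print(config_pinned, weights_pinned)      # config-v1 weights-v1 (consistent)
```

Whether the real download path behaves like the pinned case is exactly what the questions below ask.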

Looking at the code, it seems to support this. First, it tries to obtain the commit_hash of the current revision early on by resolving the config file:

if commit_hash is None:
    if not isinstance(config, PretrainedConfig):
        # We make a call to the config file first (which may be absent) to get the commit hash as soon as possible
        resolved_config_file = cached_file(
            pretrained_model_name_or_path,
            CONFIG_NAME,
            # ... [other args omitted for brevity] ...
            revision=revision,
        )
        commit_hash = extract_commit_hash(resolved_config_file, commit_hash)
    else:
        commit_hash = getattr(config, "_commit_hash", None)
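
For context on that last step, the sketch below shows one way a commit hash could be recovered from a resolved file path. It is a stand-in, not the library's extract_commit_hash: the helper name commit_hash_from_path is hypothetical, and it assumes cached files sit under a snapshots/<40-character hex hash>/ directory.

```python
import re
from typing import Optional

# Assumption: resolved files live under ".../snapshots/<40-char hex commit>/<filename>".
_SNAPSHOT_RE = re.compile(r"snapshots/([0-9a-f]{40})/")


def commit_hash_from_path(resolved_file: Optional[str], known_hash: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: keep an already-known hash, else parse one from the path."""
    if known_hash is not None or resolved_file is None:
        return known_hash
    # Normalize Windows separators so the same pattern applies.
    match = _SNAPSHOT_RE.search(resolved_file.replace("\\", "/"))
    return match.group(1) if match else None


# Made-up cache path with a 40-character hex directory name.
example = "/cache/models--org--name/snapshots/" + "a1" * 20 + "/config.json"

print(commit_hash_from_path(example))                    # the 40-char hash
print(commit_hash_from_path("/tmp/local/config.json"))   # None: not a snapshot path
```

A path outside such a layout (for example a plain local directory) yields None here, which would leave later calls without a hash to pin to.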

This commit_hash is then passed down into cached_file_kwargs for subsequent loading code (like fetching the actual model weights):

cached_file_kwargs = {
    # ... [other args] ...
    "revision": revision,
    "_commit_hash": commit_hash,
}
resolved_archive_file = cached_file(pretrained_model_name_or_path, filename, **cached_file_kwargs)

Here is my confusion: when I look inside the cached_file function itself, the _commit_hash appears to be used only for the local cache check. If a download from the Hub is actually triggered, the call still seems to rely on the revision argument. Other loading functions also don’t seem to strictly use the identified commit_hash for the remote fetch.
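
The reading described above can be written down as a toy model. This is a sketch of that interpretation only, not the code in transformers: toy_cached_file, lookup_in_cache, and fake_download are hypothetical, and the snapshots/<hash>/ directory layout is an assumption about the cache.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def lookup_in_cache(repo_dir: Path, filename: str, commit_hash: Optional[str]) -> Optional[Path]:
    """Return the cached file for an exact commit, or None on a miss."""
    if commit_hash is None:
        return None
    candidate = repo_dir / "snapshots" / commit_hash / filename
    return candidate if candidate.is_file() else None


def toy_cached_file(
    repo_dir: Path,
    filename: str,
    revision: str,
    commit_hash: Optional[str],
    download: Callable[[str, str], Path],
) -> Tuple[Path, str]:
    """Cache check keyed on commit_hash; the remote fallback only sees revision."""
    hit = lookup_in_cache(repo_dir, filename, commit_hash)
    if hit is not None:
        return hit, "cache"
    return download(filename, revision), "remote"


with tempfile.TemporaryDirectory() as tmp:
    # Build a fake cache that holds config.json for one commit only.
    repo_dir = Path(tmp) / "models--org--name"
    snapshot = repo_dir / "snapshots" / ("a1" * 20)
    snapshot.mkdir(parents=True)
    (snapshot / "config.json").write_text("{}")

    requested = []  # records which revision the fake remote was asked for

    def fake_download(filename: str, revision: str) -> Path:
        requested.append(revision)
        return Path("remote") / revision / filename

    _, source_hit = toy_cached_file(repo_dir, "config.json", "main", "a1" * 20, fake_download)
    _, source_miss = toy_cached_file(repo_dir, "model.safetensors", "main", "a1" * 20, fake_download)

print(source_hit, source_miss, requested)  # cache remote ['main']
```

In this model the cache miss asks the remote for "main" rather than for the pinned hash, which is the gap the questions below are about.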

My questions:

  1. If commit_hash is primarily used just for local cache resolution, couldn’t the revision argument handle that on its own?

  2. Does the underlying huggingface_hub download logic actually use this commit_hash to lock the remote fetch to that specific commit, or is my assumption about preventing mid-download revision changes incorrect?

  3. Does commit_hash serve any other purpose?

Thank you in advance.
