
Purpose of commit_hash in PreTrainedModel.from_pretrained

Hugging Face Forums [Unofficial] March 16, 2026

Oh… Complicated…


revision and _commit_hash are related, but they are not the same thing.

  • revision = what the caller asked for. It can be a branch, tag, PR ref, or commit hash. That is the public Hub API. (Hugging Face)
  • _commit_hash = the exact immutable commit that revision resolved to during loading. Transformers tries to discover it early and then carry it through later steps. (GitHub)

Why _commit_hash exists at all

The key background is the Hub cache layout.

Hugging Face stores symbolic refs like main separately from the actual immutable snapshots. In the cache, refs/ holds mappings such as main -> <commit>, while snapshots/<commit>/... holds the actual file tree for that exact revision. So a moving ref like main is not the same thing as a concrete cached snapshot. (Hugging Face)
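To make the refs/ versus snapshots/ split concrete, here is a minimal sketch that builds a toy cache folder in a temporary directory and resolves a ref against it. The helper names (build_toy_cache, resolve_ref, cached_path), the models--toy--repo folder, and the abc123 commit are all made up for illustration; this is not the huggingface_hub implementation, and it ignores everything else in the real layout.

```python
import tempfile
from pathlib import Path
from typing import Optional


def build_toy_cache(root: Path, commit: str) -> Path:
    """Create refs/main -> commit and snapshots/<commit>/config.json under root."""
    repo = root / "models--toy--repo"  # hypothetical repo folder
    (repo / "refs").mkdir(parents=True)
    (repo / "refs" / "main").write_text(commit)  # symbolic ref stored as text
    snapshot = repo / "snapshots" / commit
    snapshot.mkdir(parents=True)
    (snapshot / "config.json").write_text("{}")
    return repo


def resolve_ref(repo: Path, revision: str) -> str:
    """Map a symbolic ref to its commit; anything without a refs/ entry
    is treated as already being a commit hash (toy assumption)."""
    ref_file = repo / "refs" / revision
    return ref_file.read_text().strip() if ref_file.is_file() else revision


def cached_path(repo: Path, revision: str, filename: str) -> Optional[Path]:
    """Return the file inside the snapshot that revision resolves to, or None."""
    path = repo / "snapshots" / resolve_ref(repo, revision) / filename
    return path if path.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    repo = build_toy_cache(Path(tmp), "abc123")
    via_ref = cached_path(repo, "main", "config.json")       # needs a refs/ lookup first
    via_commit = cached_path(repo, "abc123", "config.json")  # goes straight to the snapshot
    print(via_ref == via_commit)
```

Both lookups land on the same file, but only the commit-keyed one skips the indirection through refs/.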

That is why revision alone is not always enough once loading has started. If Transformers already knows the exact commit, _commit_hash lets it target the precise cached snapshot instead of re-resolving a symbolic ref. (GitHub)

What from_pretrained is doing

Your reading of the code is correct.

from_pretrained first tries to resolve the config file specifically to learn the commit hash “as soon as possible”, then extracts that hash and passes it along as _commit_hash in later calls. (GitHub)

That exact commit is also preserved in config-loading code. If _commit_hash is found in the loaded config, Transformers keeps propagating it instead of discarding it. (GitHub)

So _commit_hash is not a useless internal leftover. It is deliberate state that gets threaded through the loading pipeline. (GitHub)

What _commit_hash is actually used for

Inside cached_file / cached_files, _commit_hash is primarily used for exact cache lookup.

The docstring says it is passed when chaining several file loads and that, if files are already cached for that commit hash, Transformers can “avoid calls to head and get from the cache.” The implementation then checks try_to_load_from_cache(... revision=_commit_hash ...) before doing any remote download. (GitHub)

So the main purpose is:

  1. exact local cache resolution
  2. fewer network calls, and better offline behavior, for files already cached at that commit
  3. provenance propagation across config/tokenizer/model loading steps (GitHub)
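The chained-load idea behind that list can be sketched with a toy model. ToyLoader, REMOTE_REFS, and REMOTE_FILES are hypothetical stand-ins, not Transformers or huggingface_hub objects, and the counter only approximates "calls to head": the point is that once the first load has revealed the commit, later loads that pass it along can be answered from the cache alone.

```python
from typing import Dict, Optional, Tuple

# Toy stand-ins for the Hub (assumptions for illustration only).
REMOTE_REFS = {"main": "abc123"}
REMOTE_FILES = {
    ("abc123", "config.json"): "{}",
    ("abc123", "model.bin"): "weights",
}


class ToyLoader:
    def __init__(self) -> None:
        self.cache: Dict[Tuple[str, str], str] = {}  # (commit, filename) -> content
        self.remote_calls = 0

    def load(
        self,
        filename: str,
        revision: str = "main",
        commit_hash: Optional[str] = None,
    ) -> Tuple[str, str]:
        """Return (content, resolved_commit)."""
        # Fast path: exact lookup by the already-known commit, no remote call.
        if commit_hash is not None and (commit_hash, filename) in self.cache:
            return self.cache[(commit_hash, filename)], commit_hash
        # Slow path: ask the remote to resolve the user-facing revision.
        self.remote_calls += 1
        resolved = REMOTE_REFS.get(revision, revision)
        content = REMOTE_FILES[(resolved, filename)]
        self.cache[(resolved, filename)] = content
        return content, resolved


loader = ToyLoader()
loader.load("config.json")
loader.load("model.bin")  # warm the cache: two remote calls so far
loader.remote_calls = 0

# A later chained load: the config is resolved first to learn the commit...
_, commit = loader.load("config.json")
# ...and the commit is then threaded through, so the weights come from cache.
loader.load("model.bin", commit_hash=commit)
print(loader.remote_calls)
```

In this toy, only the first load of the chain touches the remote; every later file that is already cached for that commit is served locally.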

Does it pin later remote downloads too?

This is the subtle part.

In the normal single-file cached_file(...) path:

No, not directly.

If the file is not already found via the _commit_hash cache fast path, Transformers falls back to hf_hub_download(... revision=revision ...). In other words, the actual remote call still uses the original revision, not _commit_hash. (GitHub)

So your “it locks all later remote fetches to the same commit” interpretation is too strong for that path.
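A toy model shows what that fallback implies. ToyRemote and cached_file_sketch are hypothetical names, and the "ref moves between two loads" scenario is constructed for illustration rather than observed: if the file is missing from the commit-keyed cache, the download goes out under the original revision, so it follows wherever that ref points at that moment.

```python
from typing import Dict, Optional, Tuple

Cache = Dict[Tuple[str, str], str]  # (commit, filename) -> content


class ToyRemote:
    """Hypothetical stand-in for the Hub: mutable refs, immutable per-commit files."""

    def __init__(self) -> None:
        self.refs = {"main": "commit_a"}
        self.files = {
            ("commit_a", "config.json"): "config@a",
            ("commit_a", "model.bin"): "weights@a",
            ("commit_b", "config.json"): "config@b",
            ("commit_b", "model.bin"): "weights@b",
        }

    def download(self, filename: str, revision: str) -> Tuple[str, str]:
        commit = self.refs.get(revision, revision)  # resolved at call time
        return self.files[(commit, filename)], commit


def cached_file_sketch(
    remote: ToyRemote,
    cache: Cache,
    filename: str,
    revision: str,
    commit_hash: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (content, commit the content came from)."""
    # Fast path: exact cache hit for the known commit.
    if commit_hash is not None and (commit_hash, filename) in cache:
        return cache[(commit_hash, filename)], commit_hash
    # Fallback: the remote call uses the original revision, not commit_hash.
    content, commit = remote.download(filename, revision)
    cache[(commit, filename)] = content
    return content, commit


remote, cache = ToyRemote(), {}
config, commit = cached_file_sketch(remote, cache, "config.json", "main")
remote.refs["main"] = "commit_b"  # someone pushes to main between the two loads
weights, weights_commit = cached_file_sketch(
    remote, cache, "model.bin", "main", commit_hash=commit
)
print(commit, weights_commit)  # prints: commit_a commit_b
```

Passing a full commit hash as revision in the first place sidesteps this in the toy, because the fallback then has nothing symbolic left to re-resolve.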

In the multi-file snapshot_download(...) path:

Yes, effectively.

snapshot_download resolves the requested revision once, gets repo_info.sha as commit_hash, stores the refs/<revision> mapping if needed, and then works from that resolved commit snapshot. (GitHub)

So the “resolve once, then pin to one exact commit” behavior is real, but it is most clearly implemented in snapshot_download, not in every single-file hf_hub_download fallback from cached_file. (GitHub)
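The "resolve once" pattern can be sketched as follows. This is a toy model, not the huggingface_hub code: ToyRemote and snapshot_download_sketch are hypothetical names, and the push landing mid-download is simulated on purpose to show that every file still comes from the commit resolved at the start.

```python
from typing import Dict, List, Tuple


class ToyRemote:
    """Hypothetical stand-in for the Hub: mutable refs, immutable per-commit files."""

    def __init__(self) -> None:
        self.refs = {"main": "commit_a"}
        self.files = {
            ("commit_a", "config.json"): "config@a",
            ("commit_a", "model.bin"): "weights@a",
            ("commit_b", "config.json"): "config@b",
            ("commit_b", "model.bin"): "weights@b",
        }

    def resolve(self, revision: str) -> str:
        return self.refs.get(revision, revision)

    def fetch(self, commit: str, filename: str) -> str:
        content = self.files[(commit, filename)]
        self.refs["main"] = "commit_b"  # simulate a push landing mid-download
        return content


def snapshot_download_sketch(
    remote: ToyRemote, revision: str, filenames: List[str]
) -> Tuple[str, Dict[str, str]]:
    commit = remote.resolve(revision)  # resolved exactly once, up front
    # Every fetch is keyed by the resolved commit, never by the moving ref.
    return commit, {name: remote.fetch(commit, name) for name in filenames}


remote = ToyRemote()
commit, files = snapshot_download_sketch(remote, "main", ["config.json", "model.bin"])
print(commit, files)
print(remote.resolve("main"))  # the ref has moved, the snapshot has not
```

Even though main ends up pointing at commit_b, both files in the returned snapshot belong to commit_a.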

Direct answers to your questions

1. If _commit_hash is mainly for local cache resolution, couldn’t revision handle that too?

Not as well.

revision="main" is only a symbolic name. The cache ultimately needs an exact snapshot directory keyed by commit hash. Once Transformers already knows the resolved commit, _commit_hash is the stricter and more useful key. (Hugging Face)

2. Does huggingface_hub use this commit hash to lock the remote fetch?

Not in the general single-file hf_hub_download path used by cached_file. That fallback still passes revision=revision. (GitHub)

Yes in snapshot_download. That code resolves the revision to an exact SHA and then uses that resolved snapshot. (GitHub)

3. Does _commit_hash serve other purposes?

Yes.

Besides exact cache lookup, it also serves as provenance metadata that gets propagated through config loading and later file resolution. That way later steps know not just “the user asked for main”, but “this load was actually resolved from commit X”. (GitHub)

Bottom line

The most accurate summary is:

revision is the user-facing ref. _commit_hash is the exact resolved commit. Transformers uses _commit_hash mainly to make chained loads more deterministic by targeting the exact cached snapshot and carrying forward the resolved provenance. It is not just redundant with revision, but it is also not a universal remote-download lock in every from_pretrained code path. That stronger “pin everything to one commit” behavior is most clearly provided by snapshot_download. (GitHub)

If you want a one-line version:

_commit_hash is mostly an internal exact-snapshot key, while revision is the public ref you asked for.
