Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreig4j25z4gbx6ejbqjunzsqcpqtuisgwmomvjx2g2sunducsx7w4x4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mgwj33egki52"
  },
  "path": "/t/automodel-with-clinicalbert-gives-unexpected-warning/174220#post_2",
  "publishedAt": "2026-03-13T03:21:20.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Hugging Face"
  ],
  "textContent": "I was able to reproduce the warning here too. Since Transformers upgraded from v4 to v5, the `UNEXPECTED` display has become more visually noticeable, so we might be surprised.\n\nSimilar to the warning in v4, there’s no real harm in most use cases. It’s mainly a warning for clarification.\n\n* * *\n\n## What that warning means in your case\n\n`AutoModel.from_pretrained(\"emilyalsentzer/Bio_ClinicalBERT\")` uses the model’s config to choose the **base architecture class** automatically. For a BERT-family checkpoint, `AutoModel` resolves to **`BertModel`**. Hugging Face’s Auto Classes docs show exactly this pattern: `AutoModel.from_pretrained(\"...bert...\")` creates a `BertModel`. (Hugging Face)\n\nYour checkpoint, however, appears to contain not only the base BERT encoder weights, but also **pretraining-head weights**. The model page is tagged **Fill-Mask** , and the model card says this model was initialized from BioBERT and pretrained on MIMIC notes, while also showing `AutoModel.from_pretrained(...)` as a valid usage example. That combination is consistent with a checkpoint that can be used as a plain encoder but still carries extra task-specific weights from pretraining. (Hugging Face)\n\nSo the warning is saying:\n\n  * **the base encoder loaded** , and\n  * **some extra checkpoint weights were present but not needed by`BertModel`**. (Hugging Face)\n\n\n\n## What “when loading from different task/architecture” means\n\nHere, “different task/architecture” does **not** mean “completely different neural network family.” It usually means:\n\n  * same backbone family: **BERT**\n  * but a **different model class for a different task**\n\n\n\nFor BERT, Hugging Face distinguishes classes such as:\n\n  * **`BertModel`** : backbone only\n  * **`BertForMaskedLM`** : backbone + masked-language-model head\n  * **`BertForPreTraining`** : backbone + masked-language-model head + next-sentence-prediction head. 
(Hugging Face)\n\n\n\nHugging Face’s long-standing explanation for this warning is explicit: it **is expected** if you initialize one class from a checkpoint trained for another task or architecture, and **is not expected** only when you believe the source and target classes should be exactly identical. ([GitHub](https://github.com/huggingface/transformers/issues/5421 \"What to do about this warning message: \"Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification\" · Issue #5421 · huggingface/transformers · GitHub\"))\n\nThat is the phrase you were asking about.\n\n## Why those exact `cls.*` keys show up\n\nThe names in your warning point to BERT’s pretraining heads:\n\n  * `cls.predictions.*` corresponds to the **masked language modeling** head\n  * `cls.seq_relationship.*` corresponds to the **next sentence prediction** head. (Hugging Face)\n\n\n\nThe BERT docs describe `BertForPreTraining` as a BERT model with **two heads on top**: a masked language modeling head and a next sentence prediction head. They also note that BERT’s `pooler_output` is trained from the next sentence prediction objective during pretraining. (Hugging Face)\n\nSo in plain English:\n\n  * the checkpoint contains weights for BERT’s original pretraining tasks,\n  * but `AutoModel` asked for only the backbone encoder,\n  * therefore those head weights are reported as **unexpected** and ignored. (Hugging Face)\n\n\n\n## Do you need to change your system or environment?\n\n**No.** Nothing in this warning suggests a problem with:\n\n  * Python 3.13.6\n  * your venv\n  * macOS\n  * Apple Silicon / M1. (Hugging Face)\n\n\n\nThis is a **model-class / checkpoint-content** issue, not a platform issue. The warning itself is the same kind of warning Hugging Face documents for “loading from another task,” and your keys match that pattern closely. 
([GitHub](https://github.com/huggingface/transformers/issues/5421 \"What to do about this warning message: \"Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification\" · Issue #5421 · huggingface/transformers · GitHub\"))\n\nSo the fix is **not** “reinstall Python” or “change your machine.”\n\n## What you should change, if anything\n\nThat depends on what you actually want from the model.\n\n### If you want embeddings / hidden states / encoder outputs\n\nKeep using:\n\n\n    from transformers import AutoModel\n    model = AutoModel.from_pretrained(\"emilyalsentzer/Bio_ClinicalBERT\", token=HF_TOKEN)\n\n\nThat is appropriate for using the model as a BERT encoder, and the model card itself shows `AutoModel.from_pretrained(...)` as a valid way to use this checkpoint. In that case, the warning is usually benign. (Hugging Face)\n\n### If you want masked-token prediction\n\nUse:\n\n\n    from transformers import AutoModelForMaskedLM\n    model = AutoModelForMaskedLM.from_pretrained(\"emilyalsentzer/Bio_ClinicalBERT\", token=HF_TOKEN)\n\n\n`BertForMaskedLM` is the BERT class with a language-modeling head on top, which matches fill-mask usage better than plain `BertModel`. (Hugging Face)\n\n### If you want the closest match to the full original BERT pretraining checkpoint\n\nUse:\n\n\n    from transformers import AutoModelForPreTraining\n    model = AutoModelForPreTraining.from_pretrained(\"emilyalsentzer/Bio_ClinicalBERT\", token=HF_TOKEN)\n\n\n`BertForPreTraining` is documented as having both the MLM head and the NSP head, which is the closest conceptual match to the `cls.predictions.*` and `cls.seq_relationship.*` weights in your warning. (Hugging Face)\n\n## The practical answer for your case\n\nBecause you said:\n\n  * the script runs, and\n  * it produces sensible output,\n\n\n\nthe most likely explanation is that the **base encoder loaded correctly**, and only the extra pretraining heads were skipped. 
That is exactly the sort of case Hugging Face describes as expected when loading from another task/class combination. ([GitHub](https://github.com/huggingface/transformers/issues/5421 \"What to do about this warning message: \"Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification\" · Issue #5421 · huggingface/transformers · GitHub\"))\n\nSo the most direct answer is:\n\n  * **No environment changes are needed**\n  * **Your warning is probably harmless for encoder use**\n  * **Only change the model class if your actual task is MLM or full pretraining-head behavior**. (Hugging Face)\n\n\n\n## One useful diagnostic check\n\nHugging Face documents `output_loading_info=True`, which returns missing keys, unexpected keys, and error messages from `from_pretrained()`. That can help confirm that the only unexpected keys are the `cls.*` ones you already saw. (Hugging Face)\n\n\n    model, info = AutoModel.from_pretrained(\n        \"emilyalsentzer/Bio_ClinicalBERT\",\n        token=HF_TOKEN,\n        output_loading_info=True,\n    )\n\n    print(info[\"unexpected_keys\"])\n    print(info[\"missing_keys\"])\n    print(info[\"error_msgs\"])\n\n\nIf the unexpected keys are just the `cls.*` pretraining-head keys, that strongly supports the benign interpretation above. (Hugging Face)\n\n## Final conclusion\n\nYour warning does **not** indicate a broken environment.\n\nIt means:\n\n  * `AutoModel` gave you **`BertModel`**\n  * the checkpoint also includes **pretraining-head weights**\n  * those extra weights are reported as **`UNEXPECTED`**\n  * and this is normal when the checkpoint and instantiated class are for different BERT task variants. (Hugging Face)\n\n\n\nFor a graph-database pipeline that uses ClinicalBERT as an encoder, I would usually leave the environment alone and keep `AutoModel` unless you specifically need masked-token logits or full pretraining-head behavior. (Hugging Face)",
  "title": "AutoModel with ClinicalBERT gives UNEXPECTED warning"
}