Ollama model registry provides wrong chat template
I use Ollama to run models locally, and I noticed that many models behave oddly.
Today I went down the rabbit hole to find out why, using bartowski/google_gemma-4-26B-A4B-it-GGUF as an example.
On the HF website, if you open the model details, it shows the correct (complex and lengthy) chat template. Locally I only get a dumbed-down version of it:
{{ if .System }}<|turn>system
{{ .System }}<turn|>
{{ end }}{{ if .Prompt }}<|turn>user
{{ .Prompt }}<turn|>
{{ end }}<|turn>model
{{ .Response }}<turn|>
It turns out that this is what HF serves via the Ollama model registry:
$ curl -sSf -L -H "Accept: application/vnd.docker.distribution.manifest.v2+json" https://hf.co/v2/bartowski/google_gemma-4-26B-A4B-it-GGUF/manifests/IQ2_XXS | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "digest": "sha256:61b27106ee697324d453c4fdcc4be2e002f1cea930191141d20db1726150ab59",
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 629
  },
  "layers": [
    {
      "digest": "sha256:d516a0bca35cbb83081074bbf58ec2877911111192fbc2c353bf81cd0667b452",
      "mediaType": "application/vnd.ollama.image.model",
      "size": 9656494368
    },
    {
      "digest": "sha256:f56e8459650d8354cf701fa5b0ddaea9a7986a271d7f55677152d1355ab5afb6",
      "mediaType": "application/vnd.ollama.image.template",
      "size": 159
    },
    {
      "digest": "sha256:41cdabd1e8066e983ee6c288eb0117777376223ee0279cadcd67b2295e4d975f",
      "mediaType": "application/vnd.ollama.image.projector",
      "size": 1193058528
    },
    {
      "digest": "sha256:f5107f3ab6b0815958755af9391fb4149e62d2cd3535f3a4ecbd3c3938d47d3e",
      "mediaType": "application/vnd.ollama.image.params",
      "size": 52
    }
  ]
}
$ curl -sSf -L -H "Accept: application/vnd.docker.distribution.manifest.v2+json" https://hf.co/v2/bartowski/google_gemma-4-26B-A4B-it-GGUF/blobs/sha256:f56e8459650d8354cf701fa5b0ddaea9a7986a271d7f55677152d1355ab5afb6
{{ if .System }}<bos><|turn>system
{{ .System }}<turn|>
{{ end }}{{ if .Prompt }}<|turn>user
{{ .Prompt }}<turn|>
{{ end }}<|turn>model
{{ .Response }}<turn|>
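For anyone who wants to repeat the two curl calls above for other repos or quants, here is a rough Python sketch of the same steps: fetch the manifest, pick the layer whose mediaType is application/vnd.ollama.image.template, then download that blob. The helper names (fetch, find_layer, fetch_served_template) are my own and not part of any Ollama or HF tooling, and the URL layout is simply copied from the curl commands, so treat it as an assumption rather than a documented API.

```python
import json
import urllib.request

# Header and media type copied from the curl calls and the manifest above.
MANIFEST_ACCEPT = "application/vnd.docker.distribution.manifest.v2+json"
TEMPLATE_MEDIA_TYPE = "application/vnd.ollama.image.template"


def fetch(url, accept=MANIFEST_ACCEPT):
    """GET a URL with an Accept header; urllib follows redirects like curl -L."""
    req = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def find_layer(manifest, media_type):
    """Return the first layer dict with the given mediaType, or None."""
    for layer in manifest.get("layers", []):
        if layer.get("mediaType") == media_type:
            return layer
    return None


def fetch_served_template(repo, tag, base="https://hf.co/v2"):
    """Return the template text the registry serves for repo:tag, or None.

    Assumes the same URL layout as the curl commands above.
    """
    manifest = json.loads(fetch(f"{base}/{repo}/manifests/{tag}"))
    layer = find_layer(manifest, TEMPLATE_MEDIA_TYPE)
    if layer is None:
        return None
    blob = fetch(f"{base}/{repo}/blobs/{layer['digest']}")
    return blob.decode("utf-8")


# Example call (needs network access):
# print(fetch_served_template("bartowski/google_gemma-4-26B-A4B-it-GGUF", "IQ2_XXS"))
```

Running the commented-out call for a few different repos should make it easy to see whether the short template shows up everywhere or only for some uploaders.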
When I inspect the GGUF file myself, the correct tokenizer.chat_template is still there.
This happens with models from multiple large quantizers, so the question is: is this a configuration error made by the quantizers, e.g. @bartowski, or a general HF issue? The “official” version hosted by Ollama themselves does not seem to have this problem.
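In case it helps others check their own files, below is a minimal sketch of one way to read a single metadata key straight out of a GGUF file using only the Python standard library. It is not the tool I used above. It assumes a little-endian file with GGUF version 2 or later (64-bit string and array lengths), and the function name read_gguf_metadata_value is made up for this example.

```python
import struct

# GGUF metadata value type IDs mapped to struct formats for fixed-size scalars.
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read and unpack one fixed-size little-endian value."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """Strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        # Arrays are: element type (uint32), count (uint64), then the elements.
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata_value(path, wanted_key):
    """Return the value stored under wanted_key, or None if it is absent."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError("this sketch only handles GGUF v2 and later")
        _read(f, "<Q")  # tensor count, not needed here
        kv_count = _read(f, "<Q")
        for _ in range(kv_count):
            key = _read_string(f)
            value = _read_value(f, _read(f, "<I"))
            if key == wanted_key:
                return value
    return None


# Example call (the path is a placeholder):
# print(read_gguf_metadata_value("model.gguf", "tokenizer.chat_template"))
```

The parser walks every key until it finds the requested one, so it may take a moment on files with large token arrays, but it only reads the metadata header and never touches the tensor data.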
This is my first time here, please be gentle. I did research on this topic and didn’t find an answer.