
Ollama model registry provides wrong chat template

Hugging Face Forums [Unofficial] May 21, 2026

“…or a general HF issue?”

This is probably the correct answer.

The most likely underlying cause is that the Ollama templates and the built-in GGUF templates (primarily for llama.cpp) aren’t necessarily identical.

A similar issue occurred with Qwen 3.5, though the cause was different. This sort of thing happens occasionally when a new model family introduces a lot of changes. If it is reported on the HF or Ollama GitHub, it’ll probably get fixed eventually…


Likely cause: HF’s Ollama registry is serving a lossy template, not the quantizer breaking the GGUF

I think you found a real integration-layer bug, or at least a dangerous fallback in the Hugging Face → Ollama compatibility path.

The short answer is:

This does not look primarily like a @bartowski / quantizer configuration error, assuming your GGUF inspection is correct. If the GGUF still contains the full tokenizer.chat_template, then the quantized file likely preserved the important metadata. The suspicious transformation happens later, when Hugging Face exposes the model through the Ollama-compatible registry endpoint.

The failing boundary appears to be:

GGUF metadata:
  tokenizer.chat_template = full / complex / Gemma 4-specific

↓ Hugging Face Ollama compatibility layer

hf.co/v2/<repo>/manifests/<tag>:
  application/vnd.ollama.image.template = short generic Go template

↓ Ollama pull/run via hf.co

ollama show --modelfile hf.co/<repo>:<tag>:
  TEMPLATE = same short generic Go template

That is why the official Ollama model can behave differently: the official Ollama Gemma 4 path uses Ollama’s own Gemma 4 renderer, while the hf.co/v2 path appears to serve a static Ollama TEMPLATE layer.

Relevant references:

  • HF Hub: Use Ollama with any GGUF model
  • Ollama Modelfile Reference
  • HF @huggingface/ollama-utils README
  • Ollama Gemma 4 renderer source
  • llama.cpp chat-template wiki
  • Google Gemma 4 function-calling docs
  • vLLM Gemma 4 usage guide

Why this matters

A chat template is not just formatting. It is the serialization contract between structured chat messages and the raw token sequence the model actually sees.

A chat model does not literally receive this abstract structure:

[
  {"role": "system", "content": "You are helpful."},
  {"role": "user", "content": "Hello"}
]

It receives a rendered prompt string/token stream, for example with special role markers, turn delimiters, BOS/EOS behavior, tool declarations, image placeholders, thinking markers, and stop tokens. If that rendering is wrong, the model can load successfully but behave strangely.
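
As a rough illustration of that rendering step, the sketch below serializes a message list into one prompt string using the `<|turn>` / `<turn|>` markers quoted in this post. The function name `render_turns` and the assistant-to-model role mapping are assumptions made for illustration only; the real Gemma 4 template does considerably more (BOS, tools, thinking, images).

```python
def render_turns(messages, add_generation_prompt=True):
    """Toy serializer: NOT the real Gemma 4 template, only the turn markers."""
    parts = []
    for message in messages:
        # Assumption: assistant turns are rendered under the "model" role.
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<|turn>{role}\n{message['content']}<turn|>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues from here.
        parts.append("<|turn>model\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
]
print(render_turns(messages))
```

Even in this toy form, changing a marker or dropping the final open model turn changes the exact string the model sees, which is the whole point of the contract.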

So when Ollama locally sees only this simplified template:

{{ if .System }}<|turn>system
{{ .System }}<turn|>
{{ end }}{{ if .Prompt }}<|turn>user
{{ .Prompt }}<turn|>
{{ end }}<|turn>model
{{ .Response }}<turn|>

or the HF registry blob serves:

{{ if .System }}<bos><|turn>system
{{ .System }}<turn|>
{{ end }}{{ if .Prompt }}<|turn>user
{{ .Prompt }}<turn|>
{{ end }}<|turn>model
{{ .Response }}<turn|>

that is not merely a shorter display version. It may be a materially different prompt format.

For simple one-turn text chat, this can appear to work. For Gemma 4, it is very likely incomplete.
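
One quick way to see what the short template can carry is to list the fields it references. The sketch below does this with a regular expression rather than a real Go template parser, so treat it as a heuristic; `template_fields` is a hypothetical helper name. As far as I know, Ollama's template variables also include `.Messages` and `.Tools`, and neither appears in the served template.

```python
import re

# The template served by the HF registry blob, as quoted above.
SIMPLIFIED_TEMPLATE = """{{ if .System }}<bos><|turn>system
{{ .System }}<turn|>
{{ end }}{{ if .Prompt }}<|turn>user
{{ .Prompt }}<turn|>
{{ end }}<|turn>model
{{ .Response }}<turn|>"""


def template_fields(go_template):
    """Return the .Field names referenced inside {{ ... }} actions (heuristic)."""
    fields = set()
    for action in re.findall(r"\{\{(.*?)\}\}", go_template, flags=re.DOTALL):
        fields.update(re.findall(r"\.([A-Z][A-Za-z0-9_]*)", action))
    return fields


fields = template_fields(SIMPLIFIED_TEMPLATE)
print(sorted(fields))                            # ['Prompt', 'Response', 'System']
print("Tools" in fields, "Messages" in fields)   # False False
```

A template with no slot for tool definitions has nowhere to put them, regardless of what the client sends.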


Why I would not primarily blame the quantizer

A quantizer-side problem would be likely if one of these were true:

| Observation | Likely interpretation |
|---|---|
| tokenizer.chat_template is missing from the GGUF | bad conversion / incomplete GGUF metadata |
| tokenizer.chat_template inside the GGUF is already simplified | converter or quantizer likely damaged metadata |
| the HF GGUF viewer also shows the simplified template | GGUF metadata likely wrong |
| the repo contains an explicit bad template file | repo packaging issue |
| the problem appears only in one quantizer’s repo | repo-specific issue more likely |

But your evidence is different:

| Layer | Your observation | What it suggests |
|---|---|---|
| GGUF metadata | full tokenizer.chat_template still exists | quantizer likely preserved the template |
| HF model details / GGUF view | shows the full complex template | HF can read the correct metadata |
| hf.co/v2/.../manifests/IQ2_XXS | contains a short application/vnd.ollama.image.template layer | registry compatibility layer is suspicious |
| template blob | 159-byte generic Go template | conversion/selection/fallback likely lost semantics |
| local Ollama model | sees the simplified template | Ollama is consuming the served template |
| official Ollama Gemma 4 | does not show this same problem | official path uses different rendering/configuration |

That points away from “bad quantization” and toward “bad Ollama image template generated by the registry bridge.”

A quantizer can still add a workaround, but that is different from being the root cause.


The key distinction: GGUF metadata vs Ollama image template

The GGUF can contain one template while the Ollama registry image exposes another.

Your GGUF contains:

tokenizer.chat_template

The Ollama-compatible registry manifest contains:

application/vnd.ollama.image.template

Those are not the same artifact.

HF’s Ollama docs say that, by default, the template for ollama run hf.co/<namespace>/<repo> is selected from commonly used templates based on the GGUF’s built-in tokenizer.chat_template. The same docs say that if a repo provides a custom template file, it must be a Go template, not a Jinja template: HF Ollama docs.

So the intended pipeline is roughly:

GGUF tokenizer.chat_template
  → HF template selection / conversion
  → Ollama Go TEMPLATE
  → application/vnd.ollama.image.template
  → Ollama local Modelfile

Your evidence suggests that the pipeline is losing information in the middle.


Why Jinja → Go template conversion is fragile

HF / Transformers chat templates are generally Jinja-style templates. Ollama TEMPLATE uses Go template syntax.

Ollama’s own docs say TEMPLATE is the full prompt template passed to the model and that templates use Go template syntax: Ollama Modelfile Reference.

That means a bridge has to do one of these:

  1. convert the Jinja template into a Go template;
  2. map the Jinja template to a known built-in Go template;
  3. use a custom model-family-specific handler;
  4. fall back to a simpler template;
  5. or fail.

For simple templates, this may be fine. For Gemma 4, Qwen thinking models, multimodal models, and native tool-calling models, this is fragile.
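
To make the difference between options 4 and 5 concrete, here is a minimal sketch of a selection step that refuses to fall back when the Jinja source looks like it needs more than plain turns. Everything in it is hypothetical: the function, the exception, and the marker list are mine, and this is not how @huggingface/ollama-utils is implemented.

```python
class TemplateConversionError(Exception):
    """Raised when no faithful Go template is known for a Jinja template."""


# Hypothetical substrings suggesting a template needs more than plain turns.
COMPLEX_MARKERS = ("tools", "tool_calls", "enable_thinking", "image")


def select_go_template(jinja_template, known_templates, generic_fallback=None):
    """Pick a Go template for a Jinja chat template, or fail loudly.

    known_templates maps an exact Jinja source string to a vetted Go template.
    """
    if jinja_template in known_templates:      # option 2: known mapping
        return known_templates[jinja_template]
    needs = [m for m in COMPLEX_MARKERS if m in jinja_template]
    if needs or generic_fallback is None:      # option 5: fail
        raise TemplateConversionError(
            f"no faithful Go template; unsupported features: {needs or 'unknown'}"
        )
    return generic_fallback                    # option 4: explicit fallback
```

The design choice is simply that a fallback is only returned when the caller asked for one and nothing in the source hints at tools, thinking, or images. A loud error is easier to debug than a model that loads and quietly misbehaves.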

The public HF package @huggingface/ollama-utils is especially relevant because it says it handles conversion of GGUF/Jinja chat templates into the Go format used by Ollama. It also explicitly lists “the converted template is wrong” as a valid reason to add a custom handler/test.

That is almost exactly your case.


Why Gemma 4 is a bad fit for a tiny generic template

Gemma 4 is not just a plain text chat model with user and assistant turns.

The official Ollama Gemma 4 renderer handles Gemma 4-specific prompt rendering in code: Ollama gemma4.go. The renderer deals with things such as:

  • BOS emission;
  • system/developer messages;
  • <|turn> / <turn|> markers;
  • <|"|> string delimiters;
  • thinking mode;
  • tools;
  • tool declarations;
  • tool calls;
  • tool responses;
  • image tags;
  • adjacent assistant/model turns;
  • stripping thinking blocks;
  • generation-prompt behavior.

Google’s Gemma 4 function-calling docs also show that tools are passed through apply_chat_template() via the tools argument: Google Gemma 4 function calling.

vLLM’s Gemma 4 guide similarly treats Gemma 4 as needing specialized support for reasoning, tool calling, and dynamic multimodal behavior: vLLM Gemma 4 usage guide.

So the official / full behavior surface is closer to:

messages
  + system/developer roles
  + thinking flags
  + tools
  + tool calls
  + tool responses
  + image/audio placeholders
  + special delimiters
  + parser expectations
  → Gemma 4-specific rendered prompt

The short HF-served Ollama template is closer to:

optional system string
+ one user prompt
+ model response marker

Those are not equivalent.


Why the official Ollama model can work while hf.co/... behaves oddly

The official Ollama model and a community GGUF pulled through hf.co are different runtime paths.

Official Ollama Gemma 4 can use:

Ollama model-library metadata
+ Gemma4Renderer
+ known params
+ known stop tokens
+ parser behavior
+ multimodal/projector handling

The HF registry path gives Ollama an Ollama-compatible image manifest with layers like:

application/vnd.ollama.image.model
application/vnd.ollama.image.template
application/vnd.ollama.image.projector
application/vnd.ollama.image.params

If that application/vnd.ollama.image.template layer is a simplified fallback, Ollama may simply use the bad template it was given.

So this difference is expected:

ollama run gemma4:<official-tag>
  → official Ollama packaging / renderer path

ollama run hf.co/bartowski/google_gemma-4-26B-A4B-it-GGUF:IQ2_XXS
  → HF-generated Ollama registry image
  → static application/vnd.ollama.image.template

That does not mean the GGUF is bad. It means the wrapper/rendering path is different.


Most likely root cause

My ranking:

1. Most likely: HF Ollama template conversion/selection fallback

HF reads the GGUF template, tries to convert or classify it, and emits a generic Gemma-ish Go template instead of a faithful Gemma 4 template.

Possible mechanisms:

  • Gemma 4 is not handled by a custom mapping.
  • The Jinja template is too complex or non-linear.
  • The converter only supports a subset of the template.
  • The matcher recognizes the <|turn> markers and chooses a generic template.
  • Tool/thinking/multimodal branches are dropped.
  • A fallback template is emitted instead of failing loudly.

This is the most likely explanation.

2. Also possible: a static Go TEMPLATE cannot fully express official Gemma 4 rendering

Ollama’s official Gemma 4 support is renderer code, not just a template string.

Some behavior may be awkward or impossible to express faithfully in a static Go template, especially if the renderer needs to restructure messages, merge tool results, strip thinking, or parse tool calls.

So there are several different levels of fix:

| Level | Possible fix |
|---|---|
| Narrow HF fix | Add a better Gemma 4 mapping/custom handler in the HF Ollama compatibility layer |
| Better Ollama fix | Let imported Gemma 4 GGUFs use the same Gemma 4 renderer path as official models |
| Broader ecosystem fix | Support Jinja chat templates directly in Ollama |

There is already an Ollama feature request for Jinja chat-template support: ollama/ollama#10222.

3. Possible but less likely: repo-level template override

HF supports a repo-level template file for Ollama, but it must be a Go template: HF Ollama docs.

If the repo contains such a file and it is bad, that could be a repo packaging issue. But from your evidence, the important template is being served as an HF registry layer while the GGUF metadata remains correct.

4. Least likely from your evidence: quantizer damaged the GGUF

This becomes likely only if the GGUF metadata itself is missing, truncated, or simplified.

You said the opposite: the correct tokenizer.chat_template is still there.


How to prove the failing boundary cleanly

Package three artifacts:

  1. the GGUF metadata;
  2. the HF application/vnd.ollama.image.template blob;
  3. the local ollama show --modelfile output.

1. Fetch the HF Ollama manifest

REPO="bartowski/google_gemma-4-26B-A4B-it-GGUF"
TAG="IQ2_XXS"

curl -sSf -L \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  "https://hf.co/v2/${REPO}/manifests/${TAG}" \
  | jq . > hf-v2-manifest.json

Highlight this layer:

{
  "mediaType": "application/vnd.ollama.image.template",
  "size": 159
}

2. Fetch the template blob

TEMPLATE_DIGEST="$(
  jq -r '.layers[] | select(.mediaType=="application/vnd.ollama.image.template") | .digest' \
    hf-v2-manifest.json
)"

curl -sSf -L \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  "https://hf.co/v2/${REPO}/blobs/${TEMPLATE_DIGEST}" \
  > hf-v2-template.txt

cat hf-v2-template.txt

Expected problematic result:

{{ if .System }}<bos><|turn>system
{{ .System }}<turn|>
{{ end }}{{ if .Prompt }}<|turn>user
{{ .Prompt }}<turn|>
{{ end }}<|turn>model
{{ .Response }}<turn|>
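
If jq is not available, the same layer selection can be done in Python. This is a small sketch that works on an already-downloaded manifest; `template_layer_digest` is a hypothetical helper, and the digests in the sample are placeholders, not real values.

```python
import json

TEMPLATE_MEDIA_TYPE = "application/vnd.ollama.image.template"


def template_layer_digest(manifest):
    """Return the digest of the Ollama template layer, or None if absent."""
    for layer in manifest.get("layers", []):
        if layer.get("mediaType") == TEMPLATE_MEDIA_TYPE:
            return layer.get("digest")
    return None


# Illustrative manifest only: the digests below are placeholders.
sample_manifest = json.loads("""
{
  "layers": [
    {"mediaType": "application/vnd.ollama.image.model",
     "digest": "sha256:placeholder-model"},
    {"mediaType": "application/vnd.ollama.image.template",
     "digest": "sha256:placeholder-template", "size": 159}
  ]
}
""")

print(template_layer_digest(sample_manifest))
```

For the real file, load hf-v2-manifest.json with json.load() and pass the result to the same function.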

3. Show what Ollama imports

ollama pull "hf.co/${REPO}:${TAG}"

ollama show --modelfile "hf.co/${REPO}:${TAG}" \
  > ollama-show-modelfile.txt

Ollama documents ollama show --modelfile as the way to inspect the model’s Modelfile: Ollama Modelfile Reference.

If ollama-show-modelfile.txt contains the same simplified template, that proves Ollama is using the registry-served template.
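
Comparing the two files by eye is error-prone, so here is a sketch of an automated check. It assumes the Modelfile output wraps the template in triple quotes (`TEMPLATE """..."""`), which is how multi-line templates are usually shown; if your output uses a different quoting, the regular expression needs adjusting. Both helper names are hypothetical.

```python
import re


def extract_template(modelfile_text):
    """Return the body of a triple-quoted TEMPLATE instruction, or None."""
    match = re.search(
        r'^TEMPLATE\s+"""(.*?)"""',
        modelfile_text,
        flags=re.DOTALL | re.MULTILINE,
    )
    return match.group(1) if match else None


def same_template(modelfile_text, blob_text):
    """True if the Modelfile template equals the blob, ignoring outer whitespace."""
    template = extract_template(modelfile_text)
    return template is not None and template.strip() == blob_text.strip()


# Tiny illustrative inputs; use the contents of ollama-show-modelfile.txt
# and hf-v2-template.txt for the real comparison.
modelfile = 'FROM /x.gguf\nTEMPLATE """{{ .Prompt }}<turn|>\n"""\nPARAMETER stop "<turn|>"\n'
blob = "{{ .Prompt }}<turn|>\n"
print(same_template(modelfile, blob))
```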

4. Inspect GGUF metadata

For example, with llama.cpp tooling (the dump script has moved between llama.cpp versions, so adjust the path to match your checkout):

python ./llama.cpp/gguf-py/scripts/gguf-dump.py \
  --no-tensors \
  ./google_gemma-4-26B-A4B-it-IQ2_XXS.gguf \
  > gguf-metadata.txt

grep -n "tokenizer.chat_template" gguf-metadata.txt

llama.cpp’s template wiki says llama_chat_apply_template() uses the template stored in model metadata key tokenizer.chat_template by default and includes a Jinja parser called minja: llama.cpp template wiki.

5. Summarize the proof

| Source | Result |
|---|---|
| GGUF metadata | full Gemma 4 tokenizer.chat_template |
| HF application/vnd.ollama.image.template blob | short generic Go template |
| ollama show --modelfile | same short generic Go template |

That table makes the issue very clear.


Behavior tests to run

Do not test only with a trivial prompt such as “hello”. A generic template can pass trivial chat while failing important branches.

Test cases that matter:

| Test | What it checks |
|---|---|
| one-turn prompt | baseline behavior |
| system prompt | system role rendering |
| multi-turn chat | history loop |
| assistant-history turn | assistant/model role rendering |
| thinking on/off | Gemma 4 thinking control |
| tool declaration | tool schema serialization |
| tool call | tool-call formatting |
| tool response | tool-response formatting |
| image input | multimodal placeholder handling |
| long answer / stop leak | stop tokens and turn terminators |

The simplified template may only pass the first one or two.
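
A few of these cases can be prepared as request bodies up front. The sketch below only builds the JSON payloads, using the fields I understand Ollama's /api/chat to accept (`model`, `messages`, `tools`, `think`, `stream`); sending them to a local server is left out. The weather tool is a made-up example, and whether `think` is honored for an hf.co-imported model is exactly the kind of thing these tests are meant to find out.

```python
import json

MODEL = "hf.co/bartowski/google_gemma-4-26B-A4B-it-GGUF:IQ2_XXS"

# Hypothetical tool, used only to exercise the tool-declaration branch.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def chat_payload(messages, tools=None, think=None):
    """Build a chat request body; optional fields are omitted when unset."""
    payload = {"model": MODEL, "messages": messages, "stream": False}
    if tools is not None:
        payload["tools"] = tools
    if think is not None:
        payload["think"] = think
    return payload


hello = {"role": "user", "content": "Hello"}
CASES = {
    "one_turn": chat_payload([hello]),
    "system_prompt": chat_payload(
        [{"role": "system", "content": "Answer in French."}, hello]
    ),
    "multi_turn": chat_payload(
        [hello, {"role": "assistant", "content": "Hi!"},
         {"role": "user", "content": "What did I just say?"}]
    ),
    "thinking_off": chat_payload([hello], think=False),
    "tool_declaration": chat_payload(
        [{"role": "user", "content": "Weather in Paris?"}], tools=[WEATHER_TOOL]
    ),
}

for name, payload in CASES.items():
    print(name, len(json.dumps(payload)), "bytes")
```

Running the same payloads against the official model and the hf.co model side by side makes the differences easy to attach to a bug report.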


Workarounds

Workaround 1: Use the official Ollama Gemma 4 model

For normal local usage, this is the safest workaround:

ollama pull gemma4:26b
ollama run gemma4:26b

Reason: the official Ollama path can use the dedicated Gemma 4 renderer: Ollama Gemma 4 renderer.

Downside: you may not get the exact community quant you wanted.

Workaround 2: Use llama.cpp / vLLM / another direct GGUF path

For testing the exact GGUF quant, use a runtime path that can apply the embedded template more directly.

Examples of relevant references:

  • HF GGUF with llama.cpp
  • llama.cpp template wiki
  • vLLM Gemma 4 guide

This helps answer:

Is the quantized GGUF itself bad, or is the Ollama wrapper bad?

If the same GGUF behaves better through llama.cpp/vLLM with a correct template, that supports the wrapper/template diagnosis.

Workaround 3: Import the GGUF manually into Ollama

You can bypass the hf.co/v2 registry path by creating a Modelfile that points at the local GGUF:

FROM /absolute/path/to/google_gemma-4-26B-A4B-it-IQ2_XXS.gguf

Then:

ollama create gemma4-local -f Modelfile
ollama show --modelfile gemma4-local

But this is not automatically a full fix. Manual import bypasses the HF registry template blob, but you still need a correct Ollama template or renderer behavior.

Workaround 4: Add a repo-level template file, if a faithful Go template exists

HF allows a repo-level template file for Ollama, but it must be a Go template, not Jinja: HF Ollama docs.

This may help for some models. For Gemma 4, be careful: a partial Go template can fix basic chat while still breaking tools, thinking, images, and parser behavior.

Workaround 5: Render the prompt yourself

For serious application testing, the most controlled workaround is:

  1. use the HF tokenizer/processor;
  2. apply the correct chat template yourself;
  3. send the rendered prompt through a completion-style path;
  4. manage stop tokens and parsing yourself.

This avoids trusting a runtime’s chat serializer.
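
For steps 3 and 4 with Ollama specifically, my understanding is that /api/generate accepts a `raw` flag that skips Ollama's own templating. The sketch below only builds such a request body. It assumes `rendered_prompt` was produced elsewhere (for example by the HF tokenizer's apply_chat_template with tokenize=False), and the `<turn|>` stop string is my assumption based on the turn terminator quoted in this post.

```python
def raw_generate_payload(model, rendered_prompt, stop_tokens=("<turn|>",)):
    """Build a completion-style request that bypasses Ollama's TEMPLATE."""
    return {
        "model": model,
        "prompt": rendered_prompt,   # already fully rendered by you
        "raw": True,                 # ask Ollama not to apply any template
        "stream": False,
        "options": {"stop": list(stop_tokens)},
    }


payload = raw_generate_payload(
    "hf.co/bartowski/google_gemma-4-26B-A4B-it-GGUF:IQ2_XXS",
    "<|turn>user\nHello<turn|>\n<|turn>model\n",  # placeholder rendering
)
print(payload["raw"], payload["options"]["stop"])
```

With this approach, any tool-call or thinking output comes back as plain text, so parsing it is your job too.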


Where to report

Primary target: Hugging Face

Best first target:

https://github.com/huggingface/huggingface.js/issues

Relevant package:

packages/ollama-utils

Why this target:

  • HF documents that its Ollama path selects a template based on GGUF tokenizer.chat_template: HF Ollama docs.
  • @huggingface/ollama-utils says it converts GGUF/Jinja chat templates to the Go format used by Ollama: README.
  • The same README explicitly lists “the converted template is wrong” as a valid reason for adding a custom handler/test.

Suggested issue title:

hf.co/v2 Ollama registry serves simplified template for Gemma 4 GGUF despite full tokenizer.chat_template in GGUF metadata

Alternative title:

application/vnd.ollama.image.template loses Gemma 4 chat-template semantics

Precise framing:

The GGUF metadata appears correct. The problem appears between GGUF tokenizer.chat_template and the generated Ollama image template layer.

Secondary target: Ollama

Report to Ollama if you can show that imported Gemma 4 GGUFs should use the built-in Gemma 4 renderer but do not, or that the static TEMPLATE mechanism cannot represent official Gemma 4 rendering.

Relevant links:

  • Ollama Gemma 4 renderer
  • Ollama Jinja template support request
  • Ollama Modelfile docs

Optional target: quantizer / repo maintainer

Only report to the quantizer if:

  • the GGUF metadata is actually wrong;
  • the repo has an explicit bad template file;
  • or you want a repo-level workaround.

Suggested wording:

The GGUF metadata appears to contain the full tokenizer.chat_template, but the HF Ollama registry path is serving a simplified application/vnd.ollama.image.template layer. If a faithful Gemma 4 Go template is available, adding a repo-level template file might work around the issue for Ollama users.

That avoids wrongly blaming the quantizer.


Suggested issue body

## Summary

For `bartowski/google_gemma-4-26B-A4B-it-GGUF:IQ2_XXS`, the GGUF metadata appears to contain the full Gemma 4 `tokenizer.chat_template`, but the HF Ollama registry endpoint serves a much shorter `application/vnd.ollama.image.template` layer.

When the model is pulled through Ollama using `hf.co/...`, the local Ollama Modelfile uses this simplified template.

This appears to lose important Gemma 4 chat-template semantics.

## Affected model

- Repo: `bartowski/google_gemma-4-26B-A4B-it-GGUF`
- Tag: `IQ2_XXS`
- Possibly affects other Gemma 4 GGUF repos/tags.

## Steps to reproduce

```sh
REPO="bartowski/google_gemma-4-26B-A4B-it-GGUF"
TAG="IQ2_XXS"

curl -sSf -L \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  "https://hf.co/v2/${REPO}/manifests/${TAG}" \
  | jq .
```

Find the layer:

application/vnd.ollama.image.template

Fetch it:

TEMPLATE_DIGEST="$(
  curl -sSf -L \
    -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    "https://hf.co/v2/${REPO}/manifests/${TAG}" \
  | jq -r '.layers[] | select(.mediaType=="application/vnd.ollama.image.template") | .digest'
)"

curl -sSf -L \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  "https://hf.co/v2/${REPO}/blobs/${TEMPLATE_DIGEST}"

## Actual behavior

The template blob is a short generic Go template:

{{ if .System }}<bos><|turn>system
{{ .System }}<turn|>
{{ end }}{{ if .Prompt }}<|turn>user
{{ .Prompt }}<turn|>
{{ end }}<|turn>model
{{ .Response }}<turn|>

`ollama show --modelfile hf.co/${REPO}:${TAG}` shows the same simplified template locally.

## Expected behavior

The generated Ollama template should either:

  1. preserve the semantics of the GGUF tokenizer.chat_template;
  2. use a Gemma 4-specific custom conversion/handler;
  3. or fail/omit the generated template instead of serving a misleading simplified fallback.

## Why this matters

Gemma 4 prompt rendering is not just simple system/user/model turns.

Ollama’s official Gemma 4 renderer handles BOS, system/developer messages, tools, thinking, tool calls, tool responses, image tags, and Gemma 4-specific delimiters:

github.com/ollama/ollama, model/renderers/gemma4.go (main branch):

package renderers

import (
    "fmt"
    "sort"
    "strings"

    "github.com/ollama/ollama/api"
)

// Gemma4Renderer renders prompts using Gemma 4's chat format with
// <|turn>/<turn|> markers, <|"|> string delimiters, and <|tool>/
// <|tool_call>/<|tool_response> tags for function calling.
type Gemma4Renderer struct {
    useImgTags          bool
    emptyBlockOnNothink bool
}

const (
    g4Q = `<|"|>` // Gemma 4 string delimiter
    // ... (excerpt truncated; see the file for the rest)
)

A simplified static template can make the model appear to run while silently degrading chat, tool, thinking, multimodal, or parser behavior.

## Evidence to attach

  • hf-v2-manifest.json

  • hf-v2-template.txt

  • ollama-show-modelfile.txt

  • GGUF metadata snippet showing the full tokenizer.chat_template


What maintainers could do

HF-side fixes:

  • Add a Gemma 4-specific mapping/handler in the Ollama compatibility layer.
  • Add tests comparing converted Go-template output against expected Gemma 4 rendering.
  • Avoid silently serving a simplified fallback when conversion is incomplete.
  • Document cases where a model’s Jinja template cannot be faithfully represented as an Ollama Go TEMPLATE.

Ollama-side fixes:

  • Support Jinja chat templates directly.
  • Let imported Gemma 4 GGUFs use equivalent Gemma 4 renderer behavior when metadata identifies the architecture.
  • Expose clearer diagnostics when a model falls back to a generic template.

Quantizer/repo-side mitigations:

  • Preserve tokenizer.chat_template in GGUF metadata.
  • Document known Ollama limitations.
  • Optionally add a repo-level Go template file if a faithful one exists.


Final answer

So, answering your question directly:

Is this a quantizer configuration error?

Probably not, assuming the GGUF metadata really contains the full template.

Is it a general HF issue?

Most likely yes: more specifically, an issue in the HF Ollama registry / template conversion / template selection layer.

Is Ollama also involved?

Yes, structurally. Ollama uses Go templates for TEMPLATE, while HF/GGUF templates are generally Jinja-style. Official Gemma 4 in Ollama has a custom renderer, which imported HF GGUFs may not get.

Why does the official Ollama model behave better?

Because it can use Ollama’s dedicated Gemma 4 renderer. The HF hf.co/v2 path appears to provide a generated static template layer instead.

What should you do now?

  1. Use official Ollama Gemma 4 for normal local usage.
  2. Use llama.cpp/vLLM/direct GGUF paths for testing the exact quant.
  3. File a precise issue against HF’s Ollama compatibility layer.
  4. Include the manifest, template blob, ollama show --modelfile, and GGUF metadata.
  5. Do not primarily blame the quantizer unless the GGUF metadata itself is wrong.

The best one-sentence report is:

The GGUF metadata contains the full Gemma 4 tokenizer.chat_template, but the HF hf.co/v2 Ollama registry emits a simplified application/vnd.ollama.image.template layer, causing ollama run hf.co/... to use a template that does not preserve Gemma 4 chat-template semantics.
