Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreicip7aarmdt23mzo4jb2rduatl3b47ujiv6euhang3nvgrqpvclfq",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mme4izshmfn2"
  },
  "path": "/t/ollama-model-registry-provides-wrong-chat-template/176139#post_3",
  "publishedAt": "2026-05-21T07:57:44.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "A similar issue occurred in Qwen 3.5, though the cause was different",
    "HF Hub: Use Ollama with any GGUF model",
    "Ollama Modelfile Reference",
    "HF @huggingface/ollama-utils README",
    "Ollama Gemma 4 renderer source",
    "llama.cpp chat-template wiki",
    "Google Gemma 4 function-calling docs",
    "vLLM Gemma 4 usage guide",
    "HF Ollama docs",
    "@huggingface/ollama-utils",
    "Ollama gemma4.go",
    "Google Gemma 4 function calling",
    "ollama/ollama#10222",
    "llama.cpp template wiki",
    "Ollama Gemma 4 renderer",
    "HF GGUF with llama.cpp",
    "vLLM Gemma 4 guide",
    "https://github.com/huggingface/huggingface.js/issues",
    "README",
    "Ollama Jinja template support request",
    "Ollama Modelfile docs",
    "github.com/ollama/ollama",
    "model/renderers/gemma4.go",
    "main",
    "show original",
    "@bartowski",
    "@huggingface"
  ],
  "textContent": "> or a general HF issue?\n\nThis is probably the correct answer.\n\nThe most likely underlying cause is that the Ollama templates and the built-in GGUF templates (primarily for llama.cpp) aren’t necessarily identical.\n\nA similar issue occurred in Qwen 3.5, though the cause was different. This sort of thing happens occasionally when a new model family introduces a lot of changes. If it is reported on the HF or Ollama GitHub, it’ll probably get fixed eventually…\n\n* * *\n\n# Likely cause: HF’s Ollama registry is serving a lossy template, not the quantizer breaking the GGUF\n\nI think you found a real integration-layer bug, or at least a dangerous fallback in the Hugging Face → Ollama compatibility path.\n\nThe short answer is:\n\n**This does not look primarily like a @bartowski / quantizer configuration error, assuming your GGUF inspection is correct.** If the GGUF still contains the full `tokenizer.chat_template`, then the quantized file likely preserved the important metadata. 
The suspicious transformation happens later, when Hugging Face exposes the model through the Ollama-compatible registry endpoint.\n\nThe failing boundary appears to be:\n\n\n    GGUF metadata:\n      tokenizer.chat_template = full / complex / Gemma 4-specific\n\n    ↓ Hugging Face Ollama compatibility layer\n\n    hf.co/v2/<repo>/manifests/<tag>:\n      application/vnd.ollama.image.template = short generic Go template\n\n    ↓ Ollama pull/run via hf.co\n\n    ollama show --modelfile hf.co/<repo>:<tag>:\n      TEMPLATE = same short generic Go template\n\n\nThat is why the official Ollama model can behave differently: the official Ollama Gemma 4 path uses Ollama’s own Gemma 4 renderer, while the `hf.co/v2` path appears to serve a static Ollama `TEMPLATE` layer.\n\nRelevant references:\n\n  * HF Hub: Use Ollama with any GGUF model\n  * Ollama Modelfile Reference\n  * HF @huggingface/ollama-utils README\n  * Ollama Gemma 4 renderer source\n  * llama.cpp chat-template wiki\n  * Google Gemma 4 function-calling docs\n  * vLLM Gemma 4 usage guide\n\n\n\n* * *\n\n## Why this matters\n\nA chat template is not just formatting. It is the serialization contract between structured chat messages and the raw token sequence the model actually sees.\n\nA chat model does not literally receive this abstract structure:\n\n\n    [\n      {\"role\": \"system\", \"content\": \"You are helpful.\"},\n      {\"role\": \"user\", \"content\": \"Hello\"}\n    ]\n\n\nIt receives a rendered prompt string/token stream, for example with special role markers, turn delimiters, BOS/EOS behavior, tool declarations, image placeholders, thinking markers, and stop tokens. 
If that rendering is wrong, the model can load successfully but behave strangely.\n\nSo when Ollama locally sees only this simplified template:\n\n\n    {{ if .System }}<|turn>system\n    {{ .System }}<turn|>\n    {{ end }}{{ if .Prompt }}<|turn>user\n    {{ .Prompt }}<turn|>\n    {{ end }}<|turn>model\n    {{ .Response }}<turn|>\n\n\nor the HF registry blob serves:\n\n\n    {{ if .System }}<bos><|turn>system\n    {{ .System }}<turn|>\n    {{ end }}{{ if .Prompt }}<|turn>user\n    {{ .Prompt }}<turn|>\n    {{ end }}<|turn>model\n    {{ .Response }}<turn|>\n\n\nthat is not merely a shorter display version. It may be a materially different prompt format.\n\nFor simple one-turn text chat, this can appear to work. For Gemma 4, it is very likely incomplete.\n\n* * *\n\n## Why I would not primarily blame the quantizer\n\nA quantizer-side problem would be likely if one of these were true:\n\nObservation | Likely interpretation\n---|---\n`tokenizer.chat_template` is missing from the GGUF | bad conversion / incomplete GGUF metadata\n`tokenizer.chat_template` inside the GGUF is already simplified | converter or quantizer likely damaged metadata\nthe HF GGUF viewer also shows the simplified template | GGUF metadata likely wrong\nthe repo contains an explicit bad `template` file | repo packaging issue\nthe problem appears only in one quantizer’s repo | repo-specific issue more likely\n\nBut your evidence is different:\n\nLayer | Your observation | What it suggests\n---|---|---\nGGUF metadata | full `tokenizer.chat_template` still exists | quantizer likely preserved the template\nHF model details / GGUF view | shows the full complex template | HF can read the correct metadata\n`hf.co/v2/.../manifests/IQ2_XXS` | contains a short `application/vnd.ollama.image.template` layer | registry compatibility layer is suspicious\ntemplate blob | 159-byte generic Go template | conversion/selection/fallback likely lost semantics\nlocal Ollama model | sees the simplified template | Ollama is 
consuming the served template\nofficial Ollama Gemma 4 | does not show this same problem | official path uses different rendering/configuration\n\nThat points away from “bad quantization” and toward “bad Ollama image template generated by the registry bridge.”\n\nA quantizer can still add a workaround, but that is different from being the root cause.\n\n* * *\n\n## The key distinction: GGUF metadata vs Ollama image template\n\nThe GGUF can contain one template while the Ollama registry image exposes another.\n\nYour GGUF contains:\n\n\n    tokenizer.chat_template\n\n\nThe Ollama-compatible registry manifest contains:\n\n\n    application/vnd.ollama.image.template\n\n\nThose are not the same artifact.\n\nHF’s Ollama docs say that, by default, the template for `ollama run hf.co/<namespace>/<repo>` is selected from commonly used templates based on the GGUF’s built-in `tokenizer.chat_template`. The same docs say that if a repo provides a custom `template` file, it must be a **Go template** , not a Jinja template: HF Ollama docs.\n\nSo the intended pipeline is roughly:\n\n\n    GGUF tokenizer.chat_template\n      → HF template selection / conversion\n      → Ollama Go TEMPLATE\n      → application/vnd.ollama.image.template\n      → Ollama local Modelfile\n\n\nYour evidence suggests that the pipeline is losing information in the middle.\n\n* * *\n\n## Why Jinja → Go template conversion is fragile\n\nHF / Transformers chat templates are generally Jinja-style templates. Ollama `TEMPLATE` uses Go template syntax.\n\nOllama’s own docs say `TEMPLATE` is the full prompt template passed to the model and that templates use Go template syntax: Ollama Modelfile Reference.\n\nThat means a bridge has to do one of these:\n\n  1. convert the Jinja template into a Go template;\n  2. map the Jinja template to a known built-in Go template;\n  3. use a custom model-family-specific handler;\n  4. fall back to a simpler template;\n  5. or fail.\n\n\n\nFor simple templates, this may be fine. 
For Gemma 4, Qwen thinking models, multimodal models, and native tool-calling models, this is fragile.\n\nThe public HF package @huggingface/ollama-utils is especially relevant because it says it handles conversion of GGUF/Jinja chat templates into the Go format used by Ollama. It also explicitly lists **“the converted template is wrong”** as a valid reason to add a custom handler/test.\n\nThat is almost exactly your case.\n\n* * *\n\n## Why Gemma 4 is a bad fit for a tiny generic template\n\nGemma 4 is not just a plain text chat model with `user` and `assistant` turns.\n\nThe official Ollama Gemma 4 renderer handles Gemma 4-specific prompt rendering in code: Ollama gemma4.go. The renderer deals with things such as:\n\n  * BOS emission;\n  * system/developer messages;\n  * `<|turn>` / `<turn|>` markers;\n  * `<|\"|>` string delimiters;\n  * thinking mode;\n  * tools;\n  * tool declarations;\n  * tool calls;\n  * tool responses;\n  * image tags;\n  * adjacent assistant/model turns;\n  * stripping thinking blocks;\n  * generation-prompt behavior.\n\n\n\nGoogle’s Gemma 4 function-calling docs also show that tools are passed through `apply_chat_template()` via the `tools` argument: Google Gemma 4 function calling.\n\nvLLM’s Gemma 4 guide similarly treats Gemma 4 as needing specialized support for reasoning, tool calling, and dynamic multimodal behavior: vLLM Gemma 4 usage guide.\n\nSo the official / full behavior surface is closer to:\n\n\n    messages\n      + system/developer roles\n      + thinking flags\n      + tools\n      + tool calls\n      + tool responses\n      + image/audio placeholders\n      + special delimiters\n      + parser expectations\n      → Gemma 4-specific rendered prompt\n\n\nThe short HF-served Ollama template is closer to:\n\n\n    optional system string\n    + one user prompt\n    + model response marker\n\n\nThose are not equivalent.\n\n* * *\n\n## Why the official Ollama model can work while `hf.co/...` behaves oddly\n\nThe official Ollama 
model and a community GGUF pulled through `hf.co` are different runtime paths.\n\nOfficial Ollama Gemma 4 can use:\n\n\n    Ollama model-library metadata\n    + Gemma4Renderer\n    + known params\n    + known stop tokens\n    + parser behavior\n    + multimodal/projector handling\n\n\nThe HF registry path gives Ollama an Ollama-compatible image manifest with layers like:\n\n\n    application/vnd.ollama.image.model\n    application/vnd.ollama.image.template\n    application/vnd.ollama.image.projector\n    application/vnd.ollama.image.params\n\n\nIf that `application/vnd.ollama.image.template` layer is a simplified fallback, Ollama may simply use the bad template it was given.\n\nSo this difference is expected:\n\n\n    ollama run gemma4:<official-tag>\n      → official Ollama packaging / renderer path\n\n    ollama run hf.co/bartowski/google_gemma-4-26B-A4B-it-GGUF:IQ2_XXS\n      → HF-generated Ollama registry image\n      → static application/vnd.ollama.image.template\n\n\nThat does not mean the GGUF is bad. It means the wrapper/rendering path is different.\n\n* * *\n\n## Most likely root cause\n\nMy ranking:\n\n### 1. Most likely: HF Ollama template conversion/selection fallback\n\nHF reads the GGUF template, tries to convert or classify it, and emits a generic Gemma-ish Go template instead of a faithful Gemma 4 template.\n\nPossible mechanisms:\n\n  * Gemma 4 is not handled by a custom mapping.\n  * The Jinja template is too complex or non-linear.\n  * The converter only supports a subset of the template.\n  * The matcher recognizes the `<|turn>` markers and chooses a generic template.\n  * Tool/thinking/multimodal branches are dropped.\n  * A fallback template is emitted instead of failing loudly.\n\n\n\nThis is the most likely explanation.\n\n### 2. 
Also possible: a static Go `TEMPLATE` cannot fully express official Gemma 4 rendering\n\nOllama’s official Gemma 4 support is renderer code, not just a template string.\n\nSome behavior may be awkward or impossible to express faithfully in a static Go template, especially if the renderer needs to restructure messages, merge tool results, strip thinking, or parse tool calls.\n\nSo there are two different levels of fix:\n\nLevel | Possible fix\n---|---\nNarrow HF fix | Add a better Gemma 4 mapping/custom handler in the HF Ollama compatibility layer\nBetter Ollama fix | Let imported Gemma 4 GGUFs use the same Gemma 4 renderer path as official models\nBroader ecosystem fix | Support Jinja chat templates directly in Ollama\n\nThere is already an Ollama feature request for Jinja chat-template support: ollama/ollama#10222.\n\n### 3. Possible but less likely: repo-level `template` override\n\nHF supports a repo-level `template` file for Ollama, but it must be a Go template: HF Ollama docs.\n\nIf the repo contains such a file and it is bad, that could be a repo packaging issue. But from your evidence, the important template is being served as an HF registry layer while the GGUF metadata remains correct.\n\n### 4. Least likely from your evidence: quantizer damaged the GGUF\n\nThis becomes likely only if the GGUF metadata itself is missing, truncated, or simplified.\n\nYou said the opposite: the correct `tokenizer.chat_template` is still there.\n\n* * *\n\n## How to prove the failing boundary cleanly\n\nPackage three artifacts:\n\n  1. the GGUF metadata;\n  2. the HF `application/vnd.ollama.image.template` blob;\n  3. the local `ollama show --modelfile` output.\n\n\n\n### 1. Fetch the HF Ollama manifest\n\n\n    REPO=\"bartowski/google_gemma-4-26B-A4B-it-GGUF\"\n    TAG=\"IQ2_XXS\"\n\n    curl -sSf -L \\\n      -H \"Accept: application/vnd.docker.distribution.manifest.v2+json\" \\\n      \"https://hf.co/v2/${REPO}/manifests/${TAG}\" \\\n      | jq . 
> hf-v2-manifest.json\n\n\nHighlight this layer:\n\n\n    {\n      \"mediaType\": \"application/vnd.ollama.image.template\",\n      \"size\": 159\n    }\n\n\n### 2. Fetch the template blob\n\n\n    TEMPLATE_DIGEST=\"$(\n      jq -r '.layers[] | select(.mediaType==\"application/vnd.ollama.image.template\") | .digest' \\\n        hf-v2-manifest.json\n    )\"\n\n    curl -sSf -L \\\n      -H \"Accept: application/vnd.docker.distribution.manifest.v2+json\" \\\n      \"https://hf.co/v2/${REPO}/blobs/${TEMPLATE_DIGEST}\" \\\n      > hf-v2-template.txt\n\n    cat hf-v2-template.txt\n\n\nExpected problematic result:\n\n\n    {{ if .System }}<bos><|turn>system\n    {{ .System }}<turn|>\n    {{ end }}{{ if .Prompt }}<|turn>user\n    {{ .Prompt }}<turn|>\n    {{ end }}<|turn>model\n    {{ .Response }}<turn|>\n\n\n### 3. Show what Ollama imports\n\n\n    ollama pull \"hf.co/${REPO}:${TAG}\"\n\n    ollama show --modelfile \"hf.co/${REPO}:${TAG}\" \\\n      > ollama-show-modelfile.txt\n\n\nOllama documents `ollama show --modelfile` as the way to inspect the model’s Modelfile: Ollama Modelfile Reference.\n\nIf `ollama-show-modelfile.txt` contains the same simplified template, that proves Ollama is using the registry-served template.\n\n### 4. Inspect GGUF metadata\n\nFor example, with llama.cpp tooling:\n\n\n    python ./llama.cpp/gguf-py/scripts/gguf-dump.py \\\n      --no-tensors \\\n      ./google_gemma-4-26B-A4B-it-IQ2_XXS.gguf \\\n      > gguf-metadata.txt\n\n    grep -n \"tokenizer.chat_template\" gguf-metadata.txt\n\n\nllama.cpp’s template wiki says `llama_chat_apply_template()` uses the template stored in model metadata key `tokenizer.chat_template` by default and includes a Jinja parser called `minja`: llama.cpp template wiki.\n\n### 5. 
Summarize the proof\n\nSource | Result\n---|---\nGGUF metadata | full Gemma 4 `tokenizer.chat_template`\nHF `application/vnd.ollama.image.template` blob | short generic Go template\n`ollama show --modelfile` | same short generic Go template\n\nThat table makes the issue very clear.\n\n* * *\n\n## Behavior tests to run\n\nDo not test only `hello`. A generic template can pass trivial chat while failing important branches.\n\nTest cases that matter:\n\nTest | What it checks\n---|---\none-turn prompt | baseline behavior\nsystem prompt | system role rendering\nmulti-turn chat | history loop\nassistant-history turn | assistant/model role rendering\nthinking on/off | Gemma 4 thinking control\ntool declaration | tool schema serialization\ntool call | tool-call formatting\ntool response | tool-response formatting\nimage input | multimodal placeholder handling\nlong answer / stop leak | stop tokens and turn terminators\n\nThe simplified template may only pass the first one or two.\n\n* * *\n\n## Workarounds\n\n### Workaround 1: Use the official Ollama Gemma 4 model\n\nFor normal local usage, this is the safest workaround:\n\n\n    ollama pull gemma4:26b\n    ollama run gemma4:26b\n\n\nReason: the official Ollama path can use the dedicated Gemma 4 renderer: Ollama Gemma 4 renderer.\n\nDownside: you may not get the exact community quant you wanted.\n\n### Workaround 2: Use llama.cpp / vLLM / another direct GGUF path\n\nFor testing the exact GGUF quant, use a runtime path that can apply the embedded template more directly.\n\nExamples of relevant references:\n\n  * HF GGUF with llama.cpp\n  * llama.cpp template wiki\n  * vLLM Gemma 4 guide\n\n\n\nThis helps answer:\n\n\n    Is the quantized GGUF itself bad, or is the Ollama wrapper bad?\n\n\nIf the same GGUF behaves better through llama.cpp/vLLM with a correct template, that supports the wrapper/template diagnosis.\n\n### Workaround 3: Import the GGUF manually into Ollama\n\nYou can bypass the `hf.co/v2` registry path:\n\n\n    
FROM /absolute/path/to/google_gemma-4-26B-A4B-it-IQ2_XXS.gguf\n\n\nThen:\n\n\n    ollama create gemma4-local -f Modelfile\n    ollama show --modelfile gemma4-local\n\n\nBut this is **not automatically a full fix**. Manual import bypasses the HF registry template blob, but you still need a correct Ollama template or renderer behavior.\n\n### Workaround 4: Add a repo-level `template` file, if a faithful Go template exists\n\nHF allows a repo-level `template` file for Ollama, but it must be a Go template, not Jinja: HF Ollama docs.\n\nThis may help for some models. For Gemma 4, be careful: a partial Go template can fix basic chat while still breaking tools, thinking, images, and parser behavior.\n\n### Workaround 5: Render the prompt yourself\n\nFor serious application testing, the most controlled workaround is:\n\n  1. use the HF tokenizer/processor;\n  2. apply the correct chat template yourself;\n  3. send the rendered prompt through a completion-style path;\n  4. manage stop tokens and parsing yourself.\n\n\n\nThis avoids trusting a runtime’s chat serializer.\n\n* * *\n\n## Where to report\n\n### Primary target: Hugging Face\n\nBest first target:\n\nhttps://github.com/huggingface/huggingface.js/issues\n\nRelevant package:\n\n\n    packages/ollama-utils\n\n\nWhy this target:\n\n  * HF documents that its Ollama path selects a template based on GGUF `tokenizer.chat_template`: HF Ollama docs.\n  * `@huggingface/ollama-utils` says it converts GGUF/Jinja chat templates to the Go format used by Ollama: README.\n  * The same README explicitly lists “the converted template is wrong” as a valid reason for adding a custom handler/test.\n\n\n\nSuggested issue title:\n\n\n    hf.co/v2 Ollama registry serves simplified template for Gemma 4 GGUF despite full tokenizer.chat_template in GGUF metadata\n\n\nAlternative title:\n\n\n    application/vnd.ollama.image.template loses Gemma 4 chat-template semantics\n\n\nPrecise framing:\n\n\n    The GGUF metadata appears correct. 
The problem appears between GGUF tokenizer.chat_template and the generated Ollama image template layer.\n\n\n### Secondary target: Ollama\n\nReport to Ollama if you can show that imported Gemma 4 GGUFs should use the built-in Gemma 4 renderer but do not, or that the static `TEMPLATE` mechanism cannot represent official Gemma 4 rendering.\n\nRelevant links:\n\n  * Ollama Gemma 4 renderer\n  * Ollama Jinja template support request\n  * Ollama Modelfile docs\n\n\n\n### Optional target: quantizer / repo maintainer\n\nOnly report to the quantizer if:\n\n  * the GGUF metadata is actually wrong;\n  * the repo has an explicit bad `template` file;\n  * or you want a repo-level workaround.\n\n\n\nSuggested wording:\n\n\n    The GGUF metadata appears to contain the full tokenizer.chat_template, but the HF Ollama registry path is serving a simplified application/vnd.ollama.image.template layer. If a faithful Gemma 4 Go template is available, adding a repo-level template file might work around the issue for Ollama users.\n\n\nThat avoids wrongly blaming the quantizer.\n\n* * *\n\n## Suggested issue body\n\n\n    ## Summary\n\n    For `bartowski/google_gemma-4-26B-A4B-it-GGUF:IQ2_XXS`, the GGUF metadata appears to contain the full Gemma 4 `tokenizer.chat_template`, but the HF Ollama registry endpoint serves a much shorter `application/vnd.ollama.image.template` layer.\n\n    When the model is pulled through Ollama using `hf.co/...`, the local Ollama Modelfile uses this simplified template.\n\n    This appears to lose important Gemma 4 chat-template semantics.\n\n    ## Affected model\n\n    - Repo: `bartowski/google_gemma-4-26B-A4B-it-GGUF`\n    - Tag: `IQ2_XXS`\n    - Possibly affects other Gemma 4 GGUF repos/tags.\n\n    ## Steps to reproduce\n\n    ```sh\n    REPO=\"bartowski/google_gemma-4-26B-A4B-it-GGUF\"\n    TAG=\"IQ2_XXS\"\n\n    curl -sSf -L \\\n      -H \"Accept: application/vnd.docker.distribution.manifest.v2+json\" \\\n      
\"https://hf.co/v2/${REPO}/manifests/${TAG}\" \\\n      | jq .\n\n\nFind the layer:\n\n\n    application/vnd.ollama.image.template\n\n\nFetch it:\n\n\n    TEMPLATE_DIGEST=\"$(\n      curl -sSf -L \\\n        -H \"Accept: application/vnd.docker.distribution.manifest.v2+json\" \\\n        \"https://hf.co/v2/${REPO}/manifests/${TAG}\" \\\n      | jq -r '.layers[] | select(.mediaType==\"application/vnd.ollama.image.template\") | .digest'\n    )\"\n\n    curl -sSf -L \\\n      -H \"Accept: application/vnd.docker.distribution.manifest.v2+json\" \\\n      \"https://hf.co/v2/${REPO}/blobs/${TEMPLATE_DIGEST}\"\n\n\n## Actual behavior\n\nThe template blob is a short generic Go template:\n\n\n    {{ if .System }}<bos><|turn>system\n    {{ .System }}<turn|>\n    {{ end }}{{ if .Prompt }}<|turn>user\n    {{ .Prompt }}<turn|>\n    {{ end }}<|turn>model\n    {{ .Response }}<turn|>\n\n\n`ollama show --modelfile hf.co/${REPO}:${TAG}` shows the same simplified template locally.\n\n## Expected behavior\n\nThe generated Ollama template should either:\n\n  1. preserve the semantics of the GGUF `tokenizer.chat_template`;\n  2. use a Gemma 4-specific custom conversion/handler;\n  3. 
or fail/omit the generated template instead of serving a misleading simplified fallback.\n\n\n\n## Why this matters\n\nGemma 4 prompt rendering is not just simple system/user/model turns.\n\nOllama’s official Gemma 4 renderer handles BOS, system/developer messages, tools, thinking, tool calls, tool responses, image tags, and Gemma 4-specific delimiters:\n\nSource: github.com/ollama/ollama, `model/renderers/gemma4.go` (branch `main`), excerpt:\n\n\n    package renderers\n\n    import (\n    \t\"fmt\"\n    \t\"sort\"\n    \t\"strings\"\n\n    \t\"github.com/ollama/ollama/api\"\n    )\n\n    // Gemma4Renderer renders prompts using Gemma 4's chat format with\n    // <|turn>/<turn|> markers, <|\"|> string delimiters, and <|tool>/\n    // <|tool_call>/<|tool_response> tags for function calling.\n    type Gemma4Renderer struct {\n    \tuseImgTags          bool\n    \temptyBlockOnNothink bool\n    }\n\n    const (\n    \tg4Q = `<|\"|>` // Gemma 4 string delimiter\n\n\n(The excerpt is truncated here.)\n\nA simplified static template can make the model appear to run while silently degrading chat, tool, thinking, multimodal, or parser behavior.\n\n## Evidence to attach\n\n  * `hf-v2-manifest.json`\n  * `hf-v2-template.txt`\n  * `ollama-show-modelfile.txt`\n  * GGUF metadata snippet showing the full `tokenizer.chat_template`\n\n\n\n\n    ---\n\n    ## What maintainers could do\n\n    HF-side fixes:\n\n    - Add a Gemma 4-specific mapping/handler in the Ollama compatibility layer.\n    - Add tests comparing converted Go-template output against expected Gemma 4 rendering.\n    - Avoid silently serving a simplified fallback when conversion is incomplete.\n    - Document cases where a model’s Jinja template cannot be faithfully represented as an Ollama Go `TEMPLATE`.\n\n    Ollama-side fixes:\n\n    - Support Jinja chat templates directly.\n    - Let imported Gemma 4 GGUFs use equivalent Gemma 4 renderer behavior when metadata identifies the architecture.\n    - Expose clearer diagnostics 
when a model falls back to a generic template.\n\n    Quantizer/repo-side mitigations:\n\n    - Preserve `tokenizer.chat_template` in GGUF metadata.\n    - Document known Ollama limitations.\n    - Optionally add a repo-level Go `template` file if a faithful one exists.\n\n    ---\n\n    ## Final answer\n\n    So, answering your question directly:\n\n    **Is this a quantizer configuration error?**\n\n    Probably not, assuming the GGUF metadata really contains the full template.\n\n    **Is it a general HF issue?**\n\n    Most likely yes: more specifically, an issue in the HF Ollama registry / template conversion / template selection layer.\n\n    **Is Ollama also involved?**\n\n    Yes, structurally. Ollama uses Go templates for `TEMPLATE`, while HF/GGUF templates are generally Jinja-style. Official Gemma 4 in Ollama has a custom renderer, which imported HF GGUFs may not get.\n\n    **Why does the official Ollama model behave better?**\n\n    Because it can use Ollama’s dedicated Gemma 4 renderer. The HF `hf.co/v2` path appears to provide a generated static template layer instead.\n\n    **What should you do now?**\n\n    1. Use official Ollama Gemma 4 for normal local usage.\n    2. Use llama.cpp/vLLM/direct GGUF paths for testing the exact quant.\n    3. File a precise issue against HF’s Ollama compatibility layer.\n    4. Include the manifest, template blob, `ollama show --modelfile`, and GGUF metadata.\n    5. Do not primarily blame the quantizer unless the GGUF metadata itself is wrong.\n\n    The best one-sentence report is:\n\n    > The GGUF metadata contains the full Gemma 4 `tokenizer.chat_template`, but the HF `hf.co/v2` Ollama registry emits a simplified `application/vnd.ollama.image.template` layer, causing `ollama run hf.co/...` to use a template that does not preserve Gemma 4 chat-template semantics.",
  "title": "Ollama model registry provides wrong chat template"
}