{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidvd2mzjq7upq6ucuguznr2l2vrpmulvwqfvyuqfi7pwtpl77boxy",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkwlq5jjj4j2"
  },
  "path": "/t/the-bpe-pre-tokenizer-was-not-recognized/175714#post_3",
  "publishedAt": "2026-05-03T06:00:56.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "convert_hf_to_gguf_update.py",
    "Hugging Face llama.cpp integration docs",
    "PreTokenizer",
    "Hugging Face Tokenizers components",
    "Hugging Face Tokenizers pipeline",
    "Hugging Face LLM course: Byte-Pair Encoding",
    "Transformers tokenizer summary",
    "Qwen3_5ForCausalLM is not supported",
    "convert_hf_to_gguf.py does not support the text version of Qwen3.5",
    "Qwen/Qwen3.5-4B",
    "Qwen/Qwen3.5-4B/tree/main",
    "Transformers model API: resizing token embeddings",
    "PEFT LoRA docs, including token-related options",
    "Unsloth Qwen3.5 docs",
    "llama.cpp - Qwen",
    "Qwen llama.cpp quantization guide",
    "Hugging Face GGUF docs",
    "convert_hf_to_gguf.py does not support text Qwen3.5",
    "WARNING: The BPE pre-tokenizer was not recognized",
    "BPE pre-tokenizer not recognized for several models",
    "Hugging Face LLM course: BPE"
  ],
  "textContent": "> I’d first check the tokenizer files tbh.\n\nTrue…\n\nHmm, in the simplest case, it’s possible that you called `model.save_pretrained()` but forgot to call `tokenizer.save_pretrained()`. This would result in a model folder missing only the tokenizer-related files, which would explain the symptoms. If you don’t get any warnings when converting the official Qwen 3.5, then this or a similar issue is likely the culprit.\n\nHowever, since there are cases where Qwen 3.5 and GGUF do not work as expected, I think it’s best to suspect an issue specific to the Qwen 3.5 series. If converting models from other series to GGUF works fine, then a specific issue with this series is likely the cause.\n\n* * *\n\n## `convert_hf_to_gguf.py`: “The BPE pre-tokenizer was not recognized” after fine-tuning Qwen3.5-4B\n\nI would treat this as a **tokenizer / GGUF metadata compatibility problem**, not primarily a `transformers` problem.\n\nYour traceback gets all the way to the vocabulary/tokenizer phase:\n\n\n    prepare_metadata()\n      set_vocab()\n        _set_vocab_gpt2()\n          get_vocab_base()\n            get_vocab_base_pre(tokenizer)\n              raise NotImplementedError(...)\n\n\nThat means `convert_hf_to_gguf.py` has already started processing the model and is now trying to encode the tokenizer contract into GGUF metadata. The failure happens because llama.cpp cannot recognize the **BPE pre-tokenizer** behavior loaded from your model folder.\n\nThe key line is:\n\n\n    chkhsh: 1444df51289cfa8063b96f0e62b1125440111bc79a52003ea14b6eac7016fd5f\n\n\nThat `chkhsh` is a tokenizer fingerprint. llama.cpp’s convert_hf_to_gguf_update.py generates hash-to-pre-tokenizer mappings for `get_vocab_base_pre()`. If your tokenizer produces a hash that is not in that mapping, the converter refuses to guess.\n\nSo the short version is:\n\n> The model weights may be fine. 
The folder you are converting contains a tokenizer configuration that your llama.cpp converter cannot map to a known GGUF pre-tokenizer type.\n\n* * *\n\n## Why upgrading `transformers` did not fix it\n\nUpgrading `transformers` helps when Transformers cannot load a model, config, or tokenizer. But this failure is inside llama.cpp’s conversion code.\n\nThe Hugging Face llama.cpp integration docs describe the conversion process as roughly:\n\n  1. load `config.json` with `AutoConfig`,\n  2. load tokenizer information with `AutoTokenizer`,\n  3. select a converter class from the model architecture,\n  4. map tensors,\n  5. write GGUF weights, tokenizer metadata, and model metadata.\n\n\n\nYour failure happens after the tokenizer is loaded, when llama.cpp tries to classify the tokenizer’s BPE pre-tokenization behavior.\n\nSo this is not simply:\n\n\n    Transformers is too old.\n\n\nIt is more like:\n\n\n    llama.cpp does not recognize the tokenizer behavior in <model_dir>.\n\n\nThat is also why running `convert_hf_to_gguf_update.py` may not help automatically. That script is mainly a converter-maintenance tool: it regenerates known pre-tokenizer hashes from models listed in the script. 
It does not magically repair a local fine-tuned folder whose tokenizer files changed or are incomplete.\n\nRelevant source: convert_hf_to_gguf_update.py.\n\n* * *\n\n## Background: what a “BPE pre-tokenizer” is\n\nA tokenizer is not only a vocabulary file.\n\nA simplified Hugging Face tokenizer pipeline is:\n\n\n    raw text\n      -> normalizer\n      -> pre-tokenizer\n      -> BPE model / merges\n      -> post-processor / special-token handling\n      -> token IDs\n\n\nThe Hugging Face Tokenizers docs describe the PreTokenizer as the component that splits text before the tokenizer model applies BPE/WordPiece/Unigram rules.\n\nThis matters because two tokenizers can have:\n\n  * the same vocabulary size,\n  * the same model architecture,\n  * similar-looking special tokens,\n\n\n\nbut still produce different token IDs if the pre-tokenizer differs.\n\nExamples where pre-tokenization differences can matter:\n\n\n    \"Hello world\"\n    \" Hello world\"\n    \"Hello\\nworld\"\n    \"你好,世界\"\n    \"こんにちは世界\"\n    \"🙂🚀 café naïve\"\n    \"<|im_start|>user\\nHello<|im_end|>\\n<|im_start|>assistant\\n\"\n\n\nIf llama.cpp wrote the wrong `tokenizer.ggml.pre` metadata, the resulting GGUF could load but tokenize prompts differently from Transformers. That can cause bad output, broken Unicode handling, broken chat markers, or high perplexity. 
So llama.cpp stops instead of guessing.\n\nGood background references:\n\n  * Hugging Face Tokenizers components\n  * Hugging Face Tokenizers pipeline\n  * Hugging Face LLM course: Byte-Pair Encoding\n  * Transformers tokenizer summary\n\n\n\n* * *\n\n## Why Qwen3.5 makes this easier to hit\n\nQwen3.5 support in llama.cpp is relatively recent and commit-sensitive.\n\nThere are recent llama.cpp issues around Qwen3.5 conversion support, including `Qwen3_5ForCausalLM` not being supported in some converter paths:\n\n  * Qwen3_5ForCausalLM is not supported\n  * convert_hf_to_gguf.py does not support the text version of Qwen3.5\n\n\n\nYour error is not exactly the same as those architecture errors, because your traceback reaches tokenizer handling. But the lesson is still important:\n\n> “Qwen-ish support exists” does not necessarily mean “my exact Qwen3.5 variant, my exact tokenizer files, and my exact llama.cpp commit are supported.”\n\nAlso, the official Qwen/Qwen3.5-4B repo contains several important tokenizer/config/processor files. The file list includes things like:\n\n  * `tokenizer.json`\n  * `tokenizer_config.json`\n  * `vocab.json`\n  * `merges.txt`\n  * `chat_template.jinja`\n  * `preprocessor_config.json`\n  * `video_preprocessor_config.json`\n  * `config.json`\n\n\n\nSee the repo file listing here: Qwen/Qwen3.5-4B/tree/main.\n\nFor Qwen3.5, I would treat tokenizer and processor files as part of the model contract, not as disposable side files.\n\n* * *\n\n## Most likely causes, ranked\n\n### 1. 
Your fine-tuned or merged folder has tokenizer drift\n\nThis is the most likely case if:\n\n  * the original base model converts with the same llama.cpp commit,\n  * your fine-tuned/merged folder fails,\n  * you did not intentionally add tokens,\n  * your training/export tool saved or regenerated tokenizer files.\n\n\n\nTokenizer drift means that files such as these differ from the base model:\n\n\n    tokenizer.json\n    tokenizer_config.json\n    vocab.json\n    merges.txt\n    special_tokens_map.json\n    added_tokens.json\n    chat_template.jinja\n    preprocessor_config.json\n    video_preprocessor_config.json\n\n\nThis can happen even if you never manually edited tokenizer files. Fine-tuning tools often call `save_pretrained()`, copy partial artifacts, rewrite `tokenizer_config.json`, alter chat templates, or omit files that the original base repo had.\n\nIf the tokenizer was not intentionally changed during training, the safest practical fix is often to copy the tokenizer-related files from the exact base model revision back into the merged folder.\n\n* * *\n\n### 2. Your llama.cpp checkout is too old for the exact Qwen3.5 path\n\nIf the **original base model** also fails with the same kind of error, then your fine-tune is probably not the main issue.\n\nIn that case, update llama.cpp itself, not just Python packages:\n\n\n    cd <llama_cpp_dir>\n    git pull --rebase\n    python -m pip install -U -r requirements.txt\n    python convert_hf_to_gguf_update.py\n\n\nThen retry converting the base model.\n\nQwen3.5-related converter support has changed recently, so the exact `llama.cpp` commit matters.\n\n* * *\n\n### 3. You added or changed tokens during fine-tuning\n\nIf your training code did anything like:\n\n\n    tokenizer.add_tokens(...)\n    tokenizer.add_special_tokens(...)\n    model.resize_token_embeddings(len(tokenizer))\n\n\nthen copying base tokenizer files can be wrong.\n\nWhy? 
Because the model’s embedding matrix may now contain rows for new token IDs. If you overwrite the tokenizer with the base tokenizer, token IDs and embedding rows can disagree.\n\nIn that case, first verify:\n\n\n    len(tokenizer) <= config.vocab_size == embedding rows\n\n\nSome checkpoints pad the embedding matrix, so `config.vocab_size` may exceed `len(tokenizer)`; what matters is that every token ID has an embedding row. If this does not hold, fix the merged Transformers folder before trying GGUF conversion.\n\nRelevant background:\n\n  * Transformers model API: resizing token embeddings\n  * PEFT LoRA docs, including token-related options\n\n\n\n* * *\n\n### 4. You are mixing Qwen3.5 base / instruct / text-only / multimodal artifacts\n\nQwen3.5-4B is not just an old-style plain text-only layout. Some Qwen3.5 workflows involve multimodal files, processor configs, chat templates, or separate projector handling.\n\nBe careful not to mix files from:\n\n\n    Qwen/Qwen3.5-4B\n    Qwen/Qwen3.5-4B-Base\n    an Unsloth Qwen3.5 repo\n    a text-only derivative\n    a LoRA adapter folder\n    a merged full model folder\n    a GGUF repo\n\n\nUse tokenizer files from the **exact model and revision you trained from**, not from a “nearby” Qwen model.\n\nUseful references:\n\n  * Qwen/Qwen3.5-4B\n  * Qwen/Qwen3.5-4B/tree/main\n  * Unsloth Qwen3.5 docs\n\n\n\n* * *\n\n## The decisive diagnostic test\n\nBefore editing anything, test the original base model with the same llama.cpp commit.\n\n### Step 1: record your environment\n\n\n    cd <llama_cpp_dir>\n    git rev-parse HEAD\n    python --version\n    python -m pip show transformers tokenizers huggingface_hub gguf sentencepiece protobuf\n\n\nAlso record:\n\n\n    base model: <base_model_name>\n    base revision: <base_model_revision_or_unknown>\n    fine-tuning method: <lora_qlora_full_finetune>\n    merged folder: <merged_model_dir>\n    did you add tokens: <yes_or_no>\n    did you change chat_template: <yes_or_no>\n    target: <text_only_or_multimodal>\n\n\n### Step 2: download the exact base model\n\nIf the base was `Qwen/Qwen3.5-4B`:\n\n\n    hf download Qwen/Qwen3.5-4B 
\\\n      --local-dir <base_model_dir> \\\n      --include \"*.safetensors\" \\\n      --include \"*.json\" \\\n      --include \"*.txt\" \\\n      --include \"*.jinja\"\n\n\nIf you know the exact revision you trained from, pin it:\n\n\n    hf download Qwen/Qwen3.5-4B \\\n      --revision <base_model_revision> \\\n      --local-dir <base_model_dir> \\\n      --include \"*.safetensors\" \\\n      --include \"*.json\" \\\n      --include \"*.txt\" \\\n      --include \"*.jinja\"\n\n\n### Step 3: try converting the base model\n\n\n    python <llama_cpp_dir>/convert_hf_to_gguf.py \\\n      <base_model_dir> \\\n      --outtype bf16 \\\n      --outfile <base_model_dir>/base-bf16.gguf\n\n\nUse BF16/F16 for debugging. Do not make your first target a 4-bit quant.\n\nThe normal Qwen flow is:\n\n\n    Transformers folder -> high-precision GGUF -> quantized GGUF\n\n\nSee the official Qwen llama.cpp quantization guide: llama.cpp - Qwen.\n\n### How to interpret the result\n\nResult | Meaning\n---|---\nBase model converts | llama.cpp probably supports the base tokenizer. 
Your fine-tuned/merged folder likely drifted.\nBase model fails with the same BPE pre-tokenizer hash | Your llama.cpp checkout probably does not support that exact tokenizer state.\nBase model fails with architecture error | You likely need newer llama.cpp Qwen3.5 architecture support.\nBase converts, fine-tuned model fails | Compare and probably restore tokenizer files, unless you added tokens.\n\nThis test is the most important one.\n\n* * *\n\n## Compare tokenizer files\n\nRun this against the base folder and your merged/fine-tuned folder:\n\n\n    from pathlib import Path\n    import hashlib\n\n    base = Path(\"<base_model_dir>\")\n    ft = Path(\"<merged_model_dir>\")\n\n    files = [\n        \"tokenizer.json\",\n        \"tokenizer_config.json\",\n        \"vocab.json\",\n        \"merges.txt\",\n        \"chat_template.jinja\",\n        \"special_tokens_map.json\",\n        \"added_tokens.json\",\n        \"config.json\",\n        \"processor_config.json\",\n        \"preprocessor_config.json\",\n        \"video_preprocessor_config.json\",\n    ]\n\n    def sha(p):\n        if not p.exists():\n            return \"MISSING\"\n        return hashlib.sha256(p.read_bytes()).hexdigest()\n\n    for name in files:\n        b = sha(base / name)\n        f = sha(ft / name)\n        print(f\"{name:32} {'same' if b == f else 'DIFF'}\")\n        print(f\"  base: {b}\")\n        print(f\"  ft:   {f}\")\n\n\nSuspicious results if you did **not** add tokens:\n\n\n    tokenizer.json                  DIFF\n    tokenizer_config.json           DIFF\n    vocab.json                      MISSING\n    merges.txt                      MISSING\n    added_tokens.json               added or changed\n    special_tokens_map.json         changed\n    chat_template.jinja             missing or changed\n    processor/preprocessor files    missing\n\n\n* * *\n\n## Check whether tokenization actually changed\n\nHashes are useful, but direct token-ID comparison is even more concrete.\n\n\n   
 from transformers import AutoTokenizer\n\n    base_tok = AutoTokenizer.from_pretrained(\"<base_model_dir>\", trust_remote_code=True)\n    ft_tok = AutoTokenizer.from_pretrained(\"<merged_model_dir>\", trust_remote_code=True)\n\n    tests = [\n        \"Hello world\",\n        \" Hello world\",\n        \"Hello\\nworld\",\n        \"a  b   c\",\n        \"你好,世界\",\n        \"こんにちは世界\",\n        \"🙂🚀 café naïve\",\n        \"<|im_start|>user\\nHello<|im_end|>\\n<|im_start|>assistant\\n\",\n        \"def f(x):\\n    return x + 1\",\n    ]\n\n    for s in tests:\n        b = base_tok.encode(s, add_special_tokens=False)\n        f = ft_tok.encode(s, add_special_tokens=False)\n\n        print(\"\\nTEXT:\", repr(s))\n        print(\"same:\", b == f)\n\n        if b != f:\n            print(\"base:\", b[:100])\n            print(\"ft:  \", f[:100])\n\n\nIf these differ and you did not intentionally change the tokenizer, that strongly points to tokenizer drift.\n\n* * *\n\n## If you did not add tokens: likely fix\n\nIf all of this is true:\n\n  * base model converts,\n  * fine-tuned/merged model fails,\n  * you did not add tokens,\n  * tokenizer files differ,\n\n\n\nthen copy tokenizer/config support files from the exact base model revision into your merged folder.\n\n\n    cp <base_model_dir>/tokenizer.json <merged_model_dir>/\n    cp <base_model_dir>/tokenizer_config.json <merged_model_dir>/\n    cp <base_model_dir>/vocab.json <merged_model_dir>/\n    cp <base_model_dir>/merges.txt <merged_model_dir>/\n    cp <base_model_dir>/chat_template.jinja <merged_model_dir>/\n\n    cp <base_model_dir>/special_tokens_map.json <merged_model_dir>/ 2>/dev/null || true\n    cp <base_model_dir>/added_tokens.json <merged_model_dir>/ 2>/dev/null || true\n    cp <base_model_dir>/processor_config.json <merged_model_dir>/ 2>/dev/null || true\n    cp <base_model_dir>/preprocessor_config.json <merged_model_dir>/ 2>/dev/null || true\n    cp <base_model_dir>/video_preprocessor_config.json 
<merged_model_dir>/ 2>/dev/null || true\n\n\nThen rerun conversion:\n\n\n    python <llama_cpp_dir>/convert_hf_to_gguf.py \\\n      <merged_model_dir> \\\n      --outtype bf16 \\\n      --outfile <output_bf16_gguf>\n\n\nAfter that succeeds, quantize:\n\n\n    <llama_cpp_dir>/build/bin/llama-quantize \\\n      <output_bf16_gguf> \\\n      <output_q4_k_m_gguf> \\\n      Q4_K_M\n\n\nThis is the fix I would try first in your case, assuming no tokens were added.\n\n* * *\n\n## If you added tokens: do not copy blindly\n\nIf you added tokens, check tokenizer/model consistency first:\n\n\n    from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM\n\n    path = \"<merged_model_dir>\"\n\n    tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)\n    cfg = AutoConfig.from_pretrained(path, trust_remote_code=True)\n\n    print(\"len(tokenizer):\", len(tok))\n    print(\"config.vocab_size:\", getattr(cfg, \"vocab_size\", None))\n    print(\"added vocab size:\", len(tok.get_added_vocab()))\n    print(\"added vocab:\", tok.get_added_vocab())\n\n    model = AutoModelForCausalLM.from_pretrained(\n        path,\n        torch_dtype=\"auto\",\n        device_map=\"cpu\",\n        trust_remote_code=True,\n    )\n\n    print(\"embedding rows:\", model.get_input_embeddings().weight.shape[0])\n\n    if model.get_output_embeddings() is not None:\n        print(\"output rows:\", model.get_output_embeddings().weight.shape[0])\n\n\nYou want:\n\n\n    len(tokenizer) <= config.vocab_size == embedding rows\n\n\nSome checkpoints pad the embedding matrix, so `config.vocab_size` may exceed `len(tokenizer)`; what matters is that every token ID has an embedding row. If that does not hold, fix the merged Transformers model first.\n\nIf the tokenizer is intentionally modified and internally consistent, then llama.cpp may genuinely need support for that tokenizer fingerprint. In that case, copying the base tokenizer would hide the real issue and may break the model.\n\n* * *\n\n## If the original base model also fails\n\nIf the base model fails too, stop debugging the fine-tuned folder. 
Use a fresh current llama.cpp checkout:\n\n\n    git clone https://github.com/ggml-org/llama.cpp <llama_cpp_clean_dir>\n    cd <llama_cpp_clean_dir>\n\n    python -m pip install -U -r requirements.txt\n\n    cmake -S . -B build -DCMAKE_BUILD_TYPE=Release\n    cmake --build build --config Release\n\n    python convert_hf_to_gguf_update.py\n\n\nThen retry:\n\n\n    python <llama_cpp_clean_dir>/convert_hf_to_gguf.py \\\n      <base_model_dir> \\\n      --outtype bf16 \\\n      --outfile <base_model_dir>/base-bf16.gguf\n\n\nIf it still fails with the same `chkhsh`, then it is probably an upstream llama.cpp support issue for that exact tokenizer/model revision.\n\nA good report should include:\n\n\n    base model: <base_model_name>\n    base revision: <base_model_revision>\n    fine-tuned model: <fine_tuned_model_or_local_only>\n    llama.cpp commit: <commit_hash>\n    python version: <python_version>\n    transformers version: <transformers_version>\n    tokenizers version: <tokenizers_version>\n    did you add tokens: <yes_or_no>\n    did you change chat_template: <yes_or_no>\n    did you merge LoRA: <yes_or_no>\n    target: <text_only_or_multimodal>\n    full converter command: <command>\n    chkhsh: 1444df51289cfa8063b96f0e62b1125440111bc79a52003ea14b6eac7016fd5f\n\n\nAlso include hashes:\n\n\n    sha256sum \\\n      <merged_model_dir>/tokenizer.json \\\n      <merged_model_dir>/tokenizer_config.json \\\n      <merged_model_dir>/vocab.json \\\n      <merged_model_dir>/merges.txt \\\n      <merged_model_dir>/special_tokens_map.json \\\n      <merged_model_dir>/added_tokens.json \\\n      <merged_model_dir>/chat_template.jinja \\\n      2>/dev/null\n\n\n* * *\n\n## Why manual hash patching is risky\n\nYou may be tempted to edit `convert_hf_to_gguf.py` and add something like:\n\n\n    if chkhsh == \"1444df51289cfa8063b96f0e62b1125440111bc79a52003ea14b6eac7016fd5f\":\n        res = \"qwen35\"\n\n\nor:\n\n\n    res = \"qwen2\"\n\n\nI would not do that as the first 
fix.\n\nThe hash is only a fingerprint. The actual GGUF needs a correct `tokenizer.ggml.pre` value that llama.cpp can reproduce at runtime. If you map the hash to the wrong pre-tokenizer, the conversion may succeed but inference can be subtly broken.\n\nThis is worse than a clean failure.\n\nOnly consider a manual mapping if you can prove:\n\n  1. the base tokenizer and fine-tuned tokenizer encode a broad set of test strings identically,\n  2. the tokenizer JSON pre-tokenizer is equivalent to an existing llama.cpp pre-tokenizer,\n  3. llama.cpp runtime tokenizer code supports that behavior,\n  4. generated text and/or perplexity look sane after conversion.\n\n\n\nRelevant source: convert_hf_to_gguf_update.py.\n\n* * *\n\n## Conversion and quantization order\n\nDo not debug this by jumping straight to `Q4_K_M`.\n\nUse the standard two-step route:\n\n\n    Transformers model folder\n      -> high-precision GGUF: BF16/F16/F32\n      -> quantized GGUF: Q4_K_M, Q5_K_M, Q8_0, etc.\n\n\nFor Qwen models, the Qwen docs show converting first, often with `--outtype bf16`, then quantizing with `llama-quantize`. 
See: Qwen llama.cpp quantization guide.\n\nExample:\n\n\n    python <llama_cpp_dir>/convert_hf_to_gguf.py \\\n      <merged_model_dir> \\\n      --outtype bf16 \\\n      --outfile <model_bf16_gguf>\n\n    <llama_cpp_dir>/build/bin/llama-quantize \\\n      <model_bf16_gguf> \\\n      <model_q4_k_m_gguf> \\\n      Q4_K_M\n\n\nIf quality matters, consider an importance matrix later, but only after the BF16/F16 GGUF conversion works.\n\n* * *\n\n## Related issues and references\n\nUseful references for this class of problem:\n\n  * convert_hf_to_gguf_update.py — the relevant pre-tokenizer hash/update logic.\n  * Hugging Face llama.cpp integration docs — explains the HF-to-GGUF conversion path.\n  * Hugging Face GGUF docs — format-level background.\n  * Qwen llama.cpp quantization guide — Qwen-specific convert/quantize/evaluate flow.\n  * Qwen/Qwen3.5-4B — official model repo.\n  * Qwen/Qwen3.5-4B/tree/main — file list to compare against.\n  * Qwen3_5ForCausalLM is not supported — Qwen3.5 architecture support context.\n  * convert_hf_to_gguf.py does not support text Qwen3.5 — Qwen3.5 text conversion context.\n  * WARNING: The BPE pre-tokenizer was not recognized — same warning pattern with `chkhsh`.\n  * BPE pre-tokenizer not recognized for several models — shows this is a general converter compatibility class, not Qwen-only.\n  * Hugging Face Tokenizers components — pre-tokenizer background.\n  * Hugging Face Tokenizers pipeline — tokenizer pipeline background.\n  * Hugging Face LLM course: BPE — beginner-friendly BPE explanation.\n  * Transformers tokenizer summary — byte-level BPE and tokenizer concepts.\n  * Unsloth Qwen3.5 docs — Qwen3.5 runtime/frontend caveats.\n\n\n\n* * *\n\n## My best guess for your case\n\nGiven the exact traceback and the fact that upgrading `transformers` plus running `convert_hf_to_gguf_update.py` did not fix it, my best guess is:\n\n> Your fine-tuned/merged Qwen3.5-4B folder has tokenizer drift or missing tokenizer-side files.\n\nThe fix 
I would try first is:\n\n  1. Convert the original base model with the same llama.cpp commit.\n  2. If the base converts, compare tokenizer files.\n  3. If you did not add tokens, copy the exact base tokenizer/config/processor files into the merged folder.\n  4. Convert to BF16/F16 GGUF.\n  5. Quantize only after conversion succeeds.\n\n\n\nIf the original base model also fails, then this is probably not your fine-tune. It is more likely a llama.cpp support issue for that exact Qwen3.5 tokenizer/model revision.\n\n* * *\n\n## Short checklist\n\n  * Record `llama.cpp` commit.\n  * Record `transformers`, `tokenizers`, `huggingface_hub`, and Python versions.\n  * Confirm the exact base model and revision.\n  * Confirm whether tokens were added.\n  * Try converting the original base model.\n  * Compare `tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, `chat_template.jinja`, and special-token files.\n  * If no tokens were added, restore tokenizer files from the exact base model revision.\n  * Convert to BF16/F16 GGUF first.\n  * Quantize to `Q4_K_M` only after the high-precision GGUF conversion succeeds.\n  * Do not manually map the hash unless tokenizer equivalence and runtime support are verified.\n\n",
  "title": "The BPE pre-tokenizer was not recognized!"
}