Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihndcjpqnxrwwy6iz5tbkxqmsyatqjtxxj5hsxgpr3bvaizexq2fq",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkyvkxb2egy2"
  },
  "path": "/t/the-bpe-pre-tokenizer-was-not-recognized/175714#post_5",
  "publishedAt": "2026-05-04T03:48:14.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "@John6666",
    "@Lightcap"
  ],
  "textContent": "Hi @John6666 @Lightcap, I came across the issue below earlier with a previous transformers release. It seems to have been fixed in later releases, and manual editing is no longer needed: “tokenizer_class”: “Qwen2Tokenizer” is now set automatically.\n\nHowever, since Qwen3.5-4B is multimodal and a processor must be used instead of a tokenizer class, the config shows **processor_class**: \"Qwen3VLProcessor\". I wonder whether it needs to be set manually to “Qwen2Tokenizer”? I will test this today to check. Thanks!\n\n########################################################\n\n# IMPORTANT in order to be able to convert to GGUF\n\n########################################################\n\nThis error usually happens when there is a mismatch between the transformers library and the way the tokenizer_config.json was saved, or if you are using an experimental Python version (like 3.14) where some compiled backends for tokenizers might not be fully stable yet.\n\nThe key issue is that your tokenizer_config.json likely contains “tokenizer_class”: “TokenizersBackend”, which isn’t a standard class name that AutoTokenizer recognizes. It expects a specific class like Qwen2Tokenizer or LlamaTokenizer.\n\n  1. The Immediate Manual Fix\n\nYou need to manually edit the tokenizer_config.json file in your model folder.\n\nOpen tokenizer_config.json in a text editor.\n\nFind the line: “tokenizer_class”: “TokenizersBackend”\n\nChange it to the correct class for your model. Since you are working with Qwen, it should be:\n“tokenizer_class”: “Qwen2Tokenizer” (Note: Qwen3 and Qwen2.5 typically still use the Qwen2Tokenizer class in transformers).\n\nSave the file and try your conversion again.\n\n  2. Why did this happen?\n\nLibrary Version: When saving a model using SFTTrainer or Peft, if the tokenizers library (the Rust-based backend) is ahead of the transformers library, it sometimes writes generic backend names into the config instead of the model-specific class.\n\nPython 3.14: Since Python 3.14 is very new/experimental, some pre-compiled wheels for the tokenizers library might be falling back to generic behaviors. Ensure you’ve run pip install --upgrade transformers tokenizers to get the most recent compatibility patches.",
  "title": "The BPE pre-tokenizer was not recognized!"
}
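
The manual fix described in the post (changing "tokenizer_class" from "TokenizersBackend" to "Qwen2Tokenizer" in tokenizer_config.json) could also be scripted. The following is only a minimal sketch using the Python standard library: the helper name `patch_tokenizer_class`, the `.bak` backup file, and the default arguments are assumptions made for illustration and are not part of transformers or of the original post.

```python
import json
from pathlib import Path


def patch_tokenizer_class(model_dir, new_class="Qwen2Tokenizer",
                          generic="TokenizersBackend"):
    """Hypothetical helper: swap a generic tokenizer_class for a specific one.

    Returns True if the file was changed, False if no change was needed.
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    original_text = config_path.read_text(encoding="utf-8")
    config = json.loads(original_text)

    # Only touch the file when it holds the generic backend name.
    if config.get("tokenizer_class") != generic:
        return False

    # Keep a copy of the untouched file next to it (assumed convention).
    backup_path = config_path.with_name(config_path.name + ".bak")
    backup_path.write_text(original_text, encoding="utf-8")

    config["tokenizer_class"] = new_class
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return True
```

Calling `patch_tokenizer_class("path/to/model")` before the GGUF conversion would leave the file alone when the class is already model-specific, which matches the post's observation that newer releases set it automatically.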