
The BPE pre-tokenizer was not recognized!

Hugging Face Forums [Unofficial] May 4, 2026

Hi @John6666 @Lightcap, I ran into the issue below earlier with a previous transformers release. It seems to have been fixed in later releases, so manual editing is no longer needed: “tokenizer_class”: “Qwen2Tokenizer” is now set automatically.

However, since Qwen3.5-4B is multimodal and must be loaded with a processor rather than a tokenizer class, the processor class shows up as “processor_class”: “Qwen3VLProcessor”. I wonder whether it needs to be set manually to “Qwen2Tokenizer”. I will run the test today to check. Thanks!
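To see which class names a saved model folder actually contains before converting, the JSON configs can be read directly. This is a minimal sketch under stated assumptions: it assumes the class names may live in tokenizer_config.json and, for multimodal models, possibly in processor_config.json or preprocessor_config.json; which of these files exist depends on the model and on the transformers version that saved it. The function name report_class_fields is hypothetical.

```python
import json
from pathlib import Path

# Assumption: these are the files that may carry class names. Which of them
# exist depends on the model and the transformers version that saved it.
CANDIDATE_FILES = ("tokenizer_config.json", "processor_config.json", "preprocessor_config.json")
CLASS_KEYS = ("tokenizer_class", "processor_class")


def report_class_fields(model_dir):
    """Hypothetical helper: return {filename: {key: value}} for every class key found."""
    found = {}
    for name in CANDIDATE_FILES:
        path = Path(model_dir) / name
        if not path.is_file():
            continue  # this file was not saved for the model; skip it
        with path.open(encoding="utf-8") as f:
            config = json.load(f)
        # Keep only the class-name keys that are actually present.
        fields = {key: config[key] for key in CLASS_KEYS if key in config}
        if fields:
            found[name] = fields
    return found
```

Calling report_class_fields("path/to/model") (a placeholder path) on the saved folder shows the tokenizer_class and processor_class values side by side, which makes it easy to compare them before and after a test run.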

########################################################

IMPORTANT: required in order to convert to GGUF

########################################################

This error usually happens when there is a mismatch between the transformers library and the way tokenizer_config.json was saved, or when you are using a very new Python version (such as 3.14) for which some compiled backends of the tokenizers library might not be fully stable yet.

The key issue is that your tokenizer_config.json likely contains “tokenizer_class”: “TokenizersBackend”, which isn’t a standard class name that AutoTokenizer recognizes. It expects a specific class like Qwen2Tokenizer or LlamaTokenizer.

  1. The Immediate Manual Fix

You need to manually edit the tokenizer_config.json file in your model folder.

Open tokenizer_config.json in a text editor.

Find the line: “tokenizer_class”: “TokenizersBackend”

Change it to the correct class for your model. Since you are working with Qwen, it should be “tokenizer_class”: “Qwen2Tokenizer”. (Note: Qwen3 and Qwen2.5 typically still use the Qwen2Tokenizer class in transformers.)

Save the file and try your conversion again.
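The manual edit above can also be scripted, which helps when several checkpoints need patching. A minimal sketch, assuming the model folder contains tokenizer_config.json and that Qwen2Tokenizer is the right class for the model (as stated above for Qwen). The helper fix_tokenizer_class and the .bak backup file are illustrative choices, not part of transformers or llama.cpp.

```python
import json
from pathlib import Path

# Values from the post: the generic name to replace and the Qwen class to use.
GENERIC_CLASS = "TokenizersBackend"
QWEN_CLASS = "Qwen2Tokenizer"


def fix_tokenizer_class(model_dir, new_class=QWEN_CLASS):
    """Hypothetical helper: replace a generic tokenizer_class in tokenizer_config.json.

    Returns True if the file was changed, False if it was left untouched.
    """
    path = Path(model_dir) / "tokenizer_config.json"
    with path.open(encoding="utf-8") as f:
        config = json.load(f)

    if config.get("tokenizer_class") != GENERIC_CLASS:
        return False  # already a specific class (or the key is missing); nothing to do

    # Keep a copy of the original file next to it before overwriting.
    backup = path.with_name(path.name + ".bak")
    backup.write_text(path.read_text(encoding="utf-8"), encoding="utf-8")

    # Change only tokenizer_class; every other key is written back unchanged.
    config["tokenizer_class"] = new_class
    with path.open("w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
        f.write("\n")
    return True
```

On a folder saved by a newer transformers release, where the class is already set correctly, the function returns False and leaves the file alone. Note that json.dump rewrites the file's indentation, although key order and values are preserved.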

  2. Why did this happen?

Library version: When a model is saved with SFTTrainer or PEFT and the tokenizers library (the Rust-based backend) is newer than the transformers library, the generic backend name is sometimes written into the config instead of the model-specific class.

Python 3.14: Because Python 3.14 is very new, some pre-compiled wheels of the tokenizers library for it may fall back to generic behavior. Make sure you have run pip install --upgrade transformers tokenizers to get the latest compatibility fixes.
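Since both causes above come down to version combinations, it helps to record the exact versions in use before retrying or posting a follow-up. A minimal sketch using only the standard library (importlib.metadata and platform); the helper name environment_report is hypothetical.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("transformers", "tokenizers")):
    """Hypothetical helper: return the Python version and installed package versions."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


# Print one "name: version" line per entry, ready to paste into a reply.
for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running it once before and once after the pip upgrade shows whether transformers and tokenizers actually moved to newer releases in the environment used for conversion.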
