Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibeytcy2pgkun5roeu7wt44jjiehphogjafvov4wrm5mef3hysmwi",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkvdhaxixpo2"
  },
  "path": "/t/the-bpe-pre-tokenizer-was-not-recognized/175714#post_2",
  "publishedAt": "2026-05-02T18:08:45.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I’d first check the tokenizer files tbh. I don’t think upgrading `transformers` is the main thing here.\n\nFrom the traceback, the converter already reaches the vocab/tokenizer part, but llama.cpp does not recognize the pre-tokenizer config from your `tokenizer.json`.\n\nCan you try converting the original base model with the same llama.cpp commit? If the base model works but your fine-tuned/merged folder fails, then something probably changed in the tokenizer files.\n\nI’d compare `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`, and any added tokens. If you didn’t add/change tokens during fine-tuning, try copying the tokenizer files from the base model into the merged folder and running the conversion again.\n\nAlso please share the exact base model name, llama.cpp commit, and whether you added any tokens. Without those, it is hard to do much more than guess from the `chkhsh`.",
  "title": "The BPE pre-tokenizer was not recognized!"
}
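
The file comparison suggested in the post can be sketched roughly as below. This is only a minimal sketch under some assumptions: the two folder paths are hypothetical, `added_tokens.json` is assumed to be where added tokens would be stored (it may not exist for every model), and a byte-level hash only shows that a file changed, not whether the change matters for the pre-tokenizer.

```python
import hashlib
from pathlib import Path

# File names taken from the post; "added_tokens.json" is an assumption
# about where added tokens might live and may be absent for some models.
TOKENIZER_FILES = [
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "added_tokens.json",
]


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it is missing."""
    path = Path(path)
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_tokenizer_files(base_dir, merged_dir):
    """Map each tokenizer file name to a short status string."""
    report = {}
    for name in TOKENIZER_FILES:
        base = file_digest(Path(base_dir) / name)
        merged = file_digest(Path(merged_dir) / name)
        if base is None and merged is None:
            report[name] = "missing in both"
        elif base is None:
            report[name] = "missing in base"
        elif merged is None:
            report[name] = "missing in merged"
        elif base == merged:
            report[name] = "same"
        else:
            report[name] = "differs"
    return report


# Example call with hypothetical folder names:
# for name, status in compare_tokenizer_files("base-model", "merged-model").items():
#     print(f"{name}: {status}")
```

A file reported as "differs" may only differ in whitespace or key order, so it is worth opening the two copies in a diff tool before concluding that the tokenizer itself changed.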