
Fine-Tuning an SLM for a Low-Resource Language

Hugging Face Forums [Unofficial] June 5, 2026

Hello, I read your guide and learned several new things from it. One part that I found especially helpful was the section about checking whether a tokenizer supports my target language well.

After reading that, I tested the Qwen 3 0.6B tokenizer on Persian, and its performance was quite poor. I also tested Qwen 3.5 0.8B, which was better, but still not good enough for strong Persian support.
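One rough way to put a number on "poor" is tokens per word (often called fertility): the more tokens a tokenizer needs per Persian word, the worse the fit. The sketch below is only an illustration under assumptions. The names `fertility` and `byte_tokenize` are hypothetical, and the byte-level function is a stand-in so the snippet runs without downloading anything; it is not the Qwen tokenizer.

```python
from typing import Callable, Iterable, Sequence


def fertility(tokenize: Callable[[str], Sequence], sentences: Iterable[str]) -> float:
    """Average number of tokens per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for sentence in sentences:
        total_tokens += len(tokenize(sentence))
        total_words += len(sentence.split())
    # Guard against an empty sample.
    return total_tokens / total_words if total_words else 0.0


def byte_tokenize(text: str) -> list:
    """Stand-in tokenizer (assumption): one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


if __name__ == "__main__":
    # "salam donya" written with escapes so the exact code points are visible.
    sample = ["\u0633\u0644\u0627\u0645 \u062f\u0646\u06cc\u0627"]
    print(fertility(byte_tokenize, sample))
```

To score a real tokenizer, the `tokenize` method of a tokenizer loaded with `AutoTokenizer.from_pretrained` from `transformers` can be passed in place of `byte_tokenize`. Comparing the result on the same Persian sample across models gives a like-for-like number.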

So I wanted to ask: where can I find a base model that is truly strong for Persian?

Or can I somehow fine-tune a tokenizer for Persian?

I did think of one approach myself, though: if I can't improve the tokenizer, I can help it. Maybe I'll build a normalizer first, then extend it step by step and measure how the tokenized output changes.
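A first version of such a normalizer might look like the sketch below. The rule set is an assumption chosen for illustration (Arabic yeh and kaf mapped to their Persian forms, tatweel and short-vowel diacritics dropped, spaces collapsed), not a complete or authoritative Persian normalizer. The names `normalize_persian` and `token_delta` are hypothetical. The zero-width non-joiner (U+200C) is deliberately left alone because it carries meaning in Persian spelling.

```python
import re
import unicodedata

# Character-level rules (assumed starting set; extend as needed).
_CHAR_MAP = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0649": "\u06cc",  # alef maksura -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian keheh
    "\u0640": None,      # tatweel (kashida) is removed
})
_DIACRITICS = re.compile("[\u064b-\u0652]")  # fathatan .. sukun
_SPACES = re.compile(r"[ \t]+")


def normalize_persian(text: str) -> str:
    """Apply NFC, then the character rules, then whitespace cleanup."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(_CHAR_MAP)
    text = _DIACRITICS.sub("", text)
    return _SPACES.sub(" ", text).strip()


def token_delta(tokenize, text: str) -> tuple:
    """Return (token count before, token count after) normalization."""
    before = len(tokenize(text))
    after = len(tokenize(normalize_persian(text)))
    return before, after


if __name__ == "__main__":
    # "salam" with a tatweel inserted; `list` is a stand-in tokenizer
    # (one token per character), used only so the snippet runs offline.
    print(token_delta(list, "\u0633\u0640\u0644\u0627\u0645"))
```

Running `token_delta` over a fixed Persian sample after each new rule shows whether that rule actually reduces the token count for the tokenizer being tested, which is the measure-as-you-extend loop described above.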
