Fine-Tuning an SLM for a Low-Resource Language
Hugging Face Forums [Unofficial]
June 5, 2026
Hello, I read your guide and learned several new things from it. One part that I found especially helpful was the section about checking whether a tokenizer supports my target language well.
After reading that, I tested the Qwen 3 0.6B tokenizer on Persian, and it performed quite poorly. I also tested Qwen 3.5 0.8B, which was better but still not good enough for strong Persian support.
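For anyone who wants to repeat this kind of check, here is a rough sketch of one common measure, tokens per word (often called fertility). It is not taken from the guide. The function names are my own, and the byte-level tokenizer is only a stand-in so the snippet runs without downloading anything.

```python
from typing import Callable, Iterable, List, Sequence


def fertility(tokenize: Callable[[str], Sequence], texts: Iterable[str]) -> float:
    """Average number of tokens per whitespace-separated word.

    Lower is better: a value near 1 means most words map to a single token.
    """
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("no words found in the input texts")
    return total_tokens / total_words


def utf8_byte_tokenize(text: str) -> List[int]:
    # Stand-in tokenizer (assumption for the demo): one token per UTF-8 byte.
    # This is roughly the worst case for a byte-level BPE vocabulary.
    return list(text.encode("utf-8"))


if __name__ == "__main__":
    samples = ["سلام دنیا", "این یک جمله آزمایشی است"]
    print(fertility(utf8_byte_tokenize, samples))

    # With a real tokenizer (assumes `transformers` is installed and that the
    # hub id below is the checkpoint you mean):
    # from transformers import AutoTokenizer
    # tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
    # print(fertility(tok.tokenize, samples))
```

Running the same Persian sample through two tokenizers and comparing the two numbers gives a quick, if crude, way to say which one handles the language better. A larger and more varied sample will give a more trustworthy figure.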
So I wanted to ask: where can I find a base model that is truly strong for Persian?
Or can I somehow fine-tune a tokenizer for Persian?
I did find a possible approach myself, though: if I can't improve the tokenizer, I can help it. Maybe I will try building a normalizer first, then extend it step by step and measure how the tokenized output changes.
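To make the normalizer idea concrete, here is a minimal sketch of what a first version might look like. The mapping is only a starting point I am assuming (Arabic yeh and kaf to their Persian forms, tatweel removed, whitespace collapsed), and `normalize_persian` and `token_change` are hypothetical names, not part of any library.

```python
import unicodedata
from typing import Callable, Sequence, Tuple

# Starting set of replacements (an assumption, meant to be extended over time).
_CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0649": "\u06CC",  # alef maksura -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian keheh
    "\u0640": None,      # tatweel (kashida) is purely decorative, drop it
})


def normalize_persian(text: str) -> str:
    """Apply NFC, map Arabic letter variants to Persian ones, collapse spaces."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(_CHAR_MAP)
    # str.split() only splits on whitespace, so ZWNJ (U+200C) is preserved.
    return " ".join(text.split())


def token_change(tokenize: Callable[[str], Sequence], text: str) -> Tuple[int, int]:
    """Return (token count before normalization, token count after)."""
    return len(tokenize(text)), len(tokenize(normalize_persian(text)))


if __name__ == "__main__":
    # Stand-in tokenizer (assumption): one token per UTF-8 byte.
    def byte_tokens(s: str) -> list:
        return list(s.encode("utf-8"))

    print(token_change(byte_tokens, "سـلام  دنيا"))
```

Each time a rule is added to the map, rerunning `token_change` over a fixed sample with the real tokenizer shows whether the rule actually reduces the token count or makes no difference.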