Open-source Hinglish ASR fine-tuned from Qwen3-ASR-0.6B (15.85% WER on conversational test)
Hi all,
Open-sourcing Srota (श्रोत): a full-parameter fine-tune of Qwen/Qwen3-ASR-0.6B for Hinglish (Hindi-English code-switched) speech. Output stays in natural mixed script (मेरा favourite festival Diwali है) instead of collapsing into all-Devanagari transliteration like the base model does.
Most open ASR stacks either hallucinate on Hinglish or romanize and then mangle the English words, which is a noticeable gap given how India actually talks.
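If you want to check your own transcripts for mixed-script output versus all-Devanagari collapse, here is a minimal sketch. The helper name `scripts_in` is hypothetical and not part of the Srota release; it only looks at the Devanagari Unicode block (U+0900 to U+097F) and ASCII letters, and the all-Devanagari string is an illustrative rendering, not actual base-model output.

```python
def scripts_in(text: str) -> set[str]:
    """Return which scripts ('devanagari', 'latin') occur in the text."""
    found = set()
    for ch in text:
        if "\u0900" <= ch <= "\u097f":       # Devanagari Unicode block
            found.add("devanagari")
        elif ch.isascii() and ch.isalpha():  # plain ASCII letters only
            found.add("latin")
    return found


mixed = "मेरा favourite festival Diwali है"
collapsed = "मेरा फेवरेट फेस्टिवल दिवाली है"  # illustrative all-Devanagari rendering

print(scripts_in(mixed))      # contains both 'devanagari' and 'latin'
print(scripts_in(collapsed))  # contains only 'devanagari'
```

A transcript set where English-heavy utterances come back with only `devanagari` is a quick signal that the model is transliterating instead of code-switching.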
Recipe:
- Full fine-tune of ~780M params (0.6B Qwen3 LLM + ~180M AuT audio encoder + projector). No LoRA, no frozen layers.
- ~95h training: HiACC (5.24h conversational) + OpenSLR-104 / MUCS-2021 (89.86h tutorial), concatenated, no upsampling.
- Language-agnostic decoding (target prefix language None<asr_text>...), following Polyglot-Lion / Toshniwal et al. 2018.
- AdamW, LR 2e-5 linear schedule, warmup_ratio 0.02, effective batch 32, bf16 + FlashAttention 2.
- 2 epochs (~3,352 steps), 2× H100 via Modal, ~49 min wall-clock, ~$6.50.
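To make the learning-rate schedule in the recipe concrete, here is a rough sketch in plain Python. It assumes linear warmup from 0 to the peak followed by linear decay to 0, that ~3,352 is the exact total step count, and that warmup steps are rounded up; the actual training script may differ on any of these.

```python
import math

PEAK_LR = 2e-5
TOTAL_STEPS = 3352   # "~3,352 steps" from the recipe, treated as exact here
WARMUP_RATIO = 0.02
# Assumption: warmup step count is rounded up from ratio * total steps.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = TOTAL_STEPS - step
    return PEAK_LR * max(0.0, remaining / (TOTAL_STEPS - WARMUP_STEPS))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions warmup covers only the first 68 steps, so nearly the whole run is spent on the decay side of the schedule.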
Results: HiACC conversational test WER 24.73% → 15.85%; OpenSLR-104 tutorial test WER 50.66% → 35.06%. Two domain-specialist siblings are also released: Srota-Conv (14.23% on HiACC) and Srota-Tutorial (32.83% on OpenSLR-104).
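For anyone comparing against their own numbers, here is a minimal sketch of word error rate as word-level edit distance divided by reference length, plus the relative reductions implied by the figures above. It assumes plain whitespace tokenization with no text normalization, which may not match the scoring behind the reported results; the function names and the example hypothesis are illustrative.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error rate that was removed."""
    return (before - after) / before


ref = "मेरा favourite festival Diwali है"
hyp = "मेरा फेवरेट festival Diwali है"  # one English word transliterated
print(f"{wer(ref, hyp):.2f}")
print(f"{relative_reduction(24.73, 15.85):.1%}")  # HiACC
print(f"{relative_reduction(50.66, 35.06):.1%}")  # OpenSLR-104
```

With this scoring, a single transliterated English word in a five-word utterance already costs 20% WER, which is why script choice matters so much on code-switched test sets. The reported numbers work out to roughly a 36% relative reduction on HiACC and 31% on OpenSLR-104.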
Links:
- Model: moorlee/qwen3-asr-0.6b-hinglish (Hugging Face)
- Demo: Srota: Hinglish Speech Recognition (Hugging Face Space by moorlee)
Apache-2.0, commercial use fine. If you are building open-source meeting notes or voice dictation for Indian users, this should slot in cleanly. Feedback and edge cases very welcome.