
Open-source Hinglish ASR fine-tuned from Qwen3-ASR-0.6B (15.85% WER on conversational test)

Hugging Face Forums [Unofficial] May 30, 2026

Hi all,

Open-sourcing Srota (श्रोत): a full-parameter fine-tune of Qwen/Qwen3-ASR-0.6B for Hinglish (Hindi-English code-switched) speech. Output stays in natural mixed script (मेरा favourite festival Diwali है) instead of collapsing into all-Devanagari transliteration like the base model does.

Most open ASR stacks either hallucinate on Hinglish or romanize and then mangle the English words, which is a noticeable gap given how much everyday speech in India is code-switched.
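To make the mixed-script point concrete, here is a minimal sketch of a per-word script check you could run on any transcript. The helper names (`script_of`, `script_profile`) and the all-Devanagari example string are made up for illustration; they are not part of the Srota release and not actual model output.

```python
def script_of(word: str) -> str:
    """Classify a token as 'devanagari', 'latin', 'mixed', or 'other'."""
    # The Devanagari Unicode block spans U+0900 to U+097F.
    has_deva = any("\u0900" <= ch <= "\u097F" for ch in word)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in word)
    if has_deva and has_latin:
        return "mixed"
    if has_deva:
        return "devanagari"
    if has_latin:
        return "latin"
    return "other"


def script_profile(transcript: str) -> dict:
    """Count whitespace-separated words per script label."""
    counts = {}
    for word in transcript.split():
        label = script_of(word)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Example sentence from the post (natural mixed script).
mixed = "मेरा favourite festival Diwali है"
# Hypothetical all-Devanagari rendering, written by hand for illustration.
collapsed = "मेरा फेवरेट फेस्टिवल दिवाली है"

print(script_profile(mixed))
print(script_profile(collapsed))
```

A transcript whose profile contains only `devanagari` for an utterance that clearly had English words in it is a quick signal of the collapse described above.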

Recipe:

Full fine-tune of ~780M params (0.6B Qwen3 LLM + ~180M AuT audio encoder + projector). No LoRA, no frozen layers.
~95h of training audio: HiACC (5.24h conversational) + OpenSLR-104 / MUCS-2021 (89.86h tutorial), concatenated, no upsampling.
Language-agnostic decoding (target prefix language None<asr_text>...), following Polyglot-Lion / Toshniwal et al. 2018.
AdamW, LR 2e-5 linear schedule, warmup_ratio 0.02, effective batch 32, bf16 + FlashAttention 2.
2 epochs (~3,352 steps), 2× H100 via Modal, ~49 min wall-clock, ~$6.50.
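
For anyone who wants to sanity-check the recipe numbers, the sketch below works through the data mix and the learning-rate schedule using only the figures quoted above. Two things are assumptions on my part and not confirmed by the recipe: warmup steps are rounded up from `warmup_ratio * total_steps`, and the linear schedule decays to zero at the final step. The function names are illustrative.

```python
import math

# Figures quoted in the recipe.
CONV_HOURS = 5.24
TUTORIAL_HOURS = 89.86
TOTAL_STEPS = 3352        # quoted as "~3,352", so treat derived values as approximate
EFFECTIVE_BATCH = 32
EPOCHS = 2
PEAK_LR = 2e-5
WARMUP_RATIO = 0.02

# Data mix: with no upsampling, conversational audio is a small share of the total.
total_hours = CONV_HOURS + TUTORIAL_HOURS
conv_share = CONV_HOURS / total_hours

# Implied dataset size and mean utterance length (a back-of-envelope estimate).
utterances_per_epoch = TOTAL_STEPS * EFFECTIVE_BATCH / EPOCHS
mean_utterance_seconds = total_hours * 3600 / utterances_per_epoch


def warmup_steps(total_steps: int = TOTAL_STEPS, ratio: float = WARMUP_RATIO) -> int:
    """Warmup length in optimizer steps (assumption: rounded up)."""
    return math.ceil(total_steps * ratio)


def lr_at(step: int, total_steps: int = TOTAL_STEPS, peak: float = PEAK_LR) -> float:
    """Linear warmup from 0 to peak, then linear decay to 0 (assumed endpoint)."""
    w = warmup_steps(total_steps)
    if step < w:
        return peak * step / max(1, w)
    return peak * max(0.0, (total_steps - step) / max(1, total_steps - w))


print(f"total audio: {total_hours:.2f} h, conversational share: {conv_share:.1%}")
print(f"implied mean utterance length: {mean_utterance_seconds:.1f} s")
print(f"warmup steps: {warmup_steps()}")
for step in (0, warmup_steps(), TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, lr_at(step))
```

The conversational share comes out at roughly 5.5% of the training audio, which is worth keeping in mind when reading the conversational WER numbers.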

Result: on the HiACC conversational test, WER drops from 24.73% to 15.85%; on the OpenSLR-104 tutorial test, from 50.66% to 35.06%. Two domain-specialist siblings are also released: Srota-Conv (14.23% on HiACC) and Srota-Tutorial (32.83% on OpenSLR-104).
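
As a reminder of what these numbers measure, here is a minimal word error rate sketch: word-level edit distance divided by the reference length, plus the relative reduction implied by the figures above. It assumes plain whitespace tokenization with no text normalization, which may differ from the evaluation actually used for Srota, and the all-Devanagari hypothesis string is hand-written for illustration, not real model output.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance over word lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error removed."""
    return (before - after) / before


reference = "मेरा favourite festival Diwali है"
# Hypothetical all-Devanagari hypothesis: every English word counts as an error.
collapsed = "मेरा फेवरेट फेस्टिवल दिवाली है"

print(wer(reference, reference))
print(wer(reference, collapsed))
print(f"HiACC: {relative_reduction(24.73, 15.85):.1%} relative WER reduction")
print(f"OpenSLR-104: {relative_reduction(50.66, 35.06):.1%} relative WER reduction")
```

Under this unnormalized scoring, a transcript that is phonetically right but rendered entirely in Devanagari is still penalized for every English word, so script choice alone can move WER noticeably on code-switched test sets.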

Links:

Model: moorlee/qwen3-asr-0.6b-hinglish (Hugging Face)
Demo: Srota: Hinglish Speech Recognition (Hugging Face Space by moorlee)

The license is Apache-2.0, so commercial use is fine. If you are building open-source meeting notes or voice dictation for Indian users, this should slot in cleanly. Feedback and edge cases are very welcome.
