Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreia6ae5pwwvqrzfskwvycn27boztpzivs7mb6x3rki4xb54f23wv6q",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mn3hfkzc2lh2"
  },
  "path": "/t/open-source-hinglish-asr-fine-tuned-from-qwen3-asr-0-6b-15-85-wer-on-conversational-test/176416#post_1",
  "publishedAt": "2026-05-30T15:24:20.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "moorlee/qwen3-asr-0.6b-hinglish · Hugging Face",
    "Srota: Hinglish Speech Recognition - a Hugging Face Space by moorlee"
  ],
  "textContent": "Hi all,\n\nOpen-sourcing **Srota** (श्रोत): a full-parameter fine-tune of Qwen/Qwen3-ASR-0.6B for Hinglish (Hindi-English code-switched) speech. Output stays in natural mixed script (`मेरा favourite festival Diwali है`) instead of collapsing into all-Devanagari transliteration like the base model does.\n\nMost open ASR stacks either hallucinate on Hinglish or romanize-then-mangle the English words, so this is a noticeable gap given how India actually talks.\n\n**Recipe** :\n\n  * Full fine-tune of ~780M params (0.6B Qwen3 LLM + ~180M AuT audio encoder + projector). No LoRA, no frozen layers.\n  * ~95h training: HiACC (5.24h conversational) + OpenSLR-104 / MUCS-2021 (89.86h tutorial), concatenated, no upsampling.\n  * Language-agnostic decoding (target prefix `language None<asr_text>...`), following Polyglot-Lion / Toshniwal et al. 2018.\n  * AdamW, LR 2e-5 linear schedule, warmup_ratio 0.02, effective batch 32, bf16 + FlashAttention 2.\n  * 2 epochs (~3,352 steps), 2× H100 via Modal, ~49 min wall-clock, ~$6.50.\n\n\n\n**Result** : HiACC conversational test 24.73% → **15.85%** WER. OpenSLR-104 tutorial test 50.66% → **35.06%**. Two domain-specialist siblings also released: Srota-Conv (14.23% on HiACC) and Srota-Tutorial (32.83% on OpenSLR-104).\n\n**Links** :\n\n  * Model: moorlee/qwen3-asr-0.6b-hinglish · Hugging Face\n  * Demo: Srota: Hinglish Speech Recognition - a Hugging Face Space by moorlee\n\n\n\nApache-2.0, commercial use fine. If you are building open-source meeting notes or voice dictation for Indian users, this should slot in cleanly. Feedback and edge cases very welcome.",
  "title": "Open-source Hinglish ASR fine-tuned from Qwen3-ASR-0.6B (15.85% WER on conversational test)"
}
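
The `textContent` field above reports its results as word error rate (WER) and gives before/after figures for two test sets. As a reading aid, here is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by reference length) and how the relative improvement follows from the quoted numbers. The function names and the all-Devanagari hypothesis string are illustrative assumptions, not taken from the post, and the author's actual scoring and text-normalization pipeline is not described in the record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Hypothetical helper for illustration; words are split on whitespace
    with no normalization, so a script mismatch counts as a substitution.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, relative to its starting value."""
    return (before - after) / before


# Mixed-script reference quoted in the post.
reference = "मेरा favourite festival Diwali है"
# Assumed example of an all-Devanagari transcript (not from the post).
all_devanagari = "मेरा फेवरेट फेस्टिवल दिवाली है"

sample_wer = word_error_rate(reference, all_devanagari)

# WER figures quoted in the post, in percent.
hiacc_gain = relative_reduction(24.73, 15.85)
openslr_gain = relative_reduction(50.66, 35.06)

print(f"sample WER: {sample_wer:.2f}")
print(f"HiACC relative WER reduction: {hiacc_gain:.1%}")
print(f"OpenSLR-104 relative WER reduction: {openslr_gain:.1%}")
```

Under this unnormalized scoring, transliterating the three English words costs three substitutions out of five reference words, which is one way a model that collapses to a single script can be penalized even when the spoken content is recognized. The quoted absolute figures correspond to relative WER reductions of roughly 36% on HiACC and 31% on OpenSLR-104.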