New Audio Model Snapshots in the Realtime API
OpenAI Developer Community
May 26, 2026
I’m honestly torn. The 35% WER improvement and multilingual gains in 2025-12-15 are exactly what we need; we’re building a language-learning product that touches a lot of low-resource languages, so those gains really matter.
But our testing this week hit the truncation bug on roughly 20% of multi-sentence inputs (3 of 16 fixtures across en/es/fr/de). The same inputs, retried, returned complete audio. One failure was 3.9s of speech followed by 1.6s of trailing silence: the stream ends early and gets zero-padded, with HTTP 200, no error, and no retry signal. That's a silent 1-in-5 failure rate that no amount of WER improvement compensates for.
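For anyone who wants to catch this client-side in the meantime, here is a minimal sketch of the check described above: measure the run of zero samples at the end of the returned audio and retry when it is suspiciously long. Everything specific here is an assumption and not something the API provides: the audio is assumed to be 16-bit little-endian mono PCM at 24 kHz, the 1.0s cutoff is arbitrary, and `synthesize` is a hypothetical callable standing in for whatever function in your client returns the full audio bytes for a prompt.

```python
import array
import sys
from typing import Callable

SAMPLE_RATE_HZ = 24_000  # assumption: 16-bit little-endian mono PCM at 24 kHz


def trailing_silence_seconds(pcm: bytes, sample_rate: int = SAMPLE_RATE_HZ,
                             threshold: int = 0) -> float:
    """Duration of the run of samples with |value| <= threshold at the end of pcm."""
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a dangling odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # input is assumed little-endian
    silent = 0
    for value in reversed(samples):
        if abs(value) > threshold:
            break
        silent += 1
    return silent / sample_rate


def looks_truncated(pcm: bytes, max_trailing_silence_s: float = 1.0) -> bool:
    """Heuristic: a long zero-padded tail suggests the stream ended early."""
    return trailing_silence_seconds(pcm) > max_trailing_silence_s


def synthesize_with_retry(synthesize: Callable[[str], bytes], text: str,
                          max_attempts: int = 3) -> bytes:
    """Call the hypothetical `synthesize` until the audio passes the check.

    Returns the last attempt even if it still looks truncated, so the
    caller can decide what to do with it.
    """
    audio = b""
    for _ in range(max_attempts):
        audio = synthesize(text)
        if not looks_truncated(audio):
            break
    return audio
```

This is only a heuristic. Legitimate audio can end with a pause, so the cutoff needs tuning against your own fixtures, and a truncation that is not followed by padding would slip through.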
The original report on this is from June 2025, almost a year ago now. With 2025-03-20 shutting down July 23, those of us pinning the old snapshot have ~2 months before we’re forced onto a snapshot we can’t reliably ship on.
Honest, friendly question: could one of you just open a Codex session and ask it to fix this?