
Realtime API: Poor Portuguese call quality with gpt-realtime-mini / gpt-realtime

OpenAI Developer Community May 28, 2026

Thanks for laying this out so clearly @leandro-ligmee, and good suggestion from @rafa3 on filtering/noise reduction.

This sounds less like a single model issue and more like the usual phone-call stack problem: speakerphone + background noise + μ-law audio + VAD deciding “that was enough speech” too early.

A few things I’d try before changing models:

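Enable the Realtime input noise reduction option (input_audio_noise_reduction) if you are not already using it. As far as I know, far_field is the mode intended for speakerphone-style audio.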
Pre-process audio before SIP/Realtime if possible: band-pass for voice, noise suppression, AGC, echo cancellation.
Raise silence_duration_ms a bit more for Portuguese phone calls, since users may pause mid-address or mid-name.
Consider setting create_response: false and manually creating the response (sending response.create yourself) only after you’re confident the user has finished. That can reduce “AI starts talking by itself” cases.
For addresses/names, don’t rely on one pass. Ask for confirmation: “Did you say Rua X?” or collect critical fields twice in a structured way.
Add domain hints in the transcription prompt, like expected city names, street formats, common Brazilian names, etc. The generic “maximum fidelity” instruction may not be enough.
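
To make the settings above concrete, here is a rough sketch of a session.update payload that combines noise reduction, a longer silence window, manual response creation, and domain hints in the transcription prompt. Treat it as an assumption-heavy starting point: the field names follow the flat session layout from the earlier Realtime docs (newer API versions nest these under audio.input, so check the current reference), and the 900 ms value, the transcription model name, and the hint terms are placeholders I made up, not recommendations.

```python
import json


def build_session_update(domain_hints, silence_ms=900,
                         transcription_model="gpt-4o-transcribe"):
    """Build a session.update event for noisy Portuguese phone calls.

    All values are illustrative assumptions; tune them against real calls.
    """
    # Domain hints go into the transcription prompt so expected street,
    # city, and personal names are easier to recognise.
    prompt = (
        "Chamada telefônica em português do Brasil. "
        "Termos esperados: " + ", ".join(domain_hints)
    )
    return {
        "type": "session.update",
        "session": {
            # far_field is assumed here because callers are on speakerphone.
            "input_audio_noise_reduction": {"type": "far_field"},
            "input_audio_transcription": {
                "model": transcription_model,
                "language": "pt",
                "prompt": prompt,
            },
            "turn_detection": {
                "type": "server_vad",
                # Longer pause tolerance for mid-address or mid-name pauses.
                "silence_duration_ms": silence_ms,
                # The app decides when to answer instead of the VAD.
                "create_response": False,
            },
        },
    }


def build_response_create():
    """Event to send once your own logic decides the user has finished."""
    return {"type": "response.create"}


if __name__ == "__main__":
    event = build_session_update(["Rua Augusta", "Belo Horizonte", "Conceição"])
    print(json.dumps(event, ensure_ascii=False, indent=2))
```

With create_response set to false, nothing is spoken until your code sends the response.create event, so that is the place to add any "is the user really done?" check.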

I agree with @rafa3 that noise is probably the first thing to attack. If the mic input is messy, the transcription and VAD will both behave worse, even with better models.
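
If you want to experiment with cleaning the audio yourself, a voice band-pass is the simplest piece to try. Below is a minimal sketch using a single second-order (biquad) band-pass section in plain Python, assuming 8 kHz mono samples as floats and a rough 300 to 3400 Hz telephone band. The function names are mine, and one biquad has gentle slopes and imprecise band edges, so a real pipeline would more likely use a proper DSP library plus dedicated noise suppression and echo cancellation.

```python
import math


def bandpass_coeffs(low_hz, high_hz, sample_rate):
    """Biquad band-pass coefficients (unity gain at the centre frequency)."""
    f0 = math.sqrt(low_hz * high_hz)      # geometric centre of the band
    q = f0 / (high_hz - low_hz)           # wide band means low Q
    w0 = 2 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def bandpass(samples, low_hz=300.0, high_hz=3400.0, sample_rate=8000):
    """Filter a sequence of float samples with a single biquad section."""
    (b0, b1, b2), (a1, a2) = bandpass_coeffs(low_hz, high_hz, sample_rate)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # Direct form I difference equation.
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


if __name__ == "__main__":
    fs = 8000
    hum = [math.sin(2 * math.pi * 50 * n / fs) for n in range(fs)]
    filtered = bandpass(hum, sample_rate=fs)
    print(max(abs(v) for v in filtered[fs // 2:]))  # peak of the 50 Hz hum after filtering
```

Low-frequency hum and rumble are reduced while mid-band speech energy passes roughly unchanged, which is about all a filter this simple can offer.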

Would be useful to know whether you’re seeing more false starts during silence/background noise, or mostly while the user is still speaking. Those usually need slightly different fixes.
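
If it helps to answer that, here is a rough heuristic for sorting false starts from logged server events. It assumes you record (timestamp_ms, event_type) pairs using your own clock for input_audio_buffer.speech_started, input_audio_buffer.speech_stopped, and response.created. The labels and both thresholds are my own invention: a response whose triggering speech segment was very short is treated as a likely noise trigger, and a response followed quickly by new speech is treated as a likely mid-utterance cut-off (it could also be an intentional barge-in, so spot-check against recordings).

```python
STARTED = "input_audio_buffer.speech_started"
STOPPED = "input_audio_buffer.speech_stopped"
RESPONSE = "response.created"


def classify_responses(events, min_speech_ms=400, resume_window_ms=1500):
    """Label each response.created event in a time-ordered event log.

    events: list of (timestamp_ms, event_type) tuples.
    Returns a list of (timestamp_ms, label) tuples.
    """
    labels = []
    started_at = stopped_at = None
    for i, (ts, kind) in enumerate(events):
        if kind == STARTED:
            started_at = ts
        elif kind == STOPPED:
            stopped_at = ts
        elif kind == RESPONSE:
            # Did the caller start speaking again right after the response began?
            resumed = any(
                k == STARTED and ts <= t <= ts + resume_window_ms
                for t, k in events[i + 1:]
            )
            if resumed:
                label = "cut_off_mid_speech"
            elif (started_at is None or stopped_at is None
                  or stopped_at - started_at < min_speech_ms):
                # Very short (or missing) speech segment: probably noise.
                label = "noise_trigger"
            else:
                label = "normal"
            labels.append((ts, label))
    return labels


if __name__ == "__main__":
    log = [
        (0, STARTED), (200, STOPPED), (210, RESPONSE),
        (5000, STARTED), (7000, STOPPED), (7010, RESPONSE),
        (7600, STARTED), (9000, STOPPED), (9010, RESPONSE),
    ]
    for ts, label in classify_responses(log):
        print(ts, label)
```

Mostly noise_trigger points toward noise reduction and the VAD threshold, while mostly cut_off_mid_speech points toward a longer silence_duration_ms or manual response creation.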

-Mark G.
