Realtime API: Poor Portuguese call quality with gpt-realtime-mini / gpt-realtime
rafa3:
For the noise and background interference, I would consider adding a filter (band-pass filter, spectral denoising, noise reduction, etc.). There are many kinds of filters, so try a few to see which one fits your case best. It is important to know that the API supports a noise_reduction filter as a parameter. Check Realtime transcription.
About the recognition accuracy: it may be related to the noise. I would first try the filters. If that doesn't resolve it, try other models. That said, I would say the gpt realtime models are already good. I have some projects using Gemini speech-to-text and the Live API, and they're good too.
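To make the band-pass suggestion above concrete, here is a minimal sketch in plain Python. It is only an illustration, not something taken from the API docs: it cascades a second-order high-pass and low-pass filter (the well-known "audio EQ cookbook" biquad formulas) to keep roughly the telephone speech band. The function names, the 300 to 3400 Hz band, and the 24 kHz sample rate are assumptions chosen for the example; samples are assumed to be floats, so 16-bit PCM would need converting first.

```python
import math

def biquad_coeffs(kind, cutoff_hz, sample_rate, q=0.7071):
    """Return normalized (b, a) coefficients for a 2nd-order high-pass or low-pass."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    elif kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    else:
        raise ValueError("kind must be 'highpass' or 'lowpass'")
    a0 = 1 + alpha
    # Divide everything by a0 so the filter loop can assume a[0] == 1.
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return [x / a0 for x in b], a

def biquad_filter(samples, b, a):
    """Apply one biquad section (direct form I) to a sequence of float samples."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def speech_bandpass(samples, sample_rate=24000, low_hz=300.0, high_hz=3400.0):
    """Hypothetical helper: high-pass at low_hz, then low-pass at high_hz."""
    hp_b, hp_a = biquad_coeffs("highpass", low_hz, sample_rate)
    lp_b, lp_a = biquad_coeffs("lowpass", high_hz, sample_rate)
    return biquad_filter(biquad_filter(samples, hp_b, hp_a), lp_b, lp_a)
```

Calling speech_bandpass(samples) on each chunk before sending it would attenuate low-frequency hum and high-frequency hiss. For streaming audio the filter state would have to be carried across chunks, which this sketch does not do.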
Another approach is to add a second pass that takes the transcribed text and improves it, but it is probably not worth it for real-time use cases because of the added delay.
Hi @rafa3, thank you for your reply.
I tried your recommendation of setting noise_reduction to far_field. It seems to improve the handling of background noise a little, but the call quality is still not good enough in some of my scenarios, such as ordering pizza.
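For anyone reproducing this, the setting described above can be sketched as a session.update event. This is a rough sketch, not copied from my code: the nested field layout (session.audio.input.noise_reduction) is an assumption about the current Realtime session shape, and older beta sessions may expect a top-level input_audio_noise_reduction field instead. The helper name is hypothetical.

```python
import json

def build_noise_reduction_update(mode="far_field"):
    """Hypothetical helper: build a session.update event that sets noise reduction."""
    if mode not in ("near_field", "far_field"):
        raise ValueError("mode must be 'near_field' or 'far_field'")
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            # Assumed field layout; check it against the API version in use.
            "audio": {"input": {"noise_reduction": {"type": mode}}},
        },
    }

# Serialize the event so it can be sent over the Realtime WebSocket connection.
payload = json.dumps(build_noise_reduction_update("far_field"))
```

near_field is meant for close microphones such as headsets, and far_field for more distant ones such as laptop or conference-room microphones, so it may be worth testing both on phone audio.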
I have also tested gpt-realtime-mini-2025-10-06, gpt-realtime-1.5, and gpt-realtime-2, but I experienced the same issues. In fact, in my Portuguese-speaking use case, those models performed even worse than gpt-realtime-mini / gpt-realtime.
During the call, the voice sometimes changes unexpectedly, almost as if another person is speaking. This happens intermittently and makes the experience feel unstable.
I am still looking for possible solutions, such as improving the prompt or adjusting other parameters.
Thank you again for your help.