Realtime API: Poor Portuguese call quality with gpt-realtime-mini / gpt-realtime
About your problem with noises and background interference, I would consider a implementation of a filter (band-pass filter, spectral denoise, noise reduction, etc.), there is a lot kind of filters try some to check which one fits better for you case. Is important to know that the API supports noise_reduction filter as parameter. Check Realtime transcription.
About the recognition accuracy, it can be related with the noises. I would first try to implement the filters. If it doesn’t resolve, try another models. But, I would say that the gpt realtime models are already good. I have some projects using Gemini speech-to-text and live API and they’re good too.
Another approach is to implement a second turn that pass along the transcribed text and improve it, but probably it is not worth for real-time cases due to the delay.
Discussion in the ATmosphere