Big delays receiving output_audio_buffer.stopped event since Apr 18th
Since April 18th, and still happening as of April 26th, we’re intermittently seeing pretty big delays between the model finishing its spoken response and us receiving the output_audio_buffer.stopped websocket event.
Most responses are fine, but occasionally a 2-4s audio response ends up with a 10-12s window between output_audio_buffer.started and output_audio_buffer.stopped, which means a 6-10s delay between the model finishing its speech and the “stopped” event arriving.
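For anyone who wants to check their own logs, here is a minimal sketch of how the delay figure above is derived. The function name and the idea of passing in the known audio length are illustrative assumptions, not part of any SDK; the timestamps are simply the client-side receive times of the two events.

```python
def stopped_event_delay(started_at: float, stopped_at: float, audio_seconds: float) -> float:
    """Seconds between the end of the spoken audio and the arrival of
    output_audio_buffer.stopped, given client-side receive timestamps."""
    window = stopped_at - started_at  # started -> stopped window
    # Anything beyond the audio length is delay; never report a negative value.
    return max(0.0, window - audio_seconds)


# A 3 s response whose started/stopped events arrive 11 s apart
print(stopped_event_delay(0.0, 11.0, 3.0))  # prints 8.0
```

A healthy response should give a value close to zero; the affected ones land in the 6-10s range.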
The interrupt_response setting doesn’t seem to have any effect, as flagged here months ago: https://community.openai.com/t/issues-with-realtime-turn-taking/1369161
Because of that, we rely on switching turn_detection on and off with session.update to keep the model from being interrupted mid-speech. We disable turn detection after response.create, then re-enable it once we receive output_audio_buffer.stopped.
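Roughly, the toggling looks like the sketch below. This is a simplified illustration rather than our production code: `send` is a hypothetical callable that writes a JSON string to the websocket, the `server_vad` config is a placeholder, and the exact location of `turn_detection` inside the `session` object is an assumption that may differ between API versions.

```python
import json

# Placeholder VAD config (assumption); use whatever the session normally runs with.
VAD_CONFIG = {"type": "server_vad"}


def turn_detection_update(enabled: bool) -> str:
    """Build a session.update message that turns turn detection on or off."""
    turn_detection = VAD_CONFIG if enabled else None  # None serializes to JSON null
    return json.dumps({"type": "session.update",
                       "session": {"turn_detection": turn_detection}})


def start_response(send) -> None:
    """Request a response, then disable turn detection so it cannot be interrupted."""
    send(json.dumps({"type": "response.create"}))
    send(turn_detection_update(False))


def on_event(event: dict, send) -> None:
    """Re-enable turn detection once the server reports that audio output stopped."""
    if event.get("type") == "output_audio_buffer.stopped":
        send(turn_detection_update(True))
```

The key point is that re-enabling depends entirely on output_audio_buffer.stopped: nothing else in this flow turns turn detection back on.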
When the event arrives promptly (as it always had until Apr 18th), this setup works well. When it is delayed, however, turn detection stays off far longer than it should, so the user speaks but their input is not picked up. Some say “hello? hello?”, while most just stay silent and hang up after a while. It makes for a very bad experience.
Examples of big output_audio_buffer.started - output_audio_buffer.stopped windows below. Again, all these were concise 2-4s audio responses.
resp_DVzNkceePLK27xkZ33UjJ
resp_DY845PNK8S1rPRY4jYs3G
resp_DYBAMoHOppd5Cx9UmKOj4
resp_DYC3eWgge9qEhZulRXsXY
resp_DYofSJBkOPKPkcNVDdcbg
resp_DYofw8GNnuMhypBEo8rMy
resp_DYohNIYItCxsXWJtc3nX0
@OpenAI_Support, @Sean-Der, could anyone look into this issue?
Thank you