Realtime SIP caller hangup still drops final input transcription events

OpenAI Developer Community May 13, 2026
We can reproduce this fairly easily.

Reproduction steps:

  1. Start a Realtime SIP call through Twilio to OpenAI SIP.
  2. Let the assistant finish its prompt.
  3. The caller speaks a relatively long utterance, around 15-30 seconds, without pausing long enough for VAD to finalize the turn.
  4. The caller hangs up immediately after finishing the utterance, or while the utterance is still being finalized.
  5. The Twilio recording contains the caller’s final speech, but the Realtime event stream does not emit the final input transcription events for that speech.

In our observed case:

  • OpenAI Realtime Call_ID: rtc_u2_DeAOG3pcpBFvJ4Mv1fkY7
  • OpenAI webhook event id: evt_6a013a9835dc8190a117e2426dad6903
  • OpenAI webhook id: wh_6a013a9841248190b12513da3ccd7788
  • Twilio parent CallSid: CAa17b0544dd4ed2584c9b8f2a5803e5c8
  • Twilio SIP child CallSid: CAc79d8fb428da42ad1b9234742fa5f053
  • SIP Call-ID: 0f62ec63e2269af3709ce2b92685312e@0.0.0.0
  • Internal request id in our app: 58cd683f-b34e-490f-bbbe-74fe8ca48af6

We do not have the OpenAI API x-request-id for this historical call because we were not logging the response headers from the realtime.calls.accept request at the time.
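For future calls, the request ID could be captured along these lines. This is only a sketch: it assumes the HTTP client exposes the response headers of the accept request as a plain mapping, and the helper names `extract_request_id` and `build_call_log_record` are hypothetical, not part of any SDK.

```python
from typing import Mapping, Optional


def extract_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the x-request-id value, matching the header name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            return value
    return None


def build_call_log_record(call_id: str, headers: Mapping[str, str]) -> dict:
    """Pair the Realtime call ID with the request ID so both land in one log line."""
    # Hypothetical record shape; adapt the keys to your own logging schema.
    return {"call_id": call_id, "x_request_id": extract_request_id(headers)}
```

Logging this record at accept time would let a later support request include both identifiers.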

Timeline:

  • The assistant finished speaking at around 2026-05-11 02:11:03 UTC.
  • The SIP leg completed at around 2026-05-11 02:11:31 UTC.
  • The Realtime session disconnected at around 2026-05-11 02:11:32 UTC.
  • The final caller speech exists in the Twilio recording.
  • We did not receive conversation.item.input_audio_transcription.delta or conversation.item.input_audio_transcription.completed for that final speech.

Our current setup listens for:

  • conversation.item.input_audio_transcription.delta
  • conversation.item.input_audio_transcription.completed

We currently do not log:

  • input_audio_buffer.committed
  • input_audio_buffer.speech_stopped

So we cannot yet confirm whether input_audio_buffer.committed was emitted before the disconnect for this historical call. We are planning to add logging for those events.
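A rough sketch of the extra logging we have in mind is below. It tracks, per `item_id`, which stage each caller turn reached, so that at disconnect we can see whether a turn was committed but never transcribed, or never committed at all. The class names are hypothetical, and the assumption that each of these events carries an `item_id` field reflects our reading of the Realtime event schema and should be checked against the current reference.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SPEECH_STARTED = "input_audio_buffer.speech_started"
SPEECH_STOPPED = "input_audio_buffer.speech_stopped"
COMMITTED = "input_audio_buffer.committed"
TRANSCRIBED = "conversation.item.input_audio_transcription.completed"
TRACKED = {SPEECH_STARTED, SPEECH_STOPPED, COMMITTED, TRANSCRIBED}


@dataclass
class TurnState:
    speech_stopped: bool = False
    committed: bool = False
    transcribed: bool = False


@dataclass
class TurnTracker:
    """Records how far each caller turn got before the session ended."""
    turns: Dict[str, TurnState] = field(default_factory=dict)

    def on_event(self, event: dict) -> None:
        kind = event.get("type")
        item_id = event.get("item_id")  # assumed present on all tracked events
        if kind not in TRACKED or item_id is None:
            return
        state = self.turns.setdefault(item_id, TurnState())
        if kind == SPEECH_STOPPED:
            state.speech_stopped = True
        elif kind == COMMITTED:
            state.committed = True
        elif kind == TRANSCRIBED:
            state.transcribed = True

    def pending_at_disconnect(self) -> List[str]:
        """Item IDs that started but never received a completed transcription."""
        return [i for i, s in self.turns.items() if not s.transcribed]
```

Calling `pending_at_disconnect()` when the session closes, and logging the `TurnState` of each returned item, would show whether the missing turn ever reached `input_audio_buffer.committed`.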

The important pattern seems to be: caller speaks for a long enough time, then immediately disconnects before VAD has finalized/committed the input audio buffer.

Could you confirm whether Realtime SIP is expected to commit and emit a final transcription for any pending input audio when the SIP caller sends BYE / hangs up? Or is this currently a known limitation where applications should always use the recording as the fallback source of truth for the final utterance?
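Until that is clarified, a fallback check along the following lines is one option. It is a heuristic sketch, not a confirmed recommendation: the function name and the 5-second threshold are assumptions, and it only flags calls where the caller had a meaningful window to speak after the assistant finished but no transcription arrived in that window.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_recording_fallback(
    assistant_finished_at: datetime,
    call_ended_at: datetime,
    last_transcript_at: Optional[datetime],
    min_window_s: float = 5.0,  # assumed threshold; tune for your traffic
) -> bool:
    """Flag a call whose final caller speech should be recovered from the recording."""
    window_s = (call_ended_at - assistant_finished_at).total_seconds()
    if window_s < min_window_s:
        # Caller had too little time to say anything substantial.
        return False
    if last_transcript_at is None:
        return True
    # A transcript that predates the assistant's last turn does not cover the final speech.
    return last_transcript_at < assistant_finished_at


# Timeline from the observed call: a 28-second window with no transcription events.
assistant_done = datetime(2026, 5, 11, 2, 11, 3, tzinfo=timezone.utc)
sip_completed = datetime(2026, 5, 11, 2, 11, 31, tzinfo=timezone.utc)
flagged = needs_recording_fallback(assistant_done, sip_completed, None)
```

For the observed call `flagged` is `True`, so the application would transcribe the tail of the Twilio recording itself.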
