Realtime SIP caller hangup still drops final input transcription events
Hi OpenAI team,
This may be related to the previously reported issue:
SIP: Caller hangup closes session WebSocket immediately, preventing event draining + early-hangup calls still accepted API
Summary: When using the Realtime API over SIP, two related issues occur when the caller hangs up:
- If the caller ends the call after it’s connected, OpenAI immediately closes the session WebSocket. This prevents pending events (tool calls, transcription, output buffering) from being delivered, resulting in incomplete tool calls and partial transcriptions.
- If the caller hangs up while the call is still ringing, OpenAI still accepts the call and starts an agent session as if the call were act…
That thread appears to be closed/resolved, but we are still able to reproduce a very similar issue.
Environment:
- Realtime API over SIP
- Twilio inbound call → Twilio `<Dial><Sip>` → OpenAI SIP endpoint
- OpenAI Agents SDK JS: `@openai/agents-realtime` 0.4.3
- Reproduced on: May 11, 2026
Reproduction steps:
- Start a SIP Realtime call through Twilio.
- Let the assistant say its prompt, for example: “Thank you for calling. Please tell us your preferred reservation date and time.”
- As the caller, speak continuously for around 15 to 25 seconds.
- Hang up immediately after speaking, from the caller side.
Observed behavior:
- Twilio recording contains the caller’s final 15 to 25 seconds of speech.
- The Realtime transport receives `disconnected` shortly after the call ends.
- Before `disconnected`, our application does not receive any of the following events for the caller’s final speech:
  - `conversation.item.input_audio_transcription.delta`
  - `conversation.item.input_audio_transcription.completed`
  - `conversation.item.input_audio_transcription.failed`
- Because no delta or completed event is delivered, the final caller utterance cannot be persisted in our conversation history.
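For anyone trying to confirm the same gap, below is a minimal sketch of the bookkeeping we use to decide that the final utterance was dropped. `TranscriptionTracker` is a hypothetical helper, not an SDK API, and how raw server events reach `onEvent` depends on your own transport wiring. It assumes each relevant server event carries a `type` and an `item_id`, and that `report()` is called once the transport reports `disconnected`.

```typescript
// Minimal shape of the server events we inspect. Only `type` and
// `item_id` are read; all other fields are ignored.
interface RealtimeEvent {
  type: string;
  item_id?: string;
}

const TRANSCRIPTION_PREFIX = "conversation.item.input_audio_transcription.";

// Hypothetical helper (not part of any SDK): remembers which caller input
// items were started and which reached a terminal transcription event.
class TranscriptionTracker {
  private started = new Set<string>();
  private sawDelta = new Set<string>();
  private terminal = new Set<string>();

  onEvent(event: RealtimeEvent): void {
    const id = event.item_id;
    if (id === undefined) return;

    if (
      event.type === "input_audio_buffer.speech_started" ||
      event.type === "input_audio_buffer.committed"
    ) {
      this.started.add(id);
    } else if (event.type === TRANSCRIPTION_PREFIX + "delta") {
      this.sawDelta.add(id);
    } else if (
      event.type === TRANSCRIPTION_PREFIX + "completed" ||
      event.type === TRANSCRIPTION_PREFIX + "failed"
    ) {
      this.terminal.add(id);
    }
  }

  // Call after `disconnected`: lists items that never got a
  // `completed` or `failed` transcription event.
  report(): { itemId: string; sawDelta: boolean }[] {
    return Array.from(this.started)
      .filter((id) => !this.terminal.has(id))
      .map((id) => ({ itemId: id, sawDelta: this.sawDelta.has(id) }));
  }
}

// Example with made-up item IDs: item_1 is transcribed, item_2 is not.
const tracker = new TranscriptionTracker();
tracker.onEvent({ type: "input_audio_buffer.speech_started", item_id: "item_1" });
tracker.onEvent({ type: TRANSCRIPTION_PREFIX + "completed", item_id: "item_1" });
tracker.onEvent({ type: "input_audio_buffer.speech_started", item_id: "item_2" });
const missing = tracker.report();
console.log(missing);
```

Tracking `input_audio_buffer.speech_started` as well as `input_audio_buffer.committed` is deliberate: if the caller hangs up before VAD commits the buffer, the item would otherwise never appear in the report.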
Expected behavior:
- Before the SIP Realtime WebSocket is closed, pending input audio/transcription events should be drained.
- At minimum, if caller audio was received by the SIP session, we would expect either:
  - `conversation.item.input_audio_transcription.completed`
  - `conversation.item.input_audio_transcription.failed`
  - or some explicit event indicating that final input audio was discarded because the caller hung up before VAD/commit.
This is particularly important for phone-agent use cases because callers often finish speaking and hang up immediately. In those cases, the recording is complete, but the Realtime conversation history is missing the final utterance.
Could you confirm whether the SIP caller-hangup event draining fix is expected to cover this case?
Also, is final input audio transcription guaranteed to be emitted before the Realtime SIP WebSocket closes when the caller hangs up immediately after speaking? If not, what is the recommended way to reliably capture the caller’s final utterance?