{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreibmg7dvz5odtrmdjpxmscmfjhgfvihpz5l6wgcldnxnt6qbof64ba",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mlpetfze2yy2"
},
"path": "/t/realtime-sip-caller-hangup-still-drops-final-input-transcription-events/1380663#post_3",
"publishedAt": "2026-05-13T01:53:17.000Z",
"site": "https://community.openai.com",
"textContent": "We can reproduce this fairly easily.\n\nReproduction condition:\n\n 1. Start a Realtime SIP call through Twilio to OpenAI SIP.\n 2. Let the assistant finish its prompt.\n 3. The caller starts speaking a relatively long utterance, around 15-30 seconds, without leaving enough silence for VAD to finalize the turn.\n 4. The caller hangs up immediately after finishing the utterance, or while the utterance is still being finalized.\n 5. The Twilio recording contains the caller’s final speech, but the Realtime event stream does not emit the final input transcription events for that speech.\n\n\n\nIn our observed case:\n\n * OpenAI Realtime Call_ID: rtc_u2_DeAOG3pcpBFvJ4Mv1fkY7\n * OpenAI webhook event id: evt_6a013a9835dc8190a117e2426dad6903\n * OpenAI webhook id: wh_6a013a9841248190b12513da3ccd7788\n * Twilio parent CallSid: CAa17b0544dd4ed2584c9b8f2a5803e5c8\n * Twilio SIP child CallSid: CAc79d8fb428da42ad1b9234742fa5f053\n * SIP Call-ID: 0f62ec63e2269af3709ce2b92685312e@0.0.0.0\n * Internal request id in our app: 58cd683f-b34e-490f-bbbe-74fe8ca48af6\n\n\n\nWe do not have the OpenAI API x-request-id for this historical call because we were not logging the response headers from the `realtime.calls.accept` request at the time.\n\nTimeline:\n\n * The assistant finished speaking at around 2026-05-11 02:11:03 UTC.\n * The SIP leg completed at around 2026-05-11 02:11:31 UTC.\n * The Realtime session disconnected at around 2026-05-11 02:11:32 UTC.\n * The final caller speech exists in the Twilio recording.\n * We did not receive `conversation.item.input_audio_transcription.delta` or `conversation.item.input_audio_transcription.completed` for that final speech.\n\n\n\nOur current setup listens for:\n\n * `conversation.item.input_audio_transcription.delta`\n * `conversation.item.input_audio_transcription.completed`\n\n\n\nWe currently do not log:\n\n * `input_audio_buffer.committed`\n * `input_audio_buffer.speech_stopped`\n\n\n\nSo we cannot yet confirm whether 
`input_audio_buffer.committed` was emitted before the disconnect for this historical call. We plan to add logging for both events.\n\nThe important pattern seems to be: the caller speaks for a long enough time, then disconnects immediately, before VAD has finalized and committed the input audio buffer.\n\nCould you confirm whether Realtime SIP is expected to commit any pending input audio and emit a final transcription for it when the SIP caller sends BYE / hangs up? Or is this currently a known limitation, in which case applications should always use the recording as the fallback source of truth for the final utterance?",
"title": "Realtime SIP caller hangup still drops final input transcription events"
}