Handling Overlapping Responses in Realtime API When Tools Take Too Long
In summary, I implemented the following flow:
Call the tool if it fires.
Send a model preamble (e.g. “Let me check that for you”).
Wait 3 seconds for the tool response.
If the tool times out, return a tool response like this:
{
"error": "TIMEOUT",
"retries_remaining": 2,
"next_action": "retry"
}
Then, send a prompt instructing the model to retry the same tool call with the same parameters (forcing the model to call the same tool again) and provide a preamble first.
Set a second timeout of 4 seconds.
Apply the same logic as in step 4.
Set a final timeout of 5 seconds.
Apply the same logic as in step 4, but since there are no retries left, return a tool response indicating the timeout error and send a prompt without tools, forcing the model to communicate the error to the user.
In this flow, if the user speaks while a tool execution is still in progress, I let the current execution finish. If the tool response arrives successfully, I append it to the conversation, and the model responds using both the tool response and the user’s latest message as context.
If the tool execution times out, I append a timeout error for that execution, and the model responds to the user’s latest message instead.
Discussion in the ATmosphere