How to Handle Parallel Response Branches with the OpenAI Responses API
I am building an AI chatbot using the Responses API to reply to customer messages. Each time a customer sends a message, the AI generates a new response as a reply. I use the previous_response_id mechanism to maintain conversation state: every time a new response is created, the system overwrites the stored previous_response_id for that customer with the newly returned response ID, so future turns inherit the prior context.
Handling Rapid Consecutive Messages
Because I do not want any delay, the system must call the API immediately whenever a message arrives from the customer. As a result, when a customer sends another message before the previous response has finished generating, I merge the messages and fire a new parallel API call to produce a fresh response.
Example:
- The customer sends message M1. The system fires an API call with previous_response_id = R0 → in progress → will return responseA.
- Before responseA completes, the customer sends message M2. The system immediately fires a second API call with previous_response_id = R0 and input [M1, M2] → will return responseB.
- The output of responseA is discarded and never sent to the customer.
- Once responseB completes, the chatbot sends its output to the customer and saves previous_response_id = responseB.
- Subsequent customer messages then form a chain: responseB → responseC → responseD → …
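The sequence above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual implementation: `Conversation`, `on_message`, `on_complete`, and `create_response` are hypothetical names, and `create_response` is a local stub standing in for the real API call so the example runs without network access.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)

@dataclass
class Response:
    id: str
    previous_response_id: str
    input: list

def create_response(previous_response_id, messages):
    """Hypothetical stub standing in for the real Responses API call."""
    return Response(
        id=f"resp_{next(_ids)}",
        previous_response_id=previous_response_id,
        input=list(messages),  # copy, so later edits to the queue do not leak in
    )

@dataclass
class Conversation:
    previous_response_id: str = "R0"
    pending: list = field(default_factory=list)  # messages not yet answered
    latest_call: int = 0                         # number of the newest call

    def on_message(self, text):
        # Merge the new message with any unanswered ones and call immediately,
        # always branching from the last *saved* response ID.
        self.pending.append(text)
        self.latest_call += 1
        return self.latest_call, create_response(
            self.previous_response_id, self.pending
        )

    def on_complete(self, call_no, response):
        # Only the newest call may reply and advance the chain;
        # output from superseded calls is discarded.
        if call_no != self.latest_call:
            return False
        self.previous_response_id = response.id
        self.pending.clear()
        return True

conv = Conversation()
n1, response_a = conv.on_message("M1")       # branches from R0
n2, response_b = conv.on_message("M2")       # also branches from R0
sent_a = conv.on_complete(n1, response_a)    # superseded, so discarded
sent_b = conv.on_complete(n2, response_b)    # sent and saved
```

Note that nothing in this bookkeeping looks at what responseA did before it was discarded, which is where the problem below comes from.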
The Problem
- In responseA, the AI invokes the function call create_order → an order is created in the database.
- In responseB, the AI decides not to call create_order (perhaps because the merged content of M1 + M2 led to a different decision).
- The conversation state now follows the branch R0 → B → C → … and completely bypasses responseA.
- After several more turns, at some responseX (a descendant of B), the AI decides to call create_order. Since the conversation chain through B has no record of responseA, the model has no way to know that the order was already created. It calls create_order again → the order is duplicated.
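The failure mode above can be reproduced with a toy model. The sketch below is only an illustration: the `responses` dictionary, the `orders_db` list, and `visible_tool_calls` are hypothetical stand-ins, and this is not a claim about how the API stores conversation state internally. It only shows that a chain walked back from a descendant of B never passes through A.

```python
# Toy response tree: each node records its parent and the tool calls it made.
responses = {
    "R0": {"parent": None, "tool_calls": []},
    "A":  {"parent": "R0", "tool_calls": ["create_order"]},  # discarded branch
    "B":  {"parent": "R0", "tool_calls": []},
    "C":  {"parent": "B",  "tool_calls": []},
}

# The side effect of responseA was already persisted, even though
# its output was never sent to the customer.
orders_db = ["order-from-A"]

def visible_tool_calls(response_id):
    """Collect tool calls along the chain from response_id back to the root."""
    calls = []
    while response_id is not None:
        node = responses[response_id]
        calls.extend(node["tool_calls"])
        response_id = node["parent"]
    return calls

# responseX continues from C, a descendant of B.
history = visible_tool_calls("C")
if "create_order" not in history:
    # Nothing in the visible chain mentions the earlier order,
    # so the tool is called a second time.
    responses["X"] = {"parent": "C", "tool_calls": ["create_order"]}
    orders_db.append("order-from-X")
```

After this runs, `orders_db` holds two orders, while the history seen from C contains no `create_order` call at all.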
Question
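Is there a way to handle this scenario, where parallel responses are created under the same parent, so that a side effect from a discarded branch (such as the order created in responseA) is not repeated later in the conversation?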
Constraints
- Due to specific business requirements, the chatbot must call the API to generate a reply immediately upon receiving a message — it cannot wait a few seconds to see whether the customer is going to send a follow-up.
- Creating responses in parallel is mandatory when the customer sends messages in rapid succession.