
How to Handle Parallel Response Branches with the OpenAI Responses API

OpenAI Developer Community April 28, 2026

I am building an AI chatbot that uses the Responses API to reply to customer messages. Each time a customer sends a message, the AI generates a new response as a reply. I use the previous_response_id mechanism to maintain conversation state: every time a new response is created, the system overwrites the stored previous_response_id for that customer with the newly returned response ID, so future turns inherit the prior context.
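For concreteness, here is a minimal sketch of that bookkeeping. The store `latest_response_id`, the helper `reply`, and the stub `fake_create_response` are hypothetical names used only for illustration. The stub stands in for the real Responses API call so the snippet runs offline, and it returns a plain dict rather than an SDK response object.

```python
import itertools
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory store: customer_id -> id of the latest response.
latest_response_id: Dict[str, Optional[str]] = {}


def reply(customer_id: str, message: str, create_response: Callable[..., dict]) -> str:
    """Generate a reply and overwrite the stored previous_response_id."""
    previous = latest_response_id.get(customer_id)
    response = create_response(input=message, previous_response_id=previous)
    latest_response_id[customer_id] = response["id"]
    return response["output_text"]


# Stub standing in for the real API call (an assumption, not the SDK itself).
_counter = itertools.count(1)
calls: List[Optional[str]] = []  # previous_response_id received by each call


def fake_create_response(input: str, previous_response_id: Optional[str]) -> dict:
    calls.append(previous_response_id)
    return {"id": f"resp_{next(_counter)}", "output_text": f"echo: {input}"}


reply("cust-1", "hello", fake_create_response)
reply("cust-1", "where is my order?", fake_create_response)
```

The second call receives the ID returned by the first one, which is how each turn inherits the context of the turn before it.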

Handling Rapid Consecutive Messages

Because I do not want any delay, the system must call the API immediately whenever a message arrives from the customer. As a result, when a customer sends another message before the previous response has finished generating, I merge the messages and fire a new parallel API call to produce a fresh response.

Example:

The customer sends message M1. The system fires an API call with previous_response_id = R0; this call is still in progress and will eventually return responseA.
Before responseA completes, the customer sends message M2. The system immediately fires a second API call with previous_response_id = R0 and input [M1, M2] → will return responseB.
The output of responseA is discarded and never sent to the customer.
Once responseB completes, the chatbot sends its output to the customer and saves previous_response_id = responseB.
Subsequent customer messages then form a chain: responseB → responseC → responseD → …
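The flow in this example can be sketched roughly as follows. This is a simplified simulation, not my production code: `Conversation`, `on_message`, and the `generation` counter are hypothetical names, and `fake_create` is a stub that sleeps briefly in place of the real API call. The counter is just one way to tell whether a response has been superseded by a newer call.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Conversation:
    previous_response_id: Optional[str] = "R0"
    pending: List[str] = field(default_factory=list)  # messages not yet answered
    generation: int = 0                               # bumped on every incoming message
    sent: List[str] = field(default_factory=list)     # replies delivered to the customer


async def fake_create(input: List[str], previous_response_id: Optional[str]) -> dict:
    # Stub for the real API call: takes a moment, then returns a response.
    await asyncio.sleep(0.01)
    return {"id": "resp_" + "+".join(input), "output_text": "reply to " + " ".join(input)}


async def on_message(conv: Conversation, text: str) -> Optional[str]:
    conv.pending.append(text)
    conv.generation += 1
    my_generation = conv.generation
    merged = list(conv.pending)        # e.g. [M1, M2] for the second call
    base = conv.previous_response_id   # both calls branch from the same parent
    response = await fake_create(input=merged, previous_response_id=base)
    if my_generation != conv.generation:
        return None                    # superseded by a newer call: discard output
    conv.previous_response_id = response["id"]
    conv.pending.clear()
    conv.sent.append(response["output_text"])
    return response["id"]


async def demo():
    conv = Conversation()
    call_a = asyncio.create_task(on_message(conv, "M1"))
    await asyncio.sleep(0)             # let the first call start before M2 arrives
    call_b = asyncio.create_task(on_message(conv, "M2"))
    return conv, await call_a, await call_b


if __name__ == "__main__":
    conv, a, b = asyncio.run(demo())
    print(a, b, conv.previous_response_id)
```

Here the first call's result is dropped and only the merged reply is sent and saved. Note that discarding the output does nothing about side effects that responseA may already have triggered.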

The Problem

In responseA, the AI invokes the function call create_order → an order is created in the database.
In responseB, the AI decides not to call create_order (perhaps because the merged content of M1 + M2 led to a different decision).
The conversation state now follows the branch R0 → B → C → … and completely bypasses responseA.
After several more turns, at some responseX (a descendant of B), the AI decides to call create_order. Since the conversation chain through B has no record of responseA, the model has no way to know that the order was already created. It calls create_order again → the order is duplicated.
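To make the failure mode concrete, here is a small simulation of the branch structure described above. Everything in it is illustrative: the `parent` map, the `tool_calls` table, and the `orders` list are hypothetical stand-ins for the response chain, the model's function calls, and my database.

```python
from typing import Dict, List, Optional, Tuple

# Response tree: A and B share the parent R0; the live chain is R0 -> B -> C -> X.
parent: Dict[str, Optional[str]] = {"R0": None, "A": "R0", "B": "R0", "C": "B", "X": "C"}

# Function calls the model made in each response (illustrative).
tool_calls: Dict[str, List[str]] = {"A": ["create_order"], "X": ["create_order"]}

orders: List[Tuple[str, str]] = []  # stand-in for the orders table


def run_tools(response_id: str, customer_id: str) -> None:
    """Execute the side effects of a response's function calls."""
    for name in tool_calls.get(response_id, []):
        if name == "create_order":
            orders.append((customer_id, response_id))


def context_of(response_id: str) -> List[str]:
    """Ancestors reachable through previous_response_id, oldest first."""
    chain = []
    node = parent[response_id]
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain[::-1]


run_tools("A", "cust-1")   # the order is created even though A's output is discarded
seen = context_of("X")     # the context X inherits: R0, B, C (A is absent)
knows_about_order = any("create_order" in tool_calls.get(r, []) for r in seen)
run_tools("X", "cust-1")   # the model calls create_order again: duplicate order
```

After this runs, `knows_about_order` is false while `orders` holds two entries for the same customer, which is exactly the duplication I am seeing.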

Question

Is there a way to handle this scenario where parallel responses are created under the same parent?

Constraints

Due to specific business requirements, the chatbot must call the API to generate a reply immediately upon receiving a message — it cannot wait a few seconds to see whether the customer is going to send a follow-up.
Creating responses in parallel is mandatory when the customer sends messages in rapid succession.
