Realtime V2 is giving long responses
OpenAI Developer Community
May 23, 2026
Your only control surface to alter the model behavior is the system message you provide. You’ll need to communicate the type of responses that are expected to counter any seen symptoms.
A model ends its chat turn by producing a stop sequence. On chat models, this is a trained special token that is emitted. Different models have different qualities of predicting a stop token to end the message, versus predicting the next word, the next word of another sentence. Some recent models from OpenAI have been quite bad at this, where they simply can’t end, and even repeat the same response, starting over all by themselves or making a new message start without termination being output.
You can’t really communicate, " you produce a stop token which ends your turn", as the AI isn’t really self-aware that is what it is doing, and actually placing the special tokens by their string mapping is disallowed. What you can do is reinforce the style of conversation exchange. This might take the form, “a Bot assistant produces only one question, not a sequence of question sentences, and then the user replies to that single question in their own message.”
Responses and the realtime API don’t let you provide your own stop sequence that can be trained or instructed, unlike Chat Completions, which is for developers that understand AI instead of consumers that want an over-featured product to be used only one designed way.
This reinforces that when post-training the weights of a model on its message format, especially those models with different modalities, the right amount of reinforcement of predicting stop sequences is important because
Discussion in the ATmosphere