gpt-realtime-1.5: text output mode broken when tools are enabled
OpenAI Developer Community
February 25, 2026
I’ve been using gpt-realtime-1.5 for a couple of days now and ran into an interesting issue. With output_modalities=["audio"], the model works great. But when I switch to output_modalities=["text"] with tools enabled and rely on an external TTS, output quality drops significantly compared to gpt-realtime.
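For anyone trying to reproduce this, the setup is roughly the following. This is a minimal sketch, not the exact payload: it assumes the Realtime API's session.update event shape, and the get_weather tool is a hypothetical placeholder. Only output_modalities and the presence of tools come from the configuration described above.

```python
import json

# Hypothetical placeholder tool; any function tool should do for a repro.
EXAMPLE_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def build_session_update(output_modalities, tools):
    """Build a session.update event (event shape is an assumption)."""
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "output_modalities": list(output_modalities),
            "tools": list(tools),
        },
    }


# Text-only output with tools enabled: the combination that misbehaves.
text_mode_event = build_session_update(["text"], [EXAMPLE_TOOL])
# Audio output with the same tools: the combination that works.
audio_mode_event = build_session_update(["audio"], [EXAMPLE_TOOL])

if __name__ == "__main__":
    print(json.dumps(text_mode_event, indent=2))
```

Sending each event over the same connection and comparing responses isolates the output modality as the only variable.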
Issues I’m seeing in text-only mode:
- The model wraps normal conversational responses in curly braces {} as if it’s outputting JSON
- Function call arguments leak into the text output channel (the TTS literally tries to speak the function call JSON)
- Internal control tokens leak into the output, e.g.: <|aesthetics_3|><|has_watermark|>
- The model ignores language instructions that gpt-realtime followed perfectly
None of these issues exist with gpt-realtime in the same configuration, or with gpt-realtime-1.5 in audio output mode. This seems specific to the combination of text mode and tools.
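As a stopgap, the text can be filtered before it reaches the external TTS. The sketch below is only an illustration under some assumptions: sanitize_for_tts is a hypothetical helper, leaked control tokens are assumed to always have the <|...|> shape shown above, and the input is assumed to be a complete response rather than a streaming delta (deltas would need buffering first). It does nothing about the ignored language instructions.

```python
import json
import re

# Matches tokens shaped like <|aesthetics_3|> or <|has_watermark|>.
CONTROL_TOKEN = re.compile(r"<\|[^|<>]*\|>")


def sanitize_for_tts(text: str) -> str:
    """Return text that is safe to speak, or "" if nothing should be spoken."""
    cleaned = CONTROL_TOKEN.sub("", text).strip()
    if cleaned.startswith("{") and cleaned.endswith("}"):
        try:
            parsed = json.loads(cleaned)
        except json.JSONDecodeError:
            # Braces around ordinary prose: drop the wrapper, keep the words.
            return cleaned[1:-1].strip()
        if isinstance(parsed, dict):
            # Valid JSON object: treat as leaked function call arguments.
            return ""
    return cleaned


if __name__ == "__main__":
    print(sanitize_for_tts("Hello<|aesthetics_3|><|has_watermark|> there"))
    print(sanitize_for_tts("{Sure, I can help with that.}"))
    print(repr(sanitize_for_tts('{"city": "Paris"}')))
```

Dropping anything that parses as a JSON object is a blunt heuristic; it keeps the TTS from reading arguments aloud but would also silence a legitimate reply that happens to be valid JSON.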