
gpt-realtime-1.5: text output mode broken when tools are enabled

OpenAI Developer Community February 25, 2026

I’ve been using gpt-realtime-1.5 for a couple of days now and have run into an interesting issue. With output_modalities=["audio"], the model works great. But when I switch to output_modalities=["text"] with tools enabled and rely on an external TTS, performance drops significantly compared to gpt-realtime.
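For anyone trying to reproduce this, here is a rough sketch of the two session configurations being compared. It only builds the JSON payload and does not open a connection. The event shape (a session.update event with output_modalities and tools under session) is an assumption based on my reading of the Realtime API and should be checked against the current reference. The get_weather tool and the build_session_update helper are hypothetical names made up for illustration.

```python
import json
from typing import Any, Dict, List


def build_session_update(output_modalities: List[str]) -> Dict[str, Any]:
    """Build a session.update payload with one tool enabled.

    Assumption: field names follow the Realtime API session.update event;
    verify them against the current API reference before relying on this.
    """
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            # ["audio"] works as expected; ["text"] shows the issues reported here.
            "output_modalities": output_modalities,
            "tools": [
                {
                    "type": "function",
                    "name": "get_weather",  # hypothetical tool for illustration
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                }
            ],
            "tool_choice": "auto",
        },
    }


# The two configurations differ only in the output modality.
text_mode_event = build_session_update(["text"])
audio_mode_event = build_session_update(["audio"])
```

Serializing with json.dumps(text_mode_event) gives the string that would be sent over the connection. Keeping everything else identical makes it easier to confirm that the modality switch alone triggers the behavior.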

Issues I’m seeing in text-only mode:

Model wraps normal conversational responses in curly braces {} as if it’s outputting JSON
Function call arguments leak into the text output channel (the TTS literally tries to speak the function call JSON)
Internal control tokens leak into the output, e.g.: <|aesthetics_3|><|has_watermark|>
Ignores language instructions that gpt-realtime followed perfectly

None of these issues exist with gpt-realtime in the same configuration, or with gpt-realtime-1.5 in audio output mode. Seems specific to text mode + tools.
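A possible stopgap for the first three symptoms is to filter the completed text before it reaches the external TTS. The following is a minimal sketch under stated assumptions, not a tested fix: it assumes control tokens always have the <|name|> shape shown above, that leaked function call arguments arrive as a complete JSON object, and that the filter runs on the full response text rather than on streamed deltas. The clean_for_tts name is hypothetical. It does nothing about the ignored language instructions.

```python
import json
import re
from typing import Optional

# Matches special-token markers such as <|aesthetics_3|> or <|has_watermark|>.
CONTROL_TOKEN = re.compile(r"<\|[^|<>]*\|>")


def clean_for_tts(text: str) -> Optional[str]:
    """Return text that is safe to speak, or None if it should be dropped."""
    # Strip leaked control tokens first.
    text = CONTROL_TOKEN.sub("", text).strip()
    if not text:
        return None

    if text.startswith("{") and text.endswith("}"):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            # Not valid JSON: treat it as prose wrapped in braces and unwrap it.
            return text[1:-1].strip() or None
        if isinstance(parsed, dict):
            # A valid JSON object is most likely leaked function call arguments.
            return None

    return text
```

For example, clean_for_tts("{Sure, I can help with that.}") returns the sentence without the braces, while clean_for_tts('{"city": "Paris"}') returns None so the TTS stays silent. The heuristic would also drop a reply that is legitimately a JSON object, so it only fits voice agents where spoken output is always prose.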
