{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreibvanfjwezzrplipb6xmfc56sb2vk2ata67ywgvyfexzer2voqdlu",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mfpkdsqtld62"
},
"path": "/t/gpt-realtime-1-5-text-output-mode-broken-when-tools-are-enabled/1375106#post_1",
"publishedAt": "2026-02-25T19:40:38.000Z",
"site": "https://community.openai.com",
"textContent": "I’ve been using gpt-realtime-1.5 for a couple of days now and ran into an interesting issue. When using output_modalities=[“audio”] , the model works great. But when I switch to\noutput_modalities=[“text”] with tools enabled and rely on an external TTS, the performance drops significantly compared to gpt-realtime.\n\nIssues I’m seeing in text-only mode:\n\n * Model wraps normal conversational responses in curly braces {} as if it’s outputting JSON\n * Function call arguments leak into the text output channel (the TTS literally tries to speak the function call JSON)\n * Internal control tokens leak into the output, e.g.: <|aesthetics_3|><|has_watermark|>\n * Ignores language instructions that gpt-realtime followed perfectly\n\n\n\nNone of these issues exist with gpt-realtime in the same configuration, or with gpt-realtime-1.5 in audio output mode. Seems specific to text mode + tools.",
"title": "Gpt-realtime-1.5: text output mode broken when tools are enabled"
}