Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibvanfjwezzrplipb6xmfc56sb2vk2ata67ywgvyfexzer2voqdlu",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mfpkdsqtld62"
  },
  "path": "/t/gpt-realtime-1-5-text-output-mode-broken-when-tools-are-enabled/1375106#post_1",
  "publishedAt": "2026-02-25T19:40:38.000Z",
  "site": "https://community.openai.com",
  "textContent": "I’ve been using gpt-realtime-1.5 for a couple of days now and ran into an interesting issue. When using output_modalities=[“audio”] , the model works great. But when I switch to\noutput_modalities=[“text”] with tools enabled and rely on an external TTS, the performance drops significantly compared to gpt-realtime.\n\nIssues I’m seeing in text-only mode:\n\n  * Model wraps normal conversational responses in curly braces {} as if it’s outputting JSON\n  * Function call arguments leak into the text output channel (the TTS literally tries to speak the function call JSON)\n  * Internal control tokens leak into the output, e.g.: <|aesthetics_3|><|has_watermark|>\n  * Ignores language instructions that gpt-realtime followed perfectly\n\n\n\nNone of these issues exist with gpt-realtime in the same configuration, or with gpt-realtime-1.5 in audio output mode. Seems specific to text mode + tools.",
  "title": "Gpt-realtime-1.5: text output mode broken when tools are enabled"
}
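
Until the behavior described in the post is fixed, one possible stopgap is to filter the model's text before handing it to the external TTS. The sketch below is only an illustration of that idea, not something the post's author describes using: the function name `clean_for_tts` is hypothetical, and the heuristics (treat parseable JSON objects as leaked function-call arguments, treat unparseable brace-wrapped text as normal speech) are assumptions based solely on the symptoms listed above. It does not address the ignored language instructions.

```python
import json
import re

# Matches special-token markers of the form <|name|>, like those quoted in the post.
CONTROL_TOKEN = re.compile(r"<\|[^|<>]*\|>")


def clean_for_tts(text: str) -> str:
    """Best-effort cleanup of model text before sending it to a TTS engine.

    Hypothetical helper; the heuristics are assumptions, not documented behavior.
    """
    # Remove leaked control tokens such as <|aesthetics_3|>.
    text = CONTROL_TOKEN.sub("", text).strip()

    if text.startswith("{") and text.endswith("}"):
        try:
            json.loads(text)
        except json.JSONDecodeError:
            # Not valid JSON: assume stray braces around a normal reply.
            text = text[1:-1].strip()
        else:
            # Valid JSON object: assume leaked function-call arguments
            # and return nothing so the TTS stays silent.
            return ""

    # Collapse whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text)
```

For example, `clean_for_tts("{Sure, I can help.}<|aesthetics_3|><|has_watermark|>")` yields the plain sentence, while a string such as `{"city": "Paris"}` is suppressed entirely. A filter like this can also drop legitimate replies that happen to be valid JSON, so it is a workaround rather than a fix.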