{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreidbmdzesngw272yhrjmkvs3bq5gxf7zyi5d7sv4lz5guofoilmvau",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mllla246g6g2"
},
"path": "/t/realtime-api-instruction-limit-16-384-tokens-is-too-low-for-production-voice-agents-with-tool-calling/1378932#post_5",
"publishedAt": "2026-05-11T13:34:43.000Z",
"site": "https://community.openai.com",
"textContent": "Following up now that gpt-realtime-2 is available.\n\nI understand the earlier explanation that, for gpt-realtime-1.5, instructions/tools compete within the overall 32k Realtime context budget. The new question is about gpt-realtime-2 specifically. Its model page lists a 128,000 token context window and 32,000 max output tokens, so we tested whether the separate Realtime session instructions+tools ceiling also increased.\n\nUsing POST /v1/realtime/calls with the same WebRTC session payload shape, we observed:\n\n * gpt-realtime-1.5: ~17.8k estimated instructions+tools tokens succeeds; ~31.3k fails with HTTP 504\n * gpt-realtime-2: ~17.8k estimated instructions+tools tokens succeeds; ~31.3k fails with HTTP 504\n\n\n\nThese are local tiktoken estimates, so I do not expect them to match OpenAI’s internal realtime tokenizer exactly. The relevant observation is that both models appear to fail at roughly the same initial session payload size, before any conversation history, audio, or tool-call outputs exist.\n\nCould OpenAI clarify the intended behavior?\n\n 1. Is the Realtime `session.instructions + tools` limit currently fixed for all realtime models, including gpt-realtime-2?\n\n 2. If yes, is that intentional? If so, it would be helpful for the gpt-realtime-2 / Realtime docs to state clearly that the 128k context window does not imply a larger initial session instructions+tools budget.\n\n 3. If no, is the current `/v1/realtime/calls` behavior a bug, rollout issue, or endpoint-specific limitation?\n\n 4. When the initial session config is too large, could the API return a clear 4xx validation error instead of a 504? That would make this much easier to diagnose in production.\n\n\n\n\nHappy to provide request IDs and a minimal repro script if useful.",
"title": "Realtime API instruction limit (16,384 tokens) is too low for production voice agents with tool calling"
}