Realtime API instruction limit (16,384 tokens) is too low for production voice agents with tool calling

OpenAI Developer Community April 15, 2026

Totally get why that feels tight, especially when you’re aiming for 99% reliability on a voice agent. That last 10% gap is usually where things get tricky.

What’s going on here is that the instruction limit you’re hitting is part of the overall context budget. With gpt-realtime-1.5 you get a 32k total context window, but about 4k of it is reserved for output, so the remaining ~28k is shared across everything:

* system instructions
* conversation history
* tool calls and metadata
* audio + text inputs

So even if the instructions feel capped, it’s really the combined load that’s squeezing things.

A few folks building similar setups have had better results by:

* trimming instructions down to only the “always needed” rules
* moving less critical logic into tools instead of the prompt
* aggressively managing conversation history (summarizing or resetting when possible)

Not ideal, agreed, but it does help stretch that space and improve consistency. There’s also a solid guide on model optimization that walks through ways to rebalance this for voice agents; it’s worth a look if you haven’t read it already. You might also want to check your tool usage to see where the tokens are getting eaten.

-Mark G.
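To make the shared-budget arithmetic concrete, here is a minimal Python sketch. It assumes you already have approximate token counts per category (instructions, tool schemas and call metadata, current inputs, and each past turn); how you obtain those counts is left open. The names `ContextLoad`, `input_budget`, and `trim_history` are hypothetical and are not part of any OpenAI SDK, and the 32k/4k figures are the approximate values quoted in the post, not authoritative limits. Trimming here drops the oldest turns outright (the "reset" approach); summarizing would instead replace them with a shorter summary turn.

```python
from dataclasses import dataclass, field

# Approximate figures quoted in the post; check current model docs for exact limits.
TOTAL_CONTEXT = 32_000
OUTPUT_RESERVE = 4_000


@dataclass
class ContextLoad:
    """Hypothetical per-category token counts for one realtime session."""
    instructions: int
    tools: int              # tool schemas plus tool call/result metadata
    audio_text_inputs: int  # current audio + text input
    history: list[int] = field(default_factory=list)  # tokens per past turn, oldest first

    def breakdown(self) -> dict[str, int]:
        # Per-category totals: shows where the tokens are getting eaten.
        return {
            "instructions": self.instructions,
            "tools": self.tools,
            "audio_text_inputs": self.audio_text_inputs,
            "history": sum(self.history),
        }

    def total(self) -> int:
        return sum(self.breakdown().values())


def input_budget(total: int = TOTAL_CONTEXT, reserve: int = OUTPUT_RESERVE) -> int:
    """Tokens left for everything except the model's output."""
    return total - reserve


def trim_history(load: ContextLoad, budget: int) -> int:
    """Drop the oldest turns until the load fits the budget.

    Returns the number of turns dropped. If the non-history categories alone
    exceed the budget, all history is dropped and load.total() stays over budget.
    """
    dropped = 0
    while load.total() > budget and load.history:
        load.history.pop(0)
        dropped += 1
    return dropped


if __name__ == "__main__":
    # Illustrative numbers: a ~16k instruction block plus tools leaves little room for history.
    load = ContextLoad(instructions=16_000, tools=5_000, audio_text_inputs=2_000,
                       history=[1_500] * 6)
    print(load.total())                          # 32000
    print(trim_history(load, input_budget()))    # 3
    print(load.total())                          # 27500
    print(load.breakdown())
```

With instructions near the 16k mark and 5k of tool overhead, only about 5k of the ~28k input budget is left for conversation history, which is why moving rules out of the prompt and into tools, or trimming instructions, frees up more room than history management alone.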

Discussion in the ATmosphere