{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreie7eydy5d23wqetxe3qd7tsizsdhv3etg4i5r7wup5erfopzxfvjq",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mjkm722e6gi2"
},
"path": "/t/realtime-api-instruction-limit-16-384-tokens-is-too-low-for-production-voice-agents-with-tool-calling/1378932#post_2",
"publishedAt": "2026-04-15T18:25:52.000Z",
"site": "https://community.openai.com",
"textContent": "Totally get why that feels tight, especially when you’re aiming for 99% reliability on a voice agent. That last 10% gap is usually where things get tricky.\n\nWhat’s going on here is that the instruction limit you’re hitting is part of the overall context budget. With gpt-realtime-1.5, you get a 32k total context window, but about 4k of that is reserved for output, so the remaining ~28k is shared across everything:\n\n * system instructions\n * conversation history\n * tool calls and metadata\n * audio + text inputs\n\nSo even if it feels like the instructions are what’s capped, it’s really the combined load that’s squeezing things.\n\nA few folks building similar setups have had better results by:\n\n * trimming instructions down to only the “always needed” rules\n * moving less critical logic into tools instead of the prompt\n * aggressively managing conversation history (summarizing or resetting it when possible)\n\nNot ideal, agreed, but it does help stretch that space and improve consistency.\n\nThere’s also a solid guide on model optimization that walks through ways to rebalance this for voice agents. Worth a look if you haven’t already.\n\nFinally, check your tool usage to see where the tokens are actually going.\n\n-Mark G.",
"title": "Realtime API instruction limit (16,384 tokens) is too low for production voice agents with tool calling"
}