{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreie7eydy5d23wqetxe3qd7tsizsdhv3etg4i5r7wup5erfopzxfvjq",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mjkm722e6gi2"
  },
  "path": "/t/realtime-api-instruction-limit-16-384-tokens-is-too-low-for-production-voice-agents-with-tool-calling/1378932#post_2",
  "publishedAt": "2026-04-15T18:25:52.000Z",
  "site": "https://community.openai.com",
  "textContent": "Totally get why that feels tight, especially when you’re aiming for 99% reliability on a voice agent. That last 10% gap is usually where things get tricky.\n\nWhat’s going on is that the instruction limit you’re hitting is part of the overall context budget. With gpt-realtime-1.5, you get a 32k total context window, but about 4k of that is reserved for output, so the remaining space (roughly 28k) is shared across everything:\n\n  * system instructions\n  * conversation history\n  * tool calls and metadata\n  * audio + text inputs\n\nSo even if the instructions feel capped, it’s really the combined load that’s squeezing things.\n\nA few folks building similar setups have had better results by:\n\n  * trimming instructions down to only the “always needed” rules\n  * moving less critical logic into tools instead of the prompt\n  * aggressively managing conversation history (summarize or reset when possible)\n\nNot ideal, agreed, but it does help stretch that space and improve consistency.\n\nThere’s also a solid guide on model optimization that walks through ways to rebalance this for voice agents. Worth a look if you haven’t already.\n\nAlso, you might want to check your tool usage to see where the tokens are getting eaten.\n\n-Mark G.",
  "title": "Realtime API instruction limit (16,384 tokens) is too low for production voice agents with tool calling"
}