{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreicqig3ozock3ssbkogc4gmwlyh5lxjznu2sdvltzsxazcfaoadpbi",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mjgft44rkvw2"
  },
  "path": "/t/realtime-api-instruction-limit-16-384-tokens-is-too-low-for-production-voice-agents-with-tool-calling/1378932#post_1",
  "publishedAt": "2026-04-14T01:38:27.000Z",
  "site": "https://community.openai.com",
  "textContent": "We’re running a production voice agent for restaurant order-taking using gpt-realtime-1.5 via LiveKit. The agent handles thousands of calls per month in Spanish (Mexico) with 9 function tools.\n\n**Problem 1: Instruction token limit is too low**\n\nWe discovered our system prompt was exceeding the instruction limit. The error from the API:\n\n“Instructions cannot be longer than 16384 tokens, you have provided 17199 tokens.”\n\nOur prompt needs to include: the full restaurant menu with prices and modifiers (~5,000 tokens), business rules, delivery location mappings, audio transcription corrections for Spanish, and returning customer context. All of this is necessary for correct behavior.\n\n16,384 tokens is not enough for a production voice agent with a real menu and real business logic. The worst part is that the failure was silent — we had no idea the prompt was being truncated. Calls with new customers (less context) worked fine. Calls with returning customers (name, last order, saved address loaded into context) would exceed the limit and the agent would lose critical rules from the end of the prompt. This caused inconsistent behavior that took us weeks to diagnose.\n\n**Problem 2: Tool calling is unreliable ~10% of the time**\n\nEven when the prompt fits within the limit, gpt-realtime-1.5 does not follow instructions consistently. Approximately 1 in 10 calls has errors:\n\n  * Paid modifiers placed in the “notes” text field instead of the “modifiers” parameter with extra_price, causing incorrect pricing. The customer gets undercharged because the system only calculates price from modifiers, not from notes. We have explicit instructions with examples saying “NEVER put paid extras in notes”, and it still does it.\n  * Duplicate items added when the customer interrupts mid-tool-call. The model fires add_to_order, gets interrupted, then fires it again. 
The internal state now holds the same item twice, but the model’s conversational context remembers only one. The customer hears one thing; the order contains another.\n  * Ignoring explicit language rules. We say “always use Mexican pesos” and the model occasionally says “dollars”. We say “never offer fries as an add-on to a product that already IS fries” and it does it anyway.\n  * The model understands the conversation correctly (its reasoning is right) but executes the tool call incorrectly. It knows the customer wants a burger with fries, but it puts “con papas naturales” in notes instead of calling add_to_order with the correct modifier and extra_price.\n\nThe conversational ability and voice quality are excellent. The problem is strictly instruction following and tool calling reliability. For a voice agent that takes real orders with real money, 90% accuracy is not enough. We need 99%+.\n\n**Questions:**\n\n  1. Is there a way to increase the instruction limit beyond 16,384 tokens? We need at least 20,000 to 24,000 tokens.\n  2. Is there a roadmap for improving tool calling reliability in the Realtime API? The text models (gpt-4o, gpt-4.1) are significantly more reliable at structured function calling than the realtime models.\n  3. Is there any way to make tool calls atomic with respect to user interruptions? This is a fundamental issue for any voice agent that modifies state.\n  4. Should the API return a clear error or warning when instructions are truncated instead of failing silently?\n\nEnvironment: gpt-realtime-1.5, LiveKit Agents SDK, Spanish (Mexico), 9 function tools, ~2 min average call duration.",
  "title": "Realtime API instruction limit (16,384 tokens) is too low for production voice agents with tool calling"
}