{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreicipmxp6irgi67ikbgtgtrjjypabwdrthrnwbeizz6gb7fjbbbdze",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mmn7kgsifip2"
  },
  "path": "/t/prompt-caching-not-working-for-gpt-5-4-nano/1381702#post_2",
  "publishedAt": "2026-05-24T22:52:09.000Z",
  "site": "https://community.openai.com",
  "textContent": "Let’s reach for some handy code using the Responses API and the Conversation ID to chat.\n\nTesting gpt-5.4-mini by asking about a large code attachment, then a second question, I get a cache hit:\n\n> What’s the next feature a power user would want added to the application?\n>  [app] sent!\n>  [usage] in/cached: 90200/90112 | out/reasoning: 344/0\n\nNow switch to gpt-5.4-nano, maybe remove the code attachment, and keep chatting…\n\n#### Turn 3 - setup\n\n> – your message? – (+, or ‘/’ alone, to send. /help for more.)\n>  Don’t suggest things you can do - just answer questions or do tasks, okay?\n>  [app] No files are currently tracked. You can still ask the assistant to create a new file path. If it proposes a new untracked path, the app may prompt you to approve it. You can also use /browse or /add to track existing files.\n>  [app]\n>  [app] sent!\n>  [usage] in/cached: 1948/0 | out/reasoning: 29/0\n>\n> Understood. From here on, I’ll only answer questions or directly carry out tasks you ask for—no unsolicited suggestions.\n\n#### Turn 4 - test\n\n> – your message? – (+, or ‘/’ alone, to send. /help for more.)\n>  Make a bullet point list of available tools you can call, please.\n>  [app] sent!\n>  [usage] in/cached: 1997/1536 | out/reasoning: 42/0\n>\n>   * `functions.apply_patch` — modify files in your workspace using a patch diff.\n>   * `multi_tool_use.parallel` — run multiple developer-tool calls in parallel (wrapper for permitted tools).\n>\n\n\nI guess I’m lucky to get a cache hit the first time on that model, too.\n\nAnother time around, let’s have it do something…\n\n> – your message? – (+, or ‘/’ alone, to send. /help for more.)\n>  Create a “hello world” Python file for me.\n>  [app] sent!\n>  [usage] in/cached: 2056/1536 | out/reasoning: 57/0\n>  [patch] create_file: /hello_world.py\n>  [app] AI wants to create /hello_world.py - approve? 
[y/N] y\n>  [workspace] Tracking: /hello_world.py\n>  [result] completed: [Add] /hello_world.py (6 lines, 1 hunk)\n>  [usage] in/cached: 2212/0 | out/reasoning: 34/0\n>\n> Created `/hello_world.py` with a `main()` function that prints **“Hello, world!”** and runs it when executed as a script.\n\nAnother cache hit. Then, after the tool call, the cache is broken as expected, because the created code is dynamically inserted via _instructions_, so that is also “working”.\n\n## Result - Somewhat Okay\n\nThus, I have some success chatting with “gpt-5.4-nano” when the beginning of the input is identical, the recurring input is long enough, and the turns follow each other in quick succession.\n\nYou might also use the API’s 24h retention parameter, because the default cache time window doesn’t allow a user much time to think.\n\n* * *\n\n## Bad tool context by OpenAI\n\nThen we see that on Responses, OpenAI is wasting your tokens by loading up a useless parallel tool wrapper. Typing my response here took long enough that when I return, there is no more cache hit.\n\n> – your message? – (+, or ‘/’ alone, to send. /help for more.)\n>  Is it permitted to sent multiple apply_patch in parallel via the multi_tool_use wrapper?\n>  [app] sent!\n>  [usage] in/cached: 2269/0 | out/reasoning: 53/0\n>\n> No—it’s not permitted. The `multi_tool_use.parallel` wrapper can only be used for _multiple developer tools_ , but `functions.apply_patch` specifically must not be called in parallel (it says not to call it concurrently with other tools).\n\nWith another, quicker input, there is a cache hit again, and an AI unwilling to even try the unwanted, pointless wrapper.\n\n> – your message? – (+, or ‘/’ alone, to send. /help for more.)\n>  Try it now to create two test python files at the same time, just to clarify success or failure.\n>  [app] sent!\n>  [usage] in/cached: 2349/1536 | out/reasoning: 49/0\n>\n> I can’t do that. 
Running multiple `apply_patch` calls in parallel isn’t permitted in this environment (and I also can’t do it safely as a “clarification test” because it would violate the tooling rule).",
  "title": "Prompt Caching Not Working for GPT-5.4-Nano"
}