Prompt Caching Not Working for GPT-5.4-Nano
Let’s reach for some handy code that chats using the Responses API and a conversation ID.
Testing gpt-5.4-mini, asking about big code, then a second question - I get a cache hit:
What’s the next feature a power user would want added to the application?
[app] sent!
[usage] in/cached: 90200/90112 | out/reasoning: 344/0
Now switch to gpt-5.4-nano, maybe remove the code attachment, and keep chatting…
Turn 3 - setup
– your message? – (+, or ‘/’ alone, to send. /help for more.)
Don’t suggest things you can do - just answer questions or do tasks, okay?
[app] No files are currently tracked. You can still ask the assistant to create a new file path. If it proposes a new untracked path, the app may prompt you to approve it. You can also use /browse or /add to track existing files.
[app]
[app] sent!
[usage] in/cached: 1948/0 | out/reasoning: 29/0
Understood. From here on, I’ll only answer questions or directly carry out tasks you ask for—no unsolicited suggestions.
Turn 4 - test
– your message? – (+, or ‘/’ alone, to send. /help for more.)
Make a bullet point list of available tools you can call, please.
[app] sent!
[usage] in/cached: 1997/1536 | out/reasoning: 42/0
- functions.apply_patch - modify files in your workspace using a patch diff.
- multi_tool_use.parallel - run multiple developer-tool calls in parallel (wrapper for permitted tools).
I guess I’m lucky to get a cache hit the first time on that model as well.
Another time around, let’s have it do something…
– your message? – (+, or ‘/’ alone, to send. /help for more.)
Create a “hello world” Python file for me.
[app] sent!
[usage] in/cached: 2056/1536 | out/reasoning: 57/0
[patch] create_file: /hello_world.py
[app] AI wants to create /hello_world.py - approve? [y/N] y
[workspace] Tracking: /hello_world.py
[result] completed: [Add] /hello_world.py (6 lines, 1 hunk)
[usage] in/cached: 2212/0 | out/reasoning: 34/0
Created /hello_world.py with a main() function that prints “Hello, world!” and runs it when executed as a script.
Another cache hit. Then, after the tool call, the cache is broken, as expected, because the newly created code is dynamically inserted via instructions. So that part is also “working”.
Result - Somewhat Okay
So I do have some success chatting with “gpt-5.4-nano”, provided the beginning of the input is identical, the recurring input is long enough, and the turns follow each other in quick succession.
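To make the “identical beginning, long enough” conditions concrete, here is a small sketch that estimates an upper bound on cached tokens for two consecutive requests. It assumes the behavior described in OpenAI’s prompt caching guide as I understand it: no hit below 1024 tokens, and cached prefixes growing in 128-token steps. Those two constants, the function names, and the integer “token” lists are illustrative assumptions, not the actual cache implementation. The logged values of 1536 and 90112 are both multiples of 128, which is at least consistent with that granularity.

```python
MIN_CACHEABLE = 1024  # assumed minimum shared prefix for any cache hit
BLOCK = 128           # assumed granularity of cached prefixes


def shared_prefix_len(prev_tokens, next_tokens) -> int:
    """Count how many leading tokens two requests have in common."""
    n = 0
    for a, b in zip(prev_tokens, next_tokens):
        if a != b:
            break
        n += 1
    return n


def max_cached_tokens(prev_tokens, next_tokens) -> int:
    """Upper bound on cached tokens: shared prefix floored to a 128 multiple."""
    shared = shared_prefix_len(prev_tokens, next_tokens)
    if shared < MIN_CACHEABLE:
        return 0
    return (shared // BLOCK) * BLOCK


# Made-up token IDs: the second request repeats the first and appends more.
first = list(range(1600))
followup = first + [-1] * 400
# Same content, but something changed at the very start of the input.
changed_start = [-1] + first[1:]

print(max_cached_tokens(first, followup))
print(max_cached_tokens(first, changed_start))
print(max_cached_tokens(first[:900], first[:900] + [-1] * 50))
```

Anything that edits the start of the input, such as file contents injected into instructions, pushes the shared prefix toward zero, which matches the miss after the tool call above. Timing is not modeled here: even a perfect prefix misses once the cache entry has expired.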
You might also use the API’s 24h retention parameter, because the default cache time window doesn’t leave a user much time to think.
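As a hedged sketch of how that could be wired in: the helper below builds the keyword arguments for a Responses API call and optionally adds the retention setting. The parameter name `prompt_cache_retention` and the value `"24h"` are taken from my reading of OpenAI’s prompt caching guide, so treat both as assumptions. Whether gpt-5.4-nano accepts the setting is not something tested in this post. `build_request` and the conversation ID are hypothetical names.

```python
def build_request(model: str, conversation_id: str, text: str,
                  long_retention: bool = False) -> dict:
    """Build kwargs for client.responses.create(); nothing is sent here."""
    kwargs = {
        "model": model,
        "conversation": conversation_id,
        "input": text,
    }
    if long_retention:
        # Assumption: parameter name and value per OpenAI's caching guide.
        kwargs["prompt_cache_retention"] = "24h"
    return kwargs


request = build_request(
    "gpt-5.4-nano", "conv_hypothetical", "Keep chatting", long_retention=True
)
print(request)
```

With a real client this would be passed along as `client.responses.create(**request)`. Keeping the flag optional makes it easy to compare hit rates with and without extended retention.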
Bad tool context by OpenAI
Then we see that on Responses, OpenAI is wasting your tokens by loading up a useless parallel tool wrapper. Typing my response here took long enough that, when I returned to the chat, there was no cache hit anymore.
– your message? – (+, or ‘/’ alone, to send. /help for more.)
Is it permitted to sent multiple apply_patch in parallel via the multi_tool_use wrapper?
[app] sent!
[usage] in/cached: 2269/0 | out/reasoning: 53/0
No, it’s not permitted. The multi_tool_use.parallel wrapper can only be used for multiple developer tools, but functions.apply_patch specifically must not be called in parallel (it says not to call it concurrently with other tools).
With another, quicker input, there is a cache hit again, and an AI unwilling to even try the unwanted, pointless wrapper.
– your message? – (+, or ‘/’ alone, to send. /help for more.)
Try it now to create two test python files at the same time, just to clarify success or failure.
[app] sent!
[usage] in/cached: 2349/1536 | out/reasoning: 49/0
I can’t do that. Running multiple apply_patch calls in parallel isn’t permitted in this environment (and I also can’t do it safely as a “clarification test” because it would violate the tooling rule).