
Prompt Caching Not Working for GPT-5.4-Nano

OpenAI Developer Community May 24, 2026

Let’s reach for some handy code that chats through the Responses API using a conversation ID.

Testing gpt-5.4-mini: I ask about a big code attachment, then ask a second question, and I get a cache hit:

What’s the next feature a power user would want added to the application?
[app] sent!
[usage] in/cached: 90200/90112 | out/reasoning: 344/0
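The [usage] lines in these transcripts can be turned into a cache-hit share for easier comparison between turns. This is a minimal sketch that assumes the log format shown here, which belongs to the chat app and not to the API. `parse_usage` is a hypothetical helper name.

```python
import re

# Matches the app's log format, e.g.
# "[usage] in/cached: 90200/90112 | out/reasoning: 344/0"
USAGE_RE = re.compile(
    r"\[usage\] in/cached: (\d+)/(\d+) \| out/reasoning: (\d+)/(\d+)"
)


def parse_usage(line):
    """Return token counts plus the cached share of input, or None if no match."""
    m = USAGE_RE.search(line)
    if m is None:
        return None
    inp, cached, out, reasoning = map(int, m.groups())
    return {
        "input": inp,
        "cached": cached,
        "output": out,
        "reasoning": reasoning,
        "cached_share": cached / inp if inp else 0.0,
    }


first = parse_usage(
    "[app] sent! [usage] in/cached: 90200/90112 | out/reasoning: 344/0"
)
print(f"{first['cached_share']:.1%} of input tokens were served from cache")
```

The cached counts reported in this thread (90112 and 1536) are both multiples of 128, which matches the 128-token granularity OpenAI documents for prompt caching.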

Now switch to gpt-5.4-nano, maybe remove the code attachment, and keep chatting…

Turn 3 - setup

– your message? – (+, or ‘/’ alone, to send. /help for more.)
Don’t suggest things you can do - just answer questions or do tasks, okay?
[app] No files are currently tracked. You can still ask the assistant to create a new file path. If it proposes a new untracked path, the app may prompt you to approve it. You can also use /browse or /add to track existing files.
[app]
[app] sent!
[usage] in/cached: 1948/0 | out/reasoning: 29/0

Understood. From here on, I’ll only answer questions or directly carry out tasks you ask for—no unsolicited suggestions.

Turn 4 - test

– your message? – (+, or ‘/’ alone, to send. /help for more.)
Make a bullet point list of available tools you can call, please.
[app] sent!
[usage] in/cached: 1997/1536 | out/reasoning: 42/0

  • functions.apply_patch — modify files in your workspace using a patch diff.
  • multi_tool_use.parallel — run multiple developer-tool calls in parallel (wrapper for permitted tools).

I guess I’m lucky to get a cache hit the first time on that model too.

Another time around, let’s have it do something…

– your message? – (+, or ‘/’ alone, to send. /help for more.)
Create a “hello world” Python file for me.
[app] sent!
[usage] in/cached: 2056/1536 | out/reasoning: 57/0
[patch] create_file: /hello_world.py
[app] AI wants to create /hello_world.py - approve? [y/N] y
[workspace] Tracking: /hello_world.py
[result] completed: [Add] /hello_world.py (6 lines, 1 hunk)
[usage] in/cached: 2212/0 | out/reasoning: 34/0

Created /hello_world.py with a main() function that prints “Hello, world!” and runs it when executed as a script.

Another cache hit. Then, after the tool call, the context cache is broken, as expected, by the created code being dynamically inserted via instructions. That also counts as “working”.
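Why the tool call breaks the cache: matching works on an identical leading span of the request, so a change near the start leaves only the part before it reusable. This is a rough illustration that uses whitespace-split words as a stand-in for real tokens. The instruction strings are made up for the example and are not the app’s actual prompt, and `shared_prefix_len` is a hypothetical helper.

```python
def shared_prefix_len(a, b):
    """Count leading items that are identical in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical instruction text before and after a file becomes tracked.
before = "You are a coding assistant. Tracked files: none. Tools: apply_patch".split()
after = "You are a coding assistant. Tracked files: /hello_world.py Tools: apply_patch".split()

reusable = shared_prefix_len(before, after)
print(f"{reusable} of {len(after)} words still match from the start")
```

Everything after the first difference counts as new input, even the parts that are textually unchanged. If the surviving prefix is shorter than the minimum cacheable length, the request would report 0 cached tokens, which is consistent with the 2212/0 line above.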

Result - Somewhat Okay

So chatting with “gpt-5.4-nano” does get cache hits, provided the beginning of the input is identical between requests, the recurring input is long enough, and the turns follow each other in quick succession.

You might also use the API’s 24h retention parameter, because the default cache time window doesn’t leave a user much time to think.
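Here is a sketch of where such a setting would go, built as a plain request body so it runs without network access. The field names `prompt_cache_retention` and `conversation` and the value `"24h"` are written from memory of the Responses API and should be checked against the current API reference. `conv_123` is a placeholder ID and `build_request` is a hypothetical helper.

```python
import json


def build_request(model, conversation_id, user_text, retention="24h"):
    """Assemble a Responses-style request body (field names are assumptions)."""
    return {
        "model": model,
        # Assumed field: ties this turn to a stored conversation.
        "conversation": conversation_id,
        "input": user_text,
        # Assumed field: asks for extended cache retention.
        "prompt_cache_retention": retention,
    }


body = build_request(
    "gpt-5.4-nano", "conv_123", "Create a hello world Python file."
)
print(json.dumps(body, indent=2))
```

Longer retention only helps if the start of the request stays identical between turns, so it complements a stable prompt prefix and does not replace one.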


Bad tool context by OpenAI

Then we see that on Responses, OpenAI is wasting your tokens by loading up a useless parallel tool wrapper. Typing a reply here took long enough that when I went back to the chat, there was no more cache hit.

– your message? – (+, or ‘/’ alone, to send. /help for more.)
Is it permitted to sent multiple apply_patch in parallel via the multi_tool_use wrapper?
[app] sent!
[usage] in/cached: 2269/0 | out/reasoning: 53/0

No, it’s not permitted. The multi_tool_use.parallel wrapper can only be used for multiple developer tools, but functions.apply_patch specifically must not be called in parallel (it says not to call it concurrently with other tools).

Another quicker input gets a cache hit again, and an AI unwilling to even try the unwanted, pointless wrapper.

– your message? – (+, or ‘/’ alone, to send. /help for more.)
Try it now to create two test python files at the same time, just to clarify success or failure.
[app] sent!
[usage] in/cached: 2349/1536 | out/reasoning: 49/0

I can’t do that. Running multiple apply_patch calls in parallel isn’t permitted in this environment (and I also can’t do it safely as a “clarification test” because it would violate the tooling rule).
