External Publication
Visit Post

Persistent 0% prompt cache hits on GPT-5.5 with Auckland NZ Cloudflare 520s complicating every workaround

OpenAI Developer Community June 17, 2026
Source

I greatly appreciate the detailed response here and the effort put in. I’ll pass this to my agents and have them prepare comprehensive responses to each of these points and identify all of the possible mistakes we’re making.

That said, several of these I can answer myself directly:

Calls to different models: Sometimes, but this can be confidently ruled out as a contributing factor, as I’ve not been varying the models involved while doing any testing, and generally I stick with GPT 5.5 in the exact same configuration.

Calls with different service tier: Directly ruled out. I don’t modify this at any point.

Calls with different prompt cache key: For the same context window, I have experimented with both stable prompt cache keys as well as salting them when I get several cache misses in a row. Currently, I’m running with it strictly stable, and it hasn’t materially changed anything.

Calls past expiry (5-60 minutes): Current testing is being done with a 24-hour request and a prompt cache key, though my API calls are typically far faster than 5-60 minutes apart. They would usually be seconds to minutes apart. Even then, with this quick turnaround time, I still experience very quick regressions to 0% hit rates.

Calls with framework injections of text such as UUIDs: There is volatile content injected as a prepended block above the most recent user role message only. This can range from things like the time of day down to the second, or recent file modifications and other volatile contents. From everything I’ve researched and looked into, tail volatility should not be a problem here. Yet, it seems to be the primary contributor.

It may not be as overt as constantly injecting random UUIDs that change turn to turn, but the fingerprinting of it would be effectively the same.

Prompt IDs with variables, varying prompt ID versions: As far as I’m aware, I’m not using any system that matches this description. Is this one of the extended features of the Responses API? One of the walled garden features, perhaps?

Not passing and maintaining a full chat history: As far as my context windows exist turn to turn, the verbatim contents is that they are complete right from the very first message. That is to say, I’m not doing any kind of truncation or sliding window context management that would drop a whole chunk at the start of the context window.

I have comprehensive diff-based fingerprinting of the at-wire-flight-time request payloads turn-to-turn that show constant stability of the request payload shapes.

Varying or dropping encrypted reasoning The reasoning blocks are all entirely dropped. They never enter into the context window. The only content block types that I submit or deal with are text and very occasionally, image.

If the cache writes are happening based on the response before it is delivered to my application, including their expectation of the encrypted reasoning contents, then that may be a plausible explanation. But I have also attempted to test this, and it doesn’t seem to have any meaningful effect. It would be my expectation that they would do cache writes based on what I submit, not based on what they return.

phase in output being returned: I manually checked my wire logs, and at no point am I replaying any phase field data.

Responses with any kind of compaction: I have my own client-side compaction mechanism, which is indistinguishable from simply sending a different context window. So, I’m not using any API-provided black-box compaction. It is just simply a different payload after using it. It is infrequent and stable between API calls / submission turns.

Then the big one: Your actual API call, instructions + input is simply non-varying, only adding new inputs to a record of 100% fidelity. Could you just clarify what you mean by this one? Because if this is a specification of the narrow requirements of getting a cache hit, then I’m absolutely not satisfying that. However, up until about a week and a half ago, there was no problem with the high mutability of my context windows.

That’s really the thing here; my my context windows have always been volatile but all of a sudden my my ability to hit cache at all has dropped to zero.

I’ll continue experimenting and investigating and drop updates when I make any progress.

Discussion in the ATmosphere

Loading comments...