gpt-5.4-nano appears to return zero prompt-cache hits despite >1024-token shared prefixes
We are seeing what looks like a prompt-caching issue specific to gpt-5.4-nano.
According to the OpenAI docs, Prompt Caching is automatic for recent models and applies to prompts of 1024 tokens or more. The gpt-5.4-nano model page also lists cached input pricing ($0.02 / 1M tokens), so we expected non-zero cached_tokens / cached input usage.
However, in our tests, gpt-5.4-nano consistently shows zero cache hits, even with long, highly repeated prefixes, while control models on the same gateways do show cache hits.
Model: gpt-5.4-nano
Repeated the same mood benchmark 3 times with the same long shared prefix
Average prompt input per request: 1212.95 tokens
Run 1: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%
Run 2: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%
Run 3: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%
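For reference, this is roughly how a hit rate like the one above can be computed. It is a minimal sketch, not our exact benchmark code: it assumes each response exposes a Chat Completions style usage object with prompt_tokens and prompt_tokens_details.cached_tokens (a gateway may rename these fields), and the sample payloads are hypothetical.

```python
from typing import Iterable, Mapping


def cache_hit_rate(usages: Iterable[Mapping]) -> float:
    """Return cached prompt tokens as a fraction of all prompt tokens.

    Assumes Chat Completions style usage dicts; missing fields count as 0.
    """
    total_prompt = 0
    total_cached = 0
    for usage in usages:
        total_prompt += usage.get("prompt_tokens", 0) or 0
        # prompt_tokens_details may be absent or null on some gateways.
        details = usage.get("prompt_tokens_details") or {}
        total_cached += details.get("cached_tokens", 0) or 0
    return total_cached / total_prompt if total_prompt else 0.0


# Hypothetical usage payloads for illustration, not data from our runs.
sample = [
    {"prompt_tokens": 1200, "prompt_tokens_details": {"cached_tokens": 0}},
    {"prompt_tokens": 1200, "prompt_tokens_details": {"cached_tokens": 1024}},
]
print(f"cache_hit_rate = {cache_hit_rate(sample):.2%}")
```

Treating a missing cached_tokens field as 0 means a gateway that drops the field would look the same as a real cache miss, so it is worth checking the raw usage payload as well.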
So this does not look like a generic prompt-formatting issue on our side:
prompts are above 1024 tokens
shared prefixes are stable
the same gateways show caching for gpt-5-nano
only gpt-5.4-nano is consistently at 0 cached input in our runs
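A quick way to sanity-check the "shared prefixes are stable" point is to measure the longest common prefix across the rendered prompts. The sketch below is illustrative only: it compares characters rather than tokens (a rough proxy, since the 1024 threshold is in tokens), and the helper name and sample prompts are hypothetical.

```python
import os
from typing import Sequence


def shared_prefix_len(prompts: Sequence[str]) -> int:
    """Length in characters of the longest prefix common to all prompts."""
    # os.path.commonprefix compares strings character by character,
    # so it works for arbitrary text, not only file paths.
    return len(os.path.commonprefix(list(prompts)))


# Hypothetical prompts: a long fixed instruction block plus a varying suffix.
prefix = "You are a mood classifier. " * 200
prompts = [prefix + "Text: I love this.", prefix + "Text: I hate this."]

print("shared prefix chars:", shared_prefix_len(prompts))
print("shortest prompt chars:", min(len(p) for p in prompts))
```

If the shared prefix is much shorter than the shortest prompt, something early in the prompt (a timestamp, a request ID, reordered fields) is varying between requests and would prevent cache hits on any model.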
Is prompt caching intentionally disabled for gpt-5.4-nano, or is there a known issue with cache routing / cached-token reporting for this model?