{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreia5bwlhevnbgh546hrxa3ybkngj5paoxovoclpnhmkdzith3htbf4",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mildmlbawl42"
},
"path": "/t/gpt-5-4-nano-appears-to-return-zero-prompt-cache-hits-despite-1024-token-shared-prefixes/1378432#post_1",
"publishedAt": "2026-04-03T08:29:56.000Z",
"site": "https://community.openai.com",
"textContent": "We are seeing what looks like a prompt-caching issue specific to gpt-5.4-nano.\n\nAccording to the OpenAI docs, Prompt Caching is automatic for recent models and should work for prompts that are >= 1024 tokens. The gpt-5.4-nano model page also lists cached input pricing ($0.02 / 1M), so we expected non-zero cached_tokens / cached input usage.\n\nHowever, in our tests, gpt-5.4-nano consistently shows **zero cache hits** , even with long, highly repeated prefixes, while control models on the same gateways do show cache hits.\n\n * Model: gpt-5.4-nano\n\n * Repeated the same mood benchmark 3 times with the same long shared prefix\n\n * Average prompt input per request: 1212.95 tokens\n\n * Run 1: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%\n\n * Run 2: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%\n\n * Run 3: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%\n\n\n\n\nSo this does not look like a generic prompt-formatting issue on our side:\n\n * prompts are above 1024 tokens\n\n * shared prefixes are stable\n\n * the same gateways show caching for gpt-5-nano\n\n * only gpt-5.4-nano is consistently at 0 cached input in our runs\n\n\n\n\nIs prompt caching intentionally disabled for gpt-5.4-nano, or is there a known issue with cache routing / cached-token reporting for this model?",
"title": "Gpt-5.4-nano appears to return zero prompt-cache hits despite >1024-token shared prefixes"
}