{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreia5bwlhevnbgh546hrxa3ybkngj5paoxovoclpnhmkdzith3htbf4",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mildmlbawl42"
  },
  "path": "/t/gpt-5-4-nano-appears-to-return-zero-prompt-cache-hits-despite-1024-token-shared-prefixes/1378432#post_1",
  "publishedAt": "2026-04-03T08:29:56.000Z",
  "site": "https://community.openai.com",
  "textContent": "We are seeing what looks like a prompt-caching issue specific to gpt-5.4-nano.\n\nAccording to the OpenAI docs, prompt caching is automatic for recent models and applies to prompts of 1024 tokens or more. The gpt-5.4-nano model page also lists cached input pricing ($0.02 / 1M tokens), so we expected non-zero cached_tokens / cached input usage.\n\nHowever, in our tests, gpt-5.4-nano consistently shows **zero cache hits**, even with long, highly repeated prefixes, while control models on the same gateways do show cache hits.\n\n  * Model: gpt-5.4-nano\n\n  * Repeated the same mood benchmark 3 times with the same long shared prefix\n\n  * Average prompt input per request: 1212.95 tokens\n\n  * Run 1: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%\n\n  * Run 2: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%\n\n  * Run 3: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%\n\nSo this does not look like a generic prompt-formatting issue on our side:\n\n  * prompts are above 1024 tokens\n\n  * shared prefixes are stable\n\n  * the same gateways show caching for gpt-5-nano\n\n  * only gpt-5.4-nano is consistently at 0 cached input in our runs\n\nIs prompt caching intentionally disabled for gpt-5.4-nano, or is there a known issue with cache routing / cached-token reporting for this model?",
  "title": "gpt-5.4-nano appears to return zero prompt-cache hits despite >1024-token shared prefixes"
}