GPT-5-mini image input token calculation discrepancy with official FAQ formula
Independently confirming this: I measured gpt-5-mini image input tokens directly off the API and get ~1.20 tokens/patch, matching the pricing calculator, not the docs’ 1.62.
Method: send the same text prompt with and without one image, then subtract the text-only usage.input_tokens from the image request’s — that isolates the image’s contribution. Patch count is ceil(w/32) × ceil(h/32).
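A minimal sketch of the arithmetic in that method, in case anyone wants to reproduce it. The function names are mine, not anything from the API; the two `input_tokens` values are whatever your own pair of requests reports.

```python
PATCH_SIZE = 32  # patch edge in pixels, per the formula above


def patch_count(width: int, height: int) -> int:
    """ceil(w/32) * ceil(h/32), done with integer arithmetic."""
    cols = (width + PATCH_SIZE - 1) // PATCH_SIZE
    rows = (height + PATCH_SIZE - 1) // PATCH_SIZE
    return cols * rows


def image_input_tokens(with_image: int, text_only: int) -> int:
    """Image contribution: input tokens with the image minus text-only input tokens."""
    return with_image - text_only


def tokens_per_patch(image_tokens: int, width: int, height: int) -> float:
    """Observed multiplier for one image."""
    return image_tokens / patch_count(width, height)


if __name__ == "__main__":
    # 1280x720 is the only row in the table where the height is not a multiple of 32.
    print(patch_count(1280, 720))
    print(round(tokens_per_patch(1104, 1280, 720), 2))
```

Integer ceiling division avoids any float rounding in the patch count.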
| image | patches | image input tokens | tokens ÷ patch |
|---|---|---|---|
| 256×256 | 64 | 77 | 1.20 |
| 512×512 | 256 | 308 | 1.20 |
| 768×1024 | 768 | 922 | 1.20 |
| 1280×720 | 920 | 1104 | 1.20 |
| 1024×1024 | 1024 | 1229 | 1.20 |
| 2048×768 | 1536 (at cap) | 1844 | 1.20 |
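So for everything up to and including the 1536-patch cap, the billed/reported tokens are ceil(w/32) × ceil(h/32) × 1.20, and the table is consistent with the result being rounded up to a whole token (256 × 1.20 = 307.2 is reported as 308). The documented 1.62 overstates actual usage by ~35% (1.62 ÷ 1.20 = 1.35).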
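Here is a quick check of that formula against the table rows. It is a sketch, not the billing logic: the round-up step is my inference from the measurements, and the multiplier is kept in hundredths (120 for 1.20) so everything stays in integers.

```python
# (width, height, measured image input tokens), copied from the table above
ROWS = [
    (256, 256, 77),
    (512, 512, 308),
    (768, 1024, 922),
    (1280, 720, 1104),
    (1024, 1024, 1229),
    (2048, 768, 1844),
]


def patch_count(width: int, height: int) -> int:
    """ceil(w/32) * ceil(h/32)."""
    return ((width + 31) // 32) * ((height + 31) // 32)


def predicted_tokens(width: int, height: int, multiplier_hundredths: int = 120) -> int:
    """ceil(patches * multiplier), with the multiplier given in hundredths."""
    scaled = patch_count(width, height) * multiplier_hundredths
    return (scaled + 99) // 100  # integer ceiling of scaled / 100


if __name__ == "__main__":
    for w, h, measured in ROWS:
        at_120 = predicted_tokens(w, h, 120)
        at_162 = predicted_tokens(w, h, 162)
        print(f"{w}x{h}: measured={measured} 1.20->{at_120} 1.62->{at_162}")
```

Every row matches the 1.20 prediction exactly; none matches 1.62.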
One thing I haven’t pinned down: the >1536-patch regime, where the image is resized to fit the cap before the multiplier is applied. The at-cap point (2048×768 = exactly 1536 patches) is clean at 1.20, but I haven’t characterized larger images precisely. The 1800×1200 figure above (~2334) doesn’t land neatly on either 1.20 (≈1843) or 1.62 (≈2488), so that resize step may behave differently. Curious if anyone has clean numbers for images well over 1536 patches.
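For reference, this is how I get the two at-cap predictions for 1800×1200. It assumes the simplest possible resize: an aspect-preserving downscale by sqrt(cap × 32² ÷ (w × h)), rounded to whole pixels. That resize rule is an assumption on my part and may well not be what the API does, which is exactly the open question.

```python
import math

PATCH_SIZE = 32
PATCH_CAP = 1536


def patch_count(width: int, height: int) -> int:
    """ceil(w/32) * ceil(h/32)."""
    return ((width + 31) // 32) * ((height + 31) // 32)


def capped_patch_count(width: int, height: int) -> int:
    """Patch count after an ASSUMED aspect-preserving downscale to the cap."""
    if patch_count(width, height) <= PATCH_CAP:
        return patch_count(width, height)
    # Assumed rule: scale both sides so the pixel area equals cap * 32 * 32.
    scale = math.sqrt(PATCH_CAP * PATCH_SIZE * PATCH_SIZE / (width * height))
    return patch_count(round(width * scale), round(height * scale))


def predicted_tokens(patches: int, multiplier_hundredths: int) -> int:
    """ceil(patches * multiplier), with the multiplier given in hundredths."""
    return (patches * multiplier_hundredths + 99) // 100


if __name__ == "__main__":
    patches = capped_patch_count(1800, 1200)
    print(patches, predicted_tokens(patches, 120), predicted_tokens(patches, 162))
```

Under this assumption 1800×1200 scales to 1536×1024, i.e. exactly 1536 patches, which is where the ≈1843 and ≈2488 predictions come from (one token higher each if the result is rounded up). The measured ~2334 matches neither.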