Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreieru4ueuyxcjdewaibncjcdyw2yeasf57x6c5u4df6lehvxp23ne4",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mo3g4xyzhwq2"
  },
  "path": "/t/openai-must-document-the-input-image-pricing-of-gpt-image-2/1382940#post_2",
  "publishedAt": "2026-06-12T09:13:34.000Z",
  "site": "https://community.openai.com",
  "tags": [
    "Gpt-image-2 output pricing calculator - online.. and in your code!",
    "API",
    "(click for more details)"
  ],
  "textContent": "After 100 _“make a cat instead”_ edit requests…\n\n# `gpt-image-2` edits - _Image Input Pricing_\n\n## Overview\n\nI describe image input resizing/patch-ification/tokens and billing for the `gpt-image-2` model on the Images Edits API endpoint.\n\nThe calculation applies per input image. If an edit request contains multiple input images, each image is normalized, reshaped, and billed independently, and the request-level `input_tokens_details.image_tokens` value is the sum of those per-image values.\n\nCode follows.\n\nOutput-side parameters such as `size`, `quality`, `output_format`, `background`, do not change input image-token billing. `n>1` constitutes billing as individual calls (with sometimes 1-token _output_ savings)\n\nSee also, for pricing the _exit_ side in code:\n\nGpt-image-2 output pricing calculator - online.. and in your code! API\n\n> OpenAI made this opaque. I’m pleased to present: Interactive image output token and cost calculator, much easier than OpenAI’s calculator (if you can find it, that likes to say “invalid” after your input). That’s based the algorithm behind cost calculations for gpt-image-2. If you like JavaScript, steal away there. Simple code Bringing this forum around to “stuff for developers” instead of \"complaints from developers, here’s the money shot: Python code for you to compute token output from q…\n\n## Vision input representation\n\n`gpt-image-2` uses a latent image-token representation, “patches”, also. Each input image is converted into a 2D grid of image tokens. 
One token corresponds to one **16 px x 16 px** region after model-specific image normalization.\n\nThis is different from prior public OpenAI vision billing schemes:\n\n  * It is not the 512 px tile scheme used by GPT-4o/GPT-4.1-style vision models.\n  * It is not the documented 32 px patch scheme used by some newer language/vision models.\n  * It uses a 16 px patch/grid stride.\n  * It has no observed base-token charge.\n  * It has no observed model multiplier.\n  * The reported `image_tokens` value is the final 16 px grid-cell count.\n\n\n\nThe model also constrains the latent input shape. The final latent grid is limited to a maximum aspect ratio of approximately **3:1** (give or take grid quantization). Inputs wider or taller than this are represented on a larger latent canvas where the short side is padded. These added cells are billed image tokens even though they do not contain additional source-image content.\n\nA 2048 x 512 image, for example, is first normalized to 1024 x 256. That content is a 4:1 rectangle. The latent grid is then expanded to a 64 x 22 token canvas, corresponding to an effective 1024 x 352 canvas. The added vertical area is non-content padding: it carries nothing for the model to see, but it is still part of the billable latent input.\n\n## Model sizing behavior - _missing row_\n\nModel family | Input detail levels | Patch and resizing behavior\n---|---|---\n`gpt-image-2` edits input images | Fixed; not controlled by output quality | Images are normalized before tokenization. Images with maximum side at or below 512 px are not resized. Images with maximum side greater than 512 px and less than 1024 px are resized so the longest side becomes 512 px. Images with maximum side at or above 1024 px are resized by exactly 1/2. The resized image is then represented as 16 px x 16 px patches. The token grid is padded on the short side if needed so the latent grid aspect ratio is no greater than 3:1. 
If the resulting grid exceeds 1536 tokens, the effective image/canvas is resized with a 16 px adaptation of OpenAI’s patch-budget fitting algorithm.\n\n## Resizing and reshaping behavior\n\nFor an input image with original dimensions:\n\n\n    width x height\n\n\nthe model performs the following steps.\n\n### 1. Initial content resize\n\nLet:\n\n\n    long_side = max(width, height)\n\n\nThe first resize step is piecewise:\n\n\n    if long_side <= 512:\n        resized_width = width\n        resized_height = height\n\n    elif long_side < 1024:\n        scale = 512 / long_side\n        resized_width = round_half_up(width * scale)\n        resized_height = round_half_up(height * scale)\n\n    else:\n        resized_width = round_half_up(width / 2)\n        resized_height = round_half_up(height / 2)\n\n\nThe rounding mode is positive half-up rounding:\n\n\n    round_half_up(x) = floor(x + 0.5)\n\n\nExamples:\n\n\n    1025 / 2 = 512.5 -> 513\n    1057 / 2 = 528.5 -> 529\n\n\nThis piecewise rule explains the non-monotonic behavior around 512 px and 1024 px. For example:\n\n\n    511 x 512  -> no resize       -> 511 x 512\n    1023 x 512 -> longest to 512  -> 512 x 256\n    1024 x 512 -> half scale      -> 512 x 256\n    1025 x 512 -> half scale      -> 513 x 256\n\n\n### 2. Convert the resized image to a 16 px patch grid\n\nThe resized content is covered by 16 px x 16 px patches:\n\n\n    patch_width = ceil(resized_width / 16)\n    patch_height = ceil(resized_height / 16)\n\n\nA patch can extend beyond the image boundary. This standard patch overhang is billable.\n\nExamples:\n\n\n    256 x 256 -> 16 x 16 -> 256 tokens\n    384 x 384 -> 24 x 24 -> 576 tokens\n    512 x 512 -> 32 x 32 -> 1024 tokens\n\n\n### 3. 
Apply the 3:1 latent aspect-ratio cap\n\nThe latent token grid is not allowed to exceed a 3:1 aspect ratio, similar to the limit on the accepted output `size` parameter.\n\nIf the grid is too wide:\n\n\n    if patch_width > 3 * patch_height:\n        patch_height = ceil(patch_width / 3)\n\n\nIf the grid is too tall:\n\n\n    if patch_height > 3 * patch_width:\n        patch_width = ceil(patch_height / 3)\n\n\nThis step adds non-content padding on the short side.\n\nExamples:\n\n\n    1536 x 512\n    -> initial resize: 768 x 256\n    -> patch grid: 48 x 16\n    -> 48 / 16 = 3, no padding\n    -> tokens: 48 * 16 = 768\n\n\n\n    1537 x 512\n    -> initial resize: 769 x 256\n    -> patch grid: 49 x 16\n    -> 49 / 16 > 3, so short side becomes ceil(49 / 3) = 17\n    -> tokens: 49 * 17 = 833\n\n\n\n    2048 x 512\n    -> initial resize: 1024 x 256\n    -> patch grid: 64 x 16\n    -> 64 / 16 > 3, so short side becomes ceil(64 / 3) = 22\n    -> tokens: 64 * 22 = 1408\n\n\n### 4. Apply the 1536-token per-image budget\n\nAfter the initial resize and aspect-ratio padding, the model has a per-image budget of:\n\n\n    1536 image tokens\n\n\nIf the post-padding grid has at most 1536 tokens, the image-token cost is simply:\n\n\n    patch_width * patch_height\n\n\nIf the post-padding grid exceeds 1536 tokens, the effective image/canvas is resized again while preserving aspect ratio. 
This final fit uses the same style of integer-safe patch-budget adjustment that OpenAI documents for older patch-based models, but with a 16 px patch size.\n\nLet:\n\n\n    patch_size = 16\n    patch_budget = 1536\n\n\nLet `effective_width` and `effective_height` be the resized image dimensions after initial resizing, with the short side replaced by the explicit padded canvas side if the 3:1 aspect-ratio cap was triggered.\n\nThen:\n\n\n    shrink_factor = sqrt((patch_size^2 * patch_budget) / (effective_width * effective_height))\n\n    adjusted_shrink_factor = shrink_factor * min(\n        floor(effective_width * shrink_factor / patch_size)\n            / (effective_width * shrink_factor / patch_size),\n\n        floor(effective_height * shrink_factor / patch_size)\n            / (effective_height * shrink_factor / patch_size)\n    )\n\n    final_width = floor(effective_width * adjusted_shrink_factor)\n    final_height = floor(effective_height * adjusted_shrink_factor)\n\n    final_patch_width = ceil(final_width / patch_size)\n    final_patch_height = ceil(final_height / patch_size)\n\n    image_tokens = final_patch_width * final_patch_height\n\n\nExamples:\n\n\n    1536 x 1536\n    -> initial resize: 768 x 768\n    -> initial patch grid: 48 x 48 = 2304\n    -> over 1536 budget\n    -> final size: 624 x 624\n    -> final patch grid: 39 x 39\n    -> tokens: 1521\n\n\n\n    2048 x 1024\n    -> initial resize: 1024 x 512\n    -> initial patch grid: 64 x 32 = 2048\n    -> over 1536 budget\n    -> final size: 864 x 432\n    -> final patch grid: 54 x 27\n    -> tokens: 1458\n\n\n\n    3072 x 768\n    -> initial resize: 1536 x 384\n    -> initial patch grid: 96 x 24\n    -> exceeds 3:1, so the effective canvas becomes 96 x 32\n    -> over 1536 budget\n    -> final size: 1056 x 352\n    -> final patch grid: 66 x 22\n    -> tokens: 1452\n\n\n## Constants\n\nConstant | Value | Meaning\n---|---|---\nInitial no-resize maximum side | `512 px` | Images with `max(width, height) 
<= 512` are not resized before tokenization.\nMid-size normalization target | `512 px` | Images with `512 < max(width, height) < 1024` are resized so the longest side becomes 512 px.\nHalf-scale threshold | `1024 px` | Images with `max(width, height) >= 1024` are resized by exactly 1/2.\nInitial resize rounding | Half-up | Positive halves round upward.\nToken patch size | `16 px x 16 px` | One image token corresponds to one 16 px cell after preprocessing.\nPatch coverage rounding | `ceil` | Patches can extend beyond the image boundary.\nMaximum latent grid aspect ratio | `3:1` | Wider/taller token grids are padded on the short side.\nPer-image patch budget | `1536 tokens` | Images above this budget are resized again to fit.\nBase image tokens | `0` | There is no separate fixed base charge.\nToken multiplier | `1.0` | The reported `image_tokens` value is the final patch count.\nMulti-image aggregation | Additive per image | Each image is processed and billed independently.\n\nCompared with prior OpenAI documentation:\n\nPrior behavior | `gpt-image-2` behavior\n---|---\n512 px tile billing | Not used.\n32 px patch billing | Reused conceptually, but patch size is 16 px.\n1536 patch budget on some prior models | Same numeric budget, but with 16 px cells instead of 32 px cells.\nDetail levels such as `low`, `high`, `auto`, `original` | Not part of this edit-input billing path.\nBase tokens plus tile tokens | Not used.\nGPT Image 1 input-fidelity extras | Not used for this observed `gpt-image-2` image-token calculation.\n\nAlthough `gpt-image-2` appears to use 16px image-token units with a 1536-token cap, its large-image preprocessing halves the input before tokenization on bigger images. As a result, for images with max side at least 1024px, its effective original-image sampling is roughly equivalent to 32px patch tokenization - with the same 1536-token budget as one might find on gpt-5.2 with patches vision. 
Below that size, notably below 512px, `gpt-image-2` can use a finer token grid than a 32px patch vision model. Therefore, the 16px unit does not imply a four-times-larger usable input; nor does the same 1536-token cap necessarily imply worse vision. It implies a different resize-and-tokenize pipeline tuned for image editing/generation.\nIt is also similar to the 16px step of output.\n\nAlso noteworthy: there is no upscaling for better vision on small images. You can send a 16x16 icon and be billed for _only one_ patch token, one latent semantic unit to describe it. This suggests that a client may want to upscale small images itself. However, I would also caution against upscaling “pixels to 8x” or “pixels to 16x”, which hits the patch granularity exactly. Likewise, transparency is best not hinted with a checkerboard at exactly this size. I’ve already shown that a patches model can be completely blind to, and hallucinate on, a pure 32 pixel checkerboard.\n\n## Metacode\n\n\n    function gpt_image_2_image_tokens(width, height):\n        long_side = max(width, height)\n\n        if long_side <= 512:\n            resized = (width, height)\n        else if long_side < 1024:\n            scale = 512 / long_side\n            resized = round_half_up((width * scale, height * scale))\n        else:\n            resized = round_half_up((width / 2, height / 2))\n\n        grid = ceil_each(resized / 16)\n        effective_size = resized\n\n        if grid.width > 3 * grid.height:\n            grid.height = ceil(grid.width / 3)\n            effective_size.height = grid.height * 16\n        else if grid.height > 3 * grid.width:\n            grid.width = ceil(grid.height / 3)\n            effective_size.width = grid.width * 16\n\n        effective_grid = ceil_each(effective_size / 16)\n\n        if effective_grid.width * effective_grid.height <= 1536:\n            return effective_grid.width * effective_grid.height\n\n        fitted_size = fit_effective_size_to_1536_patch_budget(\n            
effective_size,\n            patch_size = 16,\n            patch_budget = 1536,\n        )\n\n        fitted_grid = ceil_each(fitted_size / 16)\n\n        return fitted_grid.width * fitted_grid.height\n\n\n## Dimension guidance for applications\n\nThe resizing rules mean that upload dimensions and internal dimensions are not the same. In particular, do not interpret an internal destination size as a direct upload-size target without considering the threshold behavior. If you upload that smaller size, the API will normalize that new image again.\n\n### General rules\n\n  1. **Images with maximum side 513 through 1023 are normalized down to a 512 px longest side.**\nThis range is not useful if the goal is to preserve more than 512 px of internal detail.\n\n  2. **Images with maximum side 1024 or greater enter the half-scale branch.**\nA 1248 x 1248 image becomes 624 x 624 internally. A 1536 x 1536 image first becomes 768 x 768, then is budget-fitted back to 624 x 624.\n\n  3. **For square images, the useful upload cap is 1248 x 1248.**\nLarger square images do not produce a higher final square token grid than 39 x 39.\n\n  4. **For 2:1 images, the useful upload cap is approximately 1728 x 864.**\nThis becomes 864 x 432 internally, or a 54 x 27 token grid.\n\n  5. **For 3:1 images, the useful upload cap is approximately 2112 x 704.**\nThis becomes 1056 x 352 internally, or a 66 x 22 token grid.\n\n  6. **For images wider or taller than 3:1, the short side is padded in latent space.**\nThe padding is billed. For these images, extra long-side pixels above roughly 2112 px do not add final latent-grid detail.\n\n  7. **Transpose is symmetric.**\nA `width x height` input and a `height x width` input have the same token cost.\n\n\n\n\n### Common no-waste upload caps\n\nThese are practical caps for preserving the maximum useful internal detail without sending dimensions that are later discarded by the budget fit. 
They give simple manual caps you can apply before upload to reduce wire transfer, though client code that mirrors the internal resizing would be better.\n\nAspect ratio | Good upload cap | Internal normalized result | Final token grid | Tokens\n---|---|---|---|---\n`1:1` | `1248 x 1248` | `624 x 624` | `39 x 39` | `1521`\n`4:3` | `1440 x 1080` | `720 x 540` | `45 x 34` | `1530`\n`3:2` | `1536 x 1024` | `768 x 512` | `48 x 32` | `1536`\n`16:9` | `1600 x 900` | `800 x 450` | `50 x 29` | `1450`\n`2:1` | `1728 x 864` | `864 x 432` | `54 x 27` | `1458`\n`3:1` | `2112 x 704` | `1056 x 352` | `66 x 22` | `1452`\nWider than `3:1` | Long side around `2112` | Short side padded to 3:1 latent canvas | Up to about `66 x 22` | Up to about `1452`\n\nA simple application rule is:\n\n\n    If preserving source detail while avoiding wasted upload bytes:\n        choose an aspect-appropriate cap from the table and typical user files\n        resize the source image down to that cap if it exceeds it\n        do not upscale small source images only to manipulate billing\n        there is unexplored potential to upscale very small images for better vision\n\n\nIf the application needs one universal conservative cap, use:\n\n\n    maximum uploaded long side: 2048 px\n\n\nIf the application can use aspect-aware caps, prefer the table above. It avoids oversending square and near-square images.\n\n* * *\n\n# Python reference implementation\n\nThe first public function returns the model’s effective normalized input canvas size for one image. For ordinary images this is the resized image size. For images that exceed the 3:1 latent aspect-ratio limit, the returned size includes the explicit padded short side. 
It does not add the final implicit patch overhang from `ceil(width / 16)`.\n\nFor client-side preprocessing that approximates what the API does internally, fit the source content into this returned canvas while preserving aspect ratio, and pad any unused area. Transparent padding is probably a good choice; alternatively, simply reject or pass through unchanged any image beyond 3:1 or 1:3, since we don’t know the exact padded input the model receives. Do not stretch the content unless changing the aspect ratio is intended.\n\nThe second public function returns the per-image input image-token count. For multiple images, call it for each image and sum the per-image results. Convert tokens to dollars at the $8.00 per-Mtoken image input price.\n\nThe globals are specific to this model and are not directly adaptable to other models.\n\n* * *\n\n\n    _GPT_IMAGE_2_PATCH_SIZE = 16\n    _GPT_IMAGE_2_PATCH_BUDGET = 1536\n    _GPT_IMAGE_2_NO_RESIZE_MAX_SIDE = 512\n    _GPT_IMAGE_2_HALF_SCALE_MIN_SIDE = 1024\n    _GPT_IMAGE_2_MAX_GRID_ASPECT = 3\n\n\n    def _ceil_div(value: int, divisor: int) -> int:\n        return -(-value // divisor)\n\n\n    def _round_half_up_div(numerator: int, denominator: int) -> int:\n        return (2 * numerator + denominator) // (2 * denominator)\n\n\n    def _fit_gpt_image_2_patch_budget(width: int, height: int) -> tuple[int, int]:\n        import math\n\n        patch = _GPT_IMAGE_2_PATCH_SIZE\n        budget = _GPT_IMAGE_2_PATCH_BUDGET\n\n        width_cells = math.isqrt((budget * width) // height)\n        height_cells = math.isqrt((budget * height) // width)\n\n        width_scale_num = width_cells * patch\n        width_scale_den = width\n        height_scale_num = height_cells * patch\n        height_scale_den = height\n\n        # Compare rational scales without floating point.\n        if width_scale_num * height_scale_den <= (\n            height_scale_num * width_scale_den\n        ):\n            fitted_width = width_scale_num\n            fitted_height = height * width_scale_num // width_scale_den\n        else:\n   
         fitted_width = width * height_scale_num // height_scale_den\n            fitted_height = height_scale_num\n\n        return fitted_width, fitted_height\n\n\n    def gpt_image_2_input_destination_size(\n        input_dimensions: tuple[int, int],\n    ) -> tuple[int, int]:\n        \"\"\"Return the normalized gpt-image-2 input canvas for one image.\"\"\"\n        width, height = input_dimensions\n        long_side = max(width, height)\n\n        if long_side <= _GPT_IMAGE_2_NO_RESIZE_MAX_SIDE:\n            resized_width = width\n            resized_height = height\n        elif long_side < _GPT_IMAGE_2_HALF_SCALE_MIN_SIDE:\n            resized_width = _round_half_up_div(\n                width * _GPT_IMAGE_2_NO_RESIZE_MAX_SIDE,\n                long_side,\n            )\n            resized_height = _round_half_up_div(\n                height * _GPT_IMAGE_2_NO_RESIZE_MAX_SIDE,\n                long_side,\n            )\n        else:\n            resized_width = (width + 1) // 2\n            resized_height = (height + 1) // 2\n\n        patch = _GPT_IMAGE_2_PATCH_SIZE\n        max_aspect = _GPT_IMAGE_2_MAX_GRID_ASPECT\n\n        grid_width = _ceil_div(resized_width, patch)\n        grid_height = _ceil_div(resized_height, patch)\n\n        effective_width = resized_width\n        effective_height = resized_height\n\n        # Enforce the 3:1 latent grid limit by padding the short side.\n        if grid_width > max_aspect * grid_height:\n            effective_height = _ceil_div(grid_width, max_aspect) * patch\n        elif grid_height > max_aspect * grid_width:\n            effective_width = _ceil_div(grid_height, max_aspect) * patch\n\n        grid_width = _ceil_div(effective_width, patch)\n        grid_height = _ceil_div(effective_height, patch)\n\n        if grid_width * grid_height <= _GPT_IMAGE_2_PATCH_BUDGET:\n            return effective_width, effective_height\n\n        return _fit_gpt_image_2_patch_budget(effective_width, effective_height)\n\n\n    
def gpt_image_2_input_tokens(input_dimensions: tuple[int, int]) -> int:\n        \"\"\"Return gpt-image-2 input image tokens for one image.\"\"\"\n        width, height = gpt_image_2_input_destination_size(input_dimensions)\n\n        return (\n            _ceil_div(width, _GPT_IMAGE_2_PATCH_SIZE)\n            * _ceil_div(height, _GPT_IMAGE_2_PATCH_SIZE)\n        )\n\n\n* * *\n\n## Reference outputs\n\nThese are useful sanity checks for any port of the implementation: Python asserts I’ve already verified for you against the API’s reported “usage”.\n\n\n    assert gpt_image_2_input_tokens((256, 256)) == 256\n    assert gpt_image_2_input_tokens((384, 384)) == 576\n    assert gpt_image_2_input_tokens((768, 768)) == 1024\n    assert gpt_image_2_input_tokens((1024, 1024)) == 1024\n    assert gpt_image_2_input_tokens((1536, 1536)) == 1521\n\n    assert gpt_image_2_input_tokens((512, 256)) == 512\n    assert gpt_image_2_input_tokens((1536, 768)) == 1152\n    assert gpt_image_2_input_tokens((2048, 1024)) == 1458\n    assert gpt_image_2_input_tokens((4096, 2048)) == 1458\n\n    assert gpt_image_2_input_tokens((1537, 512)) == 833\n    assert gpt_image_2_input_tokens((2048, 512)) == 1408\n    assert gpt_image_2_input_tokens((2049, 512)) == 1430\n\n    assert gpt_image_2_input_destination_size((2048, 512)) == (1024, 352)\n    assert gpt_image_2_input_destination_size((2048, 1024)) == (864, 432)\n    assert gpt_image_2_input_destination_size((1536, 1536)) == (624, 624)\n    assert gpt_image_2_input_destination_size((3072, 768)) == (1056, 352)\n\n\nHave some more:\n\nFor a request containing multiple input images, a simple loop does the job:\n\n\n    image_dimensions = [\n        (2048, 1024),\n        (1536, 1536),\n        (512, 256),\n    ]\n\n    total_image_tokens = sum(\n        gpt_image_2_input_tokens(dimensions)\n        for dimensions in image_dimensions\n    )\n\n    assert total_image_tokens == 1458 + 1521 + 512\n\n\nNow I’ve got apps to tune up for this “vision” 
pricing.\nHappy calculating!",
  "title": "OpenAI *must* document the input image pricing of gpt-image-2"
}
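
The post in the record above stops at token counts. As a minimal sketch of the last step, the per-image counts can be turned into a dollar estimate, assuming the $8.00 per-Mtoken image input price quoted in the post still applies. The names `PRICE_PER_MTOKEN_USD` and `image_input_cost_usd` are hypothetical, introduced here for illustration; the token counts are the ones from the post's multi-image example.

```python
from decimal import Decimal

# Assumption: image input price quoted in the post, in USD per million tokens.
PRICE_PER_MTOKEN_USD = Decimal("8.00")


def image_input_cost_usd(image_tokens_per_image: list[int]) -> Decimal:
    """Sum per-image token counts and convert the total to USD."""
    total_tokens = sum(image_tokens_per_image)
    return PRICE_PER_MTOKEN_USD * total_tokens / Decimal(1_000_000)


# Token counts for (2048, 1024), (1536, 1536) and (512, 256), as listed in the post.
example_tokens = [1458, 1521, 512]
example_cost = image_input_cost_usd(example_tokens)
print(f"{sum(example_tokens)} image tokens -> ${example_cost}")
```

`Decimal` is used here only to avoid float rounding in small per-request amounts; plain floats would also do for rough budgeting. Text input tokens and output tokens are billed separately and are not covered by this sketch.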