External Publication
Visit Post

Why are completion tokens so high?

OpenAI Developer Community February 28, 2026
Source
  1. gpt-5 models are reasoning models. You pay for their internal thinking as output.
  2. gpt-5-nano thinks excessively long for poor results. Better to just use mini.
  3. use the API parameter “reasoning_effort”, and set it to “low”. That will indicate to the model how much to think (the parameter for Chat Completions).
  4. Or simply use gpt-4.1, which goes right to producing output without first deliberating about and valuing which “code” to generate. Use a “top_p”: 0.01 if you want consistent answers instead of random ones.

Discussion in the ATmosphere

Loading comments...