GPT-5.4 ignores reasoning_effort="none" when max_completion_tokens is used

OpenAI Developer Community April 2, 2026

The fault seen here is that Chat Completions does not deliver the output if it is not complete. Everything generated is classified as “reasoning” in usage, even when it is clear that the generation would already have transitioned to the final, visible output.

Second, there actually is reasoning at “none”; it is just hidden behind a threshold of 128 tokens, below which it is not billed.

I’ve made posts about this before. Let’s say the AI will quite predictably write 500 tokens. With max_completion_tokens at 300, 400, or 500, everything is billed as “reasoning” and you never get a non-streamed output. Increase the limit further and you eventually get the switch to the full output instead of no output: you get what you paid for only when the AI has reached the stop sequence and the output is done.
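A minimal sketch of how one might flag this symptom in code, assuming the response has been converted to a plain dict with the standard Chat Completions fields (`choices[0].finish_reason`, `choices[0].message.content`, `usage.completion_tokens`, `usage.completion_tokens_details.reasoning_tokens`). The helper names and the mock responses are hypothetical: the mocks are hand-built to mirror the behavior described above, not measured API output.

```python
from typing import Any


def is_undelivered(response: dict[str, Any]) -> bool:
    """Return True when tokens were billed but no visible text was delivered."""
    choice = response["choices"][0]
    usage = response["usage"]
    content = choice["message"].get("content") or ""
    return (
        choice.get("finish_reason") == "length"
        and content == ""
        and usage.get("completion_tokens", 0) > 0
    )


def reasoning_share(response: dict[str, Any]) -> float:
    """Fraction of billed completion tokens that usage reports as reasoning."""
    usage = response["usage"]
    total = usage.get("completion_tokens", 0)
    details = usage.get("completion_tokens_details") or {}
    return details.get("reasoning_tokens", 0) / total if total else 0.0


def mock_response(limit: int, full_length: int = 500) -> dict[str, Any]:
    """Hypothetical stand-in for an API response, NOT real data.

    Mirrors the post's description: output appears only once the limit
    is large enough for the model to reach its stop sequence.
    """
    finished = limit > full_length
    billed = full_length if finished else limit
    return {
        "choices": [{
            "finish_reason": "stop" if finished else "length",
            "message": {"content": "full answer" if finished else ""},
        }],
        "usage": {
            "completion_tokens": billed,
            "completion_tokens_details": {
                "reasoning_tokens": 0 if finished else billed,
            },
        },
    }


if __name__ == "__main__":
    for limit in (300, 400, 500, 700):
        resp = mock_response(limit)
        print(limit, is_undelivered(resp), reasoning_share(resp))
```

With a real client, the same two checks could be run on each response from a sweep over max_completion_tokens to find the point where output starts being delivered.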

This symptom on Chat Completions has persisted across reasoning models, with no sign that my reports of the issue have had any impact. You pay, and then do not get the partial output.

Related post (category: Bugs): GPT-5 on "minimal" - Serious anomaly in prediction token billing and output delivery failure

Expected: user gets all the output intended for them, up to truncation point by output limit.

Issue: When a user-facing response is in any way incomplete, the entire response is undelivered, and all the generation is billed as “reasoning_tokens”

Symptoms: The AI might reason much longer by developer message (somewhat expected) or rather, token billing bins are wrong, or, untruthful token counts are delivered that masks this reasoning when a response is complete: “reasoning_tokens” billin…
