
ChatGPT 5.4 Pro Standard Mode - Adaptive Thinking or Nerfing Model?

OpenAI Developer Community April 19, 2026

Hi everyone,

I’m trying to determine whether other users are seeing a similar behavior change with GPT-5.4 Pro Standard on long-context, high-effort tasks.

I’m not claiming a confirmed backend bug. I’m looking for comparison data because the change I observed is large enough that it does not look like normal variance.

What I tested

I have a repeatable long-context task that requires the model to:

  1. read a large uploaded context/file packet,

  2. reconcile multiple source documents,

  3. identify pending work,

  4. produce a concrete written deliverable,

  5. include an actionable implementation/review plan.

This is not a short Q&A prompt. It is the kind of task where the model needs sustained reasoning and careful file/context handling.

What I observed (using the same task, so as to have empirical diagnostic data)

A prior run of the same class of task, using GPT-5.4 Pro Standard, took roughly 60 minutes and completed the work correctly.

A later run, also using GPT-5.4 Pro Standard, completed in roughly 8 minutes, but the output was materially lower quality. It looked more like a readiness/summary response than the actual requested deliverable. The task and files were the same; the behavior simply changed from one day to the next.

The issue was not simply that the model was faster. The issue was:

GPT-5.4 Pro Standard run A: ~60 minutes, complete and correct
GPT-5.4 Pro Standard run B: ~8 minutes, incomplete and missing the core deliverable

Why this seems concerning

For this task type, a correct answer required the model to stay engaged across a large context and produce a concrete output. Instead, the shorter run appeared to stop at a high-level framing/acknowledgement stage.

The shorter run did not just compress the work. It skipped the central artifact the task required.

This resembles a lower effective reasoning-effort budget, but I cannot see the hidden backend setting, so I do not know whether the cause is:

a temporary routing/configuration issue,
a hidden reasoning-effort change,
file/context handling degradation,
early stopping behavior,
or normal model variance.

Why I do not think this is just normal variation

A swing from about 60 minutes to about 8 minutes for the same class of long-context task is large by itself.

But the stronger signal is output completeness:

Earlier run: long duration, complete deliverable
Later run: short duration, plausible-looking summary, missing deliverable

The later answer looked superficially responsive, but it did not complete the actual work requested.

I have noticed this pattern before around new model releases, when one keeps using the “old” model rather than the latest one. That may be the case here, since this happened on Saturday, April 18, and a new model might be about to come out, but that is not something I can know.

Secondary tool/context anomalies

I also noticed some possible tool/context weirdness during diagnostics, though these may be separate issues:

  • uploaded file retrieval seemed inconsistent;

  • search over uploaded/context files appeared to surface unrelated prior material;

  • a simple Python/stdout test behaved inconsistently in one diagnostic path, while a direct Python path worked.

Again, those may be unrelated, but I’m mentioning them in case others are seeing similar clusters.
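
The exact Python/stdout test is not shown here. For anyone who wants to run a comparable check, the following is a minimal sketch of one possible version, assuming a plain Python environment: it prints an arbitrary marker from a child interpreter and checks that the same text comes back on stdout. The marker string and the function name `stdout_roundtrip_ok` are made up for illustration.

```python
import subprocess
import sys

# Arbitrary token chosen for this sketch; any unique string works.
MARKER = "stdout-check-12345"


def stdout_roundtrip_ok(marker: str = MARKER) -> bool:
    """Print `marker` from a child Python process and confirm it is captured intact."""
    completed = subprocess.run(
        [sys.executable, "-c", f"print({marker!r})"],
        capture_output=True,
        text=True,
        check=False,
    )
    return completed.returncode == 0 and completed.stdout.strip() == marker


ok = stdout_roundtrip_ok()
print("stdout round trip ok:", ok)
```

Running the same check through each diagnostic path and comparing the results would show whether the inconsistency is reproducible or a one-off.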

Questions for other users

Has anyone else recently seen GPT-5.4 Pro Standard:

  • finish long reasoning tasks much faster than before;

  • produce a plausible-looking summary instead of the requested artifact;

  • appear to use a lower effective thinking budget;

  • skip file/artifact production in tasks where prior runs completed it;

  • behave differently across otherwise similar Standard-mode sessions?

Useful comparison data would be:

same or similar prompt
same uploaded/context size
model setting used
earlier run duration and quality
later run duration and quality
whether the final deliverable was actually produced
whether the run seemed to stop at summary/readiness instead of execution
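
To make reports easier to compare, the fields above could be logged in a consistent shape. The following is a minimal sketch, not an official format: the `RunRecord` class, its field names, and the `compare_runs` helper are all hypothetical, and the durations are the approximate figures from the two runs described in this post.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One observed run. Field names are illustrative, not an official schema."""
    label: str
    model_setting: str
    duration_minutes: float
    deliverable_produced: bool
    stopped_at_summary: bool


def compare_runs(earlier: RunRecord, later: RunRecord) -> dict:
    """Summarize how a later run differs from an earlier one."""
    deliverable_lost = earlier.deliverable_produced and not later.deliverable_produced
    return {
        "same_setting": earlier.model_setting == later.model_setting,
        # How many times faster the later run finished.
        "speedup": round(earlier.duration_minutes / later.duration_minutes, 2),
        "deliverable_lost": deliverable_lost,
        # Faster alone is not a problem; losing the deliverable while
        # stopping at a summary is the pattern worth reporting.
        "matches_reported_pattern": deliverable_lost and later.stopped_at_summary,
    }


# Approximate values taken from the two runs described in this post.
run_a = RunRecord("run A", "GPT-5.4 Pro Standard", 60, True, False)
run_b = RunRecord("run B", "GPT-5.4 Pro Standard", 8, False, True)

report = compare_runs(run_a, run_b)
print(report)
```

Context size and the prompt itself are left out of the sketch because they are harder to compare mechanically; a short free-text note alongside each record would cover them.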

I’m trying to determine whether this is expected variance, a temporary configuration/routing issue, a file/context handling issue, or a broader regression in effective long-context reasoning within GPT-5.4 Pro Standard.

Everything seems faster overall, and other users will likely say, or have already noticed, that ChatGPT is a lot faster, around 4x or more. Previously it took longer to go from sending a prompt to Thinking, and the steps in the Thinking tab usually took longer; now it all goes much more quickly, almost like a real-time chat. I just want to know whether this is the new normal so I can work out how to engineer around it or look for alternatives.
