External Publication

Recent Regression: 5.4 Thinking Model reopens already-proved points without identifying a concrete gap

OpenAI Developer Community April 10, 2026

I am reporting a behavior regression that has become very noticeable for me over the last few months.

In January and February, I was getting substantially better performance on evidence-sensitive tasks. The model was more capable of moving from verified proof to conclusion. More recently, I have been seeing a repeated pattern where the model forces unnecessary extra work even after the threshold for conclusion has already been met.

Observed behavior

When I provide a curated .md file containing:

relevant facts
citations
quotations
supporting legal authority

and I have already manually reviewed that material for accuracy, the model often still does one or more of the following:

reopens already-settled points
injects generic uncertainty without naming a specific defect
asks for more proof without identifying a missing element, conflicting fact, or contrary authority
generates speculative justifications not grounded in the supplied material
stays in “prove it again” mode instead of moving to consequence, defenses, remedy, or unresolved issues

The practical effect is miscalibrated skepticism after verification.

Expected behavior

If the user provides a verified source package, especially a checked .md file containing citations and quotations, the model should treat that package as the operative working source for the session.

At that point, it should only request more support if it can identify a concrete problem such as:

a specific missing element
a conflicting record fact
a contrary authority
a stale-authority issue
a quote/citation mismatch

If none of those exist, the model should proceed to:

what the proof establishes
the legal consequence
any actually available defenses on the present record
remedy analysis
genuinely unresolved issues only
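
To make the expected rule concrete, here is a minimal Python sketch of the decision I have in mind. All names (`GapKind`, `Gap`, `next_action`) are hypothetical and exist only to illustrate the rule. This is not a claim about how the model works internally.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class GapKind(Enum):
    """The concrete problems that justify asking for more support."""
    MISSING_ELEMENT = "specific missing element"
    CONFLICTING_FACT = "conflicting record fact"
    CONTRARY_AUTHORITY = "contrary authority"
    STALE_AUTHORITY = "stale-authority issue"
    QUOTE_CITATION_MISMATCH = "quote/citation mismatch"


@dataclass(frozen=True)
class Gap:
    kind: GapKind
    detail: str  # must name the specific defect, not generic doubt


# Order of analysis once no concrete gap has been identified.
PROCEED_STEPS = [
    "what the proof establishes",
    "the legal consequence",
    "any actually available defenses on the present record",
    "remedy analysis",
    "genuinely unresolved issues only",
]


def next_action(gaps: List[Gap]) -> dict:
    """Request support only for named gaps; otherwise proceed."""
    # A gap with an empty detail is generic skepticism and does not count.
    concrete = [g for g in gaps if g.detail.strip()]
    if concrete:
        return {
            "action": "request_support",
            "gaps": [f"{g.kind.value}: {g.detail}" for g in concrete],
        }
    return {"action": "proceed", "steps": list(PROCEED_STEPS)}
```

The point of the sketch is that "ask for more proof" is reachable only through a named, specific gap. Unspecified doubt falls through to the proceed branch.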

Why this seems like a regression

This behavior was less pronounced for me in December and January. The model was more willing to conclude once the burden had been met.

Now it more often seems to relitigate already-proved points without identifying a concrete gap. That adds friction and shifts the burden back to the user to re-prove what is already proved.

Repro pattern

Upload or paste a curated .md file with facts, citations, and quotations.
Ask a targeted question where the relevant rule text and supporting proof are already present.
The model responds by asking for more support or restating uncertainty in general terms.
No concrete missing element, conflicting fact, or contrary authority is identified.
The user has to spend additional turns pushing the model from proof to conclusion.

Actual impact

This is not just a style issue. It is a workflow defect.

For serious users who already manually verify citations and quoted authorities, the failure mode is not “the model made one mistake.” The failure mode is that the model adds an ongoing attention and time cost by refusing to move cleanly from verified proof to conclusion.

That is especially harmful in legal and evidence-sensitive work, where the user may already be doing careful source review before asking the model to analyze the result.

Concise defect label

MISCALIBRATED SKEPTICISM: model demands re-proof after the burden is satisfied, without identifying a concrete gap.

Request

Please review whether recent model behavior changes have made ChatGPT more likely to:

reopen resolved points after verification
apply generic skepticism instead of source-specific analysis
require extra user effort even when the relevant burden has already been met

The main issue is not that the model is cautious. The issue is that it sometimes no longer distinguishes well between unresolved uncertainty and already-proved points supported by the supplied record.

That is the regression I am reporting.