Recent Regression: 5.4 Thinking Model reopens already-proved points without identifying a concrete gap
I am reporting a behavior regression that has become very noticeable to me over the last few months.
In January and February, I was getting substantially better performance on evidence-sensitive tasks. The model was more capable of moving from verified proof to conclusion. More recently, I have been seeing a repeated pattern where the model forces unnecessary extra work even after the threshold for conclusion has already been met.
Observed behavior
When I provide a curated .md file containing:
relevant facts
citations
quotations
supporting legal authority
and I have already manually reviewed that material for accuracy, the model often still does one or more of the following:
reopens already-settled points
injects generic uncertainty without naming a specific defect
asks for more proof without identifying a missing element, conflicting fact, or contrary authority
generates speculative justifications not grounded in the supplied material
stays in “prove it again” mode instead of moving to consequence, defenses, remedy, or unresolved issues
The practical effect is miscalibrated skepticism after verification.
Expected behavior
If the user provides a verified source package, especially a checked .md file containing citations and quotations, the model should treat that package as the operative working source for the session.
At that point, it should request more support only if it can identify a concrete problem such as:
a specific missing element
a conflicting record fact
a contrary authority
a stale-authority issue
a quote/citation mismatch
If none of those exist, the model should proceed to:
what the proof establishes
the legal consequence
any defenses actually available on the present record
remedy analysis
genuinely unresolved issues only
Why this seems like a regression
This behavior was less pronounced for me in December and January. The model was more willing to conclude once the burden had been met.
Now it more often seems to relitigate already-proved points without identifying a concrete gap. That adds friction and shifts the burden back to the user to re-prove what is already proved.
Repro pattern
Upload or paste a curated .md file with facts, citations, and quotations.
Ask a targeted question where the relevant rule text and supporting proof are already present.
The model responds by asking for more support or restating uncertainty in general terms.
No concrete missing element, conflicting fact, or contrary authority is identified.
The user has to spend additional turns pushing the model from proof to conclusion.
Actual impact
This is not just a style issue. It is a workflow defect.
For serious users who already manually verify citations and quoted authorities, the failure mode is not “the model made one mistake.” The failure mode is that the model adds an ongoing attention and time cost by refusing to move cleanly from verified proof to conclusion.
That is especially harmful in legal and evidence-sensitive work, where the user may already be doing careful source review before asking the model to analyze the result.
Concise defect label
MISCALIBRATED SKEPTICISM: model demanded re-proof after the burden was satisfied, without identifying a concrete gap.
Request
Please review whether recent model behavior changes have made ChatGPT more likely to:
reopen resolved points after verification
apply generic skepticism instead of source-specific analysis
require extra user effort even when the relevant burden has already been met
The main issue is not that the model is cautious. The issue is that it sometimes no longer distinguishes well between:
unresolved uncertainty
and already-proved points supported by the supplied record
That is the regression I am reporting.