Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreigxyretty6hr2avm2af7kwhe3cu2uznfaz7w6dzfcy4s3gztvnj2y",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mj6gjdxd4bk2"
  },
  "path": "/t/recent-regression-5-4-thinking-model-reopens-already-proved-points-without-identifying-a-concrete-gap/1378773#post_1",
  "publishedAt": "2026-04-10T20:14:18.000Z",
  "site": "https://community.openai.com",
  "textContent": "I am reporting a behavior regression that has become very noticeable for me over the last few months.\n\nIn January and February, I was getting substantially better performance on evidence-sensitive tasks. The model was more capable of moving from verified proof to conclusion. More recently, I have been seeing a repeated pattern where the model forces unnecessary extra work even after the threshold for conclusion has already been met.\n\n## Observed behavior\n\nWhen I provide a curated `.md` file containing:\n\n  * relevant facts\n\n  * citations\n\n  * quotations\n\n  * supporting legal authority\n\nand I have already manually reviewed that material for accuracy, the model often still does one or more of the following:\n\n  * reopens already-settled points\n\n  * injects generic uncertainty without naming a specific defect\n\n  * asks for more proof without identifying a missing element, conflicting fact, or contrary authority\n\n  * generates speculative justifications not grounded in the supplied material\n\n  * stays in “prove it again” mode instead of moving to consequence, defenses, remedy, or unresolved issues\n\nThe practical effect is **miscalibrated skepticism after verification**.\n\n## Expected behavior\n\nIf the user provides a verified source package, especially a checked `.md` file containing citations and quotations, the model should treat that package as the operative working source for the session.\n\nAt that point, it should only request more support if it can identify a concrete problem such as:\n\n  * a specific missing element\n\n  * a conflicting record fact\n\n  * a contrary authority\n\n  * a stale-authority issue\n\n  * a quote/citation mismatch\n\nIf none of those exist, the model should proceed to:\n\n  * what the proof establishes\n\n  * the legal consequence\n\n  * any actually available defenses on the present record\n\n  * remedy analysis\n\n  * genuinely unresolved issues only\n\n## Why this 
seems like a regression\n\nThis behavior was less pronounced for me in December and January. The model was more willing to conclude once the burden had been met.\n\nNow it more often seems to relitigate already-proved points without identifying a concrete gap. That adds friction and shifts the burden back to the user to re-prove what is already proved.\n\n## Repro pattern\n\n  1. Upload or paste a curated `.md` file with facts, citations, and quotations.\n\n  2. Ask a targeted question where the relevant rule text and supporting proof are already present.\n\n  3. The model responds by asking for more support or restating uncertainty in general terms.\n\n  4. No concrete missing element, conflicting fact, or contrary authority is identified.\n\n  5. The user has to spend additional turns pushing the model from proof to conclusion.\n\n## Actual impact\n\nThis is not just a style issue. It is a workflow defect.\n\nFor serious users who already manually verify citations and quoted authorities, the failure mode is not “the model made one mistake.” The failure mode is that the model adds an ongoing attention and time cost by refusing to move cleanly from verified proof to conclusion.\n\nThat is especially harmful in legal and evidence-sensitive work, where the user may already be doing careful source review before asking the model to analyze the result.\n\n## Concise defect label\n\n**MISCALIBRATED SKEPTICISM:** the model demands re-proof after the burden has been satisfied, without identifying a concrete gap.\n\n## Request\n\nPlease review whether recent model behavior changes have made ChatGPT more likely to:\n\n  * reopen resolved points after verification\n\n  * apply generic skepticism instead of source-specific analysis\n\n  * require extra user effort even when the relevant burden has already been met\n\nThe main issue is not that the model is cautious. 
The issue is that it sometimes no longer distinguishes well between:\n\n  * unresolved uncertainty\n\n  * and already-proved points supported by the supplied record\n\nThat is the regression I am reporting.",
  "title": "Recent Regression: 5.4 Thinking Model reopens already-proved points without identifying a concrete gap"
}