{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreieqjn5ri56vofgbbvdnit2q2jjfwddwguja6ey5onkh6bpjeojlsi",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mjsshk4cqsb2"
},
"path": "/t/chatgpt-5-4-pro-standard-mode-adaptive-thinking-or-nerfing-model/1379265#post_1",
"publishedAt": "2026-04-19T00:23:44.000Z",
"site": "https://community.openai.com",
"textContent": "Hi everyone,\n\nI’m trying to determine whether other users are seeing a similar behavior change with **GPT-5.4 Pro Standard** on long-context, high-effort tasks.\n\nI’m **not claiming a confirmed backend bug**. I’m looking for comparison data because the change I observed is large enough that it does not look normal\n\n### What I tested\n\nI have a repeatable long-context task that requires the model to:\n\n 1. read a large uploaded context/file packet,\n\n 2. reconcile multiple source documents,\n\n 3. identify pending work,\n\n 4. produce a concrete written deliverable,\n\n 5. include an actionable implementation/review plan.\n\n\n\n\nThis is not a short Q&A prompt. It is the kind of task where the model needs sustained reasoning and careful file/context handling.\n\n### What I observed (using the same task as to have imperial test diagnostic data)\n\nA prior run of the same class of task, using **GPT-5.4 Pro Standard** , took roughly **60 minutes** and completed the work correctly.\n\nA later run, also using **GPT-5.4 Pro Standard** , completed in roughly **8 minutes** , but the output was materially lower quality. It looked more like a readiness/summary response than the actual requested deliverable. Same task and files, it just change from a day to the next.\n\nThe issue was not simply that the model was faster. The issue was:\n\n\n GPT-5.4 Pro Standard run A: ~60 minutes, complete and correct\n GPT-5.4 Pro Standard run B: ~8 minutes, incomplete and missing the core deliverable\n\n\n\n### Why this seems concerning\n\nFor this task type, a correct answer required the model to stay engaged across a large context and produce a concrete output. Instead, the shorter run appeared to stop at a high-level framing/acknowledgement stage.\n\nThe shorter run did not just compress the work. 
It skipped the central artifact the task required.\n\nThis resembles a lower effective reasoning-effort budget, but I cannot see the hidden backend setting, so I do not know whether the cause is:\n\n\n a temporary routing/configuration issue,\n a hidden reasoning-effort change,\n file/context handling degradation,\n early stopping behavior,\n or normal model variance.\n\n\n\n### Why I do not think this is just normal variation\n\nA swing from about **60 minutes** to about **8 minutes** for the same class of long-context task is large by itself.\n\nBut the stronger signal is output completeness:\n\n\n Earlier run: long duration, complete deliverable\n Later run: short duration, plausible-looking summary, missing deliverable\n\n\n\nThe later answer looked superficially responsive, but it did not complete the actual work requested.\n\nThis is a pattern I’ve noticed before around new model releases, when one keeps using the “old” model rather than the current latest one. Since this happened on Saturday, April 18th, it may be that a new model is about to come out, but that is not something I can know.\n\n### Secondary tool/context anomalies\n\nI also noticed some possible tool/context weirdness during diagnostics, though these may be separate issues:\n\n * uploaded file retrieval seemed inconsistent;\n\n * search over uploaded/context files appeared to surface unrelated prior material;\n\n * a simple Python/stdout test behaved inconsistently in one diagnostic path, while a direct Python path worked.\n\n\n\n\nAgain, those may be unrelated, but I’m mentioning them in case others are seeing similar clusters.\n\n### Questions for other users\n\nHas anyone else recently seen GPT-5.4 Pro Standard:\n\n * finish long reasoning tasks much faster than before;\n\n * produce a plausible-looking summary instead of the requested artifact;\n\n * appear to use a lower effective thinking budget;\n\n * skip file/artifact production in tasks where prior runs completed it;\n\n * behave 
differently across otherwise similar Standard-mode sessions?\n\n\n\n\nUseful comparison data would be:\n\n\n same or similar prompt\n same uploaded/context size\n model setting used\n earlier run duration and quality\n later run duration and quality\n whether the final deliverable was actually produced\n whether the run seemed to stop at summary/readiness instead of execution\n\n\n\nI’m trying to determine whether this is expected variance, a temporary configuration/routing issue, a file/context handling issue, or a broader regression in effective long-context reasoning within GPT-5.4 Pro Standard.\n\nEverything seems faster overall, and other users will likely say or have noticed that ChatGPT is a lot faster, perhaps 4x or more. Previously it took longer to go from sending a prompt to Thinking, and the steps in the Thinking tab usually took longer; now it all goes much quicker, almost like a real-time chat. I just want to know whether this is the new normal so I can work out how to engineer around it or look for alternatives.",
"title": "ChatGPT 5.4 Pro Standard Mode - Adaptive Thinking or Nerfing Model?"
}