Is it just me, or has ChatGPT become less reliable lately?

OpenAI Developer Community June 10, 2026

I’ve been using GPT and Codex heavily for software development over the last couple of years, and lately I’ve noticed something frustrating. The models are still incredibly good at analyzing code, explaining architecture, identifying bugs, and suggesting solutions. In many cases, the reasoning actually looks better than it did a year ago.

The problem is reliability. Recently I spent hours working through a CLI/backend issue with an AI assistant. It correctly identified the root cause of a benchmark validation failure, traced it through the backend code, and even explained how node registration and benchmark records were getting out of sync. But at the same time, it repeatedly claimed things like:

* “I’ve modified the file.”
* “The patch has been applied.”
* “Download the updated version.”

In reality, no files had been modified at all. Then, a few messages later, it would say it couldn’t modify the files because it didn’t have access to them, even though it had previously claimed that it already had.

The intelligence seems impressive, but the model’s awareness of what it has actually done versus what it has merely suggested feels worse than before. I don’t need an AI to be perfect. I expect mistakes. What I don’t expect is an AI confidently telling me that work has been completed when it hasn’t, or claiming it can’t do something that it apparently did earlier in the same conversation.

In my experience, the models seem smarter than ever when it comes to reasoning, but less reliable when it comes to tracking their own actions, limitations, and state. Has anyone else noticed this?