
Codex and coding agents need stronger scope control, not only better code generation

OpenAI Developer Community June 23, 2026

This is about GPT / Codex / coding agents and how they behave inside real codebases. I think one of the biggest problems with AI coding agents right now is not only that they can write bad code. Bad code can be reviewed. Bugs can be fixed. The more dangerous part is that the agent often does not know when to stop. It does not always understand the difference between “complete the requested task” and “reshape the project because it thinks it found a better way.”

I recently had this happen on a personal project I had worked on for almost a year: thousands of lines of code, lots of old decisions, legacy behavior, compatibility work, edge cases, and parts that were ugly for reasons that were not obvious from any single file. I had some boring work that was time-consuming but not conceptually hard, so I gave it to Codex. I was careful with the prompt because I did not want the internals touched or redesigned. The task was supposed to be limited.

The result still went beyond what I expected. It touched more than it should have, changed things in ways I would not have, and the project ended up with months of work gone or broken in a way I could not properly recover from. I tried the normal recovery paths and checked what I could, but it did not help. At that point it is not just “the model made a mistake.” The issue is that the tool did not have enough resistance against destructive scope drift.

This is what worries me about coding agents. They can turn a simple task into a refactor; then the refactor creates problems, those problems need another layer, and suddenly the original request has become a new architecture nobody asked for. The agent keeps going because continuing looks like progress, but in a real codebase every extra change has a cost. Existing code is not just text. It has intent, history, compatibility constraints, and sometimes scars from problems that already happened months ago.

A good developer knows how to write code, but a good maintainer also knows when not to write code. Sometimes the correct move is to leave a working but ugly section alone. Sometimes it is to make the smallest, most boring patch possible. Sometimes it is to stop and say the change is risky. I don’t think coding agents are strong enough at that yet.

What I would like to see in Codex-style tools is stronger scope discipline as a first-class behavior. Before touching unrelated files or internals, the agent should treat them as a risk boundary. If the prompt asks for a limited change, the agent should prefer limited patches even if it sees a cleaner abstraction. If it is uncertain why code exists, it should not assume the code is bad. If a change starts spreading across the project, it should pause and explain that the scope is expanding instead of silently continuing.

This is not about making the model less capable. It is about making it safer to trust inside long-running projects. Bigger context windows and better reasoning help, but they do not fully solve the problem if the agent still behaves as though every task should be completed by generating more changes.

I still believe these tools can be extremely useful, which is why this is frustrating. When they work, they save hours. When they do not know when to stop, they can destroy trust in one session. For coding agents, “I can do it” is not enough. They need to be able to say “this part is outside the requested scope,” “this is risky,” “I should not touch this,” or simply “stop here.” A coding agent should not only optimize code; it should also understand the cost of changing code that has already survived real production pain.
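To make the “risk boundary” idea above a bit more concrete, here is a minimal sketch of a scope check that could sit between an agent and the working tree. It is not how Codex works; `check_scope`, `ScopeReport`, `allowed_patterns`, and `max_files` are all hypothetical names. It assumes the list of changed paths comes from something like `git diff --name-only`, and that the user declares up front which paths the task is allowed to touch. Any file outside those patterns, or a patch that grows past a file budget, flips the result to “pause and ask” instead of “keep going.”

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class ScopeReport:
    """Result of comparing a proposed patch against the requested scope (hypothetical)."""
    in_scope: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    too_many_files: bool = False

    @property
    def should_pause(self) -> bool:
        # Any out-of-scope file or an oversized patch means: stop and explain,
        # rather than silently continuing.
        return bool(self.out_of_scope) or self.too_many_files


def check_scope(changed_files: list[str], allowed_patterns: list[str],
                max_files: int = 5) -> ScopeReport:
    """Classify each changed path by whether it matches an allowed glob pattern.

    Note: fnmatch's "*" also matches "/", so "src/reports/*" covers
    nested paths under src/reports as well.
    """
    report = ScopeReport()
    for path in changed_files:
        if any(fnmatch(path, pattern) for pattern in allowed_patterns):
            report.in_scope.append(path)
        else:
            report.out_of_scope.append(path)
    # A patch can stay inside allowed paths and still be spreading too far.
    report.too_many_files = len(changed_files) > max_files
    return report


if __name__ == "__main__":
    # Hypothetical task: "only change the report exporter".
    report = check_scope(
        changed_files=["src/reports/export.py", "src/core/storage.py"],
        allowed_patterns=["src/reports/*"],
    )
    if report.should_pause:
        print("Scope is expanding beyond the request:", report.out_of_scope)
```

The point of the design is that the default answer to an unexpected file is “stop and ask,” not “allowed unless it breaks something.” A real tool would also need a way for the user to widen the scope explicitly once the agent has explained why.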
