
Why does Codex repeat the same mistakes?

OpenAI Developer Community June 13, 2026

Good questions. To clarify, this is not an undocumented or lightly guided project. The codebase is more than 900,000 lines, and the documentation layer is very large as well. There are well over 2,000 documentation, spec, governance, evaluation, and design-related files across the project. This includes architecture notes, governance material, capability proposals, evaluation docs, agent guidance, decision logs, work ledgers, schemas, and project maps.

Also, the project itself is an AI backend. So the issue is not just that “a model wrote a bad patch.” The system I am working on deals directly with model-backed behavior, agent output handling, governance boundaries, context behavior, response contracts, evaluation surfaces, and runtime reliability. That is why repeated context/constraint-following failures are especially visible and damaging in this project.

The repeated failure pattern is specific: I repeatedly instructed Codex not to introduce deterministic response templates, keyword/pattern-triggered auto-reply logic, or fixed post-generation reply rails. The system is supposed to remain model-backed and architecture-driven, not turn into an expensive LLM-wrapped auto-reply bot. The problem is that Codex often acknowledges this constraint correctly in conversation, but later reintroduces the same forbidden architectural pattern under different names: fallback handling, guard blocks, fixed response paths, deterministic blocks, or similar mechanisms.

This is not a long-standing problem that appeared simply because the project became large or because the documentation grew over time. The project has been large and heavily documented for a while. What concerns me is that this behavior became much more noticeable recently, roughly over the last 1 to 1.5 weeks. That is why I am trying to distinguish between several possibilities:

1. a recent Codex/model regression,
2. a Codex pipeline or routing change,
3. long-session/context contamination,
4. documentation retrieval/selection failure,
5. some conflict between project-level guidance and the specific constraints I keep repeating.

I fully agree that large-project harnessing matters, and I am open to improving how the agent context is selected. But in this case, the issue does not look like a beginner project with poor documentation. It looks like repeated semantic drift: Codex appears to understand the constraint verbally, then violates the same architectural boundary again during implementation.

Note: Since my English is not very strong, I explain the issue to ChatGPT and it translates my response into English so I can share it with you. That is the reason I am using AI in my replies, and I do not want to hide that.
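
To make the forbidden pattern described above concrete, here is a minimal illustrative sketch in Python. All names (`KEYWORD_REPLIES`, `reply_with_fallback`, `reply_model_backed`, and the `model` callable) are hypothetical and nothing here is taken from the actual codebase; it only shows how a keyword-triggered canned reply can hide behind the label "fallback" while bypassing the model.

```python
import re
from typing import Callable

# Hypothetical example data: a keyword pattern mapped to a fixed reply.
KEYWORD_REPLIES = {
    re.compile(r"\brefund\b", re.IGNORECASE): "Please see our refund policy.",
}


def reply_with_fallback(user_text: str, model: Callable[[str], str]) -> str:
    """Forbidden shape: a 'fallback' that is really a keyword-triggered auto-reply."""
    for pattern, canned in KEYWORD_REPLIES.items():
        if pattern.search(user_text):
            # The model is never consulted on this path.
            return canned
    return model(user_text)


def reply_model_backed(user_text: str, model: Callable[[str], str]) -> str:
    """Intended shape: every reply is produced by the model."""
    return model(user_text)
```

Whatever the block is called (fallback, guard, fixed response path), the telltale sign in this sketch is a return path that produces user-facing text without calling `model`.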
