Is this a world record? GPT-5.5 ran for 19 hours
For autonomous work and getting a lot done with fewer tokens, I think GPT-5.5 is the better choice.
Other than that, even GPT-5.2 can still produce very good results for project development. When I started my project, GPT-5.3 was available, so I used that. When GPT-5.4 came out, I switched to it, and now I’m continuing with GPT-5.5.
In my opinion, the main difference is how much effort you need to put into managing the project, and how much effort the model can carry by itself.
Older models can still produce high-quality work, but you need to guide them more tightly. I don’t think the improvement is only about a huge jump in code quality. It feels more like the supervision cost has gone down.
This is also a psychological thing for us as users. When the iPhone 17 comes out, the iPhone 16 suddenly starts to feel old, even though it is still a very capable device.
Of course, I don’t know how OpenAI trains or improves these models internally, so I don’t want to present guesses as facts. But from the outside, it feels like the quality is increasingly coming from the whole system around the model: post-training, context handling, reasoning behavior, tool use, orchestration, and so on.
For example, in GPT-5.5, it feels like there is stronger context tracking or some kind of better memory/compression behavior. The model seems much better at remembering small project details and continuing with the same architectural logic over long sessions.
So I don’t think this is only about “how many parameters does the model have?” anymore. The pipeline around the LLM seems to matter a lot.
And even with older models, you can get closer to this quality by building strong .md documentation inside your project. Architecture notes, rules, file maps, decisions, and acceptance criteria can make the model behave much smarter because they give it a clearer operating frame.
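To make that concrete, here is a rough sketch of how I think about it: keep the notes as separate .md files and stitch them into one block that goes in front of every session. The file names and the `build_context` helper are my own assumptions for illustration, not something any tool requires, and how you actually hand the text to the model depends on your setup.

```python
from pathlib import Path

# Hypothetical file names; use whatever your project already has.
DOC_ORDER = [
    "ARCHITECTURE.md",
    "RULES.md",
    "FILE_MAP.md",
    "DECISIONS.md",
    "ACCEPTANCE.md",
]


def build_context(docs_dir, order=DOC_ORDER):
    """Join the project notes that exist into one text block, in a fixed order."""
    root = Path(docs_dir)
    sections = []
    for name in order:
        path = root / name
        if not path.is_file():
            continue  # missing notes are skipped rather than treated as errors
        body = path.read_text(encoding="utf-8").strip()
        if body:  # empty files add nothing, so leave them out
            sections.append(f"## {name}\n\n{body}")
    return "\n\n".join(sections)
```

Calling `build_context("docs")` returns a single string you can paste or load at the start of a session. The fixed order is a deliberate choice: the model sees architecture and rules before the details, every time.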