GPT-5.5 Behavior Feedback: Lower Hallucinations, but Reduced Reasoning Depth and Usability
Hello OpenAI team,
I would like to share structured feedback regarding recent model behavior changes, specifically GPT-5.5 compared to earlier versions such as GPT-5.4.
Overall, GPT-5.5 represents a meaningful improvement in safety and a noticeable reduction in hallucinations, which is highly valuable and appreciated. However, in practical usage, I’ve observed a trade-off that affects usability in certain workflows.
In particular:
- Responses are often more concise, but sometimes at the cost of necessary depth and detail for complex tasks
- Instruction-following appears less consistent in multi-step or highly constrained prompts
- In some cases, the model seems to prematurely simplify or avoid deeper reasoning, which reduces effectiveness for analytical, technical, or planning-heavy use cases
My concern is not with the safety improvements themselves, but with maintaining a balance among accuracy, reasoning depth, instruction adherence, and practical usefulness.
From a user perspective, GPT-5.4 felt more reliable in extended reasoning and structured outputs, while GPT-5.5 feels safer and more controlled, but occasionally less thorough in execution.
Ideally, future iterations could preserve the stronger reasoning depth and instruction fidelity seen in earlier versions, while maintaining the improved safety and factual reliability introduced in GPT-5.5.
I believe this balance is critical for developers and power users who rely on the model for structured, multi-step, and precision-sensitive tasks.
Thank you for your work and for considering this feedback.