
Why do gpt-5.1 and gpt-5.4-mini behave so differently in production chatbot use cases?

OpenAI Developer Community May 16, 2026

Yeah, I do something similar in the experimental phase for new features in production. I start with a large model and tune the code and prompts until I'm satisfied. Later I step the model down via settings and check whether I can retain acceptable behaviour. If I find unacceptable cases, I step back up. Of course, you could do this in some kind of staging environment instead if your risk tolerance is lower.
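
For anyone who wants that step-down loop spelled out, here is a rough sketch rather than anything definitive. The tier names and the `is_acceptable` check are hypothetical placeholders: in practice the check would run your own prompts and test cases against each model, and the list would hold whatever model names you switch between in settings, ordered from largest to smallest.

```python
from typing import Callable, Sequence


def pick_smallest_acceptable(
    models: Sequence[str],
    is_acceptable: Callable[[str], bool],
) -> str:
    """Walk from the largest model down and stop at the first one that fails.

    `models` is ordered largest to smallest. The first entry is the baseline
    that the code and prompts were tuned against, so it is assumed acceptable.
    """
    if not models:
        raise ValueError("need at least one model")
    chosen = models[0]
    for candidate in models[1:]:
        if not is_acceptable(candidate):
            # Unacceptable case found: stay on the previous (larger) model.
            break
        chosen = candidate
    return chosen


# Hypothetical tiers and a stand-in check; replace both with your own.
tiers = ["large", "medium", "small"]
result = pick_smallest_acceptable(tiers, lambda name: name != "small")
print(result)
```

The loop stops at the first failure instead of trying even smaller models, which matches "step back up" once an unacceptable case turns up.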
