Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihugatkugtncy2ovtzcpyaoctf27oqavqxrtoviufgwgawrk7zfle",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mlxep5rhxcs2"
  },
  "path": "/t/why-do-gpt-5-1-and-gpt-5-4-mini-behave-so-differently-in-production-chatbot-use-cases/1380891#post_8",
  "publishedAt": "2026-05-16T07:21:15.000Z",
  "site": "https://community.openai.com",
  "textContent": "Yeah, in experimental phases for new features on Production I do similar. Start with large model, fine tune the code and prompts until I’m satisfied, then later step down the model via settings and see if I can retain acceptable behaviour until I find unacceptable cases, if any, then step back up.\n\nYou could do this in some kind of staging environment too if your risk tolerance is less, of course.",
  "title": "Why do gpt-5.1 and gpt-5.4-mini behave so differently in production chatbot use cases?"
}
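
The step-down approach described in the post can be sketched roughly as follows. This is a minimal illustration, not the poster's actual setup: the function name `step_down`, the tier labels, and the `is_acceptable` check are all hypothetical, and a real check would run your own test prompts against each model.

```python
from typing import Callable, Sequence


def step_down(models: Sequence[str], is_acceptable: Callable[[str], bool]) -> str:
    """Return the smallest model that still passes, given models ordered largest to smallest.

    Walks down the list and stops at the first model that fails,
    which corresponds to "stepping back up" to the previous one.
    """
    if not models:
        raise ValueError("models must not be empty")
    # Assumption: code and prompts were already tuned against the largest model.
    chosen = models[0]
    for candidate in models[1:]:
        if not is_acceptable(candidate):
            break  # unacceptable cases found: keep the previous, larger model
        chosen = candidate
    return chosen


# Hypothetical tiers and a stand-in check for illustration only.
tiers = ["large", "medium", "small"]
passing = {"large", "medium"}
print(step_down(tiers, lambda m: m in passing))  # prints "medium"
```

Running the same loop against a staging environment instead of production only changes where `is_acceptable` sends its requests.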