Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihtg3ny2opsce3uzwz7a66y7fbb63qldobkybkjds5lilagwejhnu",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mout4zgbgtc2"
  },
  "path": "/t/new-feature-moderation-scores-in-chat-api-responses-by-parameter/1384260#post_6",
  "publishedAt": "2026-06-22T10:29:21.000Z",
  "site": "https://community.openai.com",
  "textContent": "Personally, I think it’s a great feature, though I also think it will not fit every workflow we might imagine.\n\nCorrect me if I’m wrong, but what may lead to a ban is what the model generates from your prompt, not what you put in. I have had a comment moderator plugin running for several years, and it receives every kind of crap as input, but it is built so that the only output it generates is a digit, which cannot be harmful on its own. So I have never had issues on that side (several million runs already).\n\nIf, on the other hand, the model generates a shape that might contain text (potentially harmful, meaning anything that is not one of your predefined constants), then technically you might be exposed to a ban, because you cannot guarantee the output. If on top of that you do nothing to moderate the input, the chances of a ban increase drastically.\n\nSo moderating the input is almost always a must-have in any application where users can submit arbitrary content. I would also recommend that you log each moderation run against your own request ID and the user who submitted the content (no need to store the message content itself unless you have a legal reason), and that you use that same ID across your whole pipeline to trace both the original moderation result and the operations you performed on that content after moderation classified the text as safe to use.\n\nWhy this log? If you get banned, at least you have some proof that you did everything right and that the model generated something bad from “safe” content submitted by the user (you still need to provide your instructions to clarify your part in it). That doesn’t mean you will get unbanned, but at least if things get serious you have your backup.\n\nNow, we often use chained operations where the output of the model is the input for the next operation, so a moderation score provided in the same response saves you a separate API call to the moderation endpoint.\n\nAnd that’s a great reason to have them.",
  "title": "New feature: moderation scores in Chat API responses, by parameter"
}