New feature: moderation scores in Chat API responses, by parameter

OpenAI Developer Community June 21, 2026
developers.openai.com

Moderate generated content - Moderation | OpenAI API

Use OpenAI moderation models to detect harmful content in text and images. You can classify standalone inputs with the moderation endpoint or request moderation scores alongside a generated response. Use the results to enforce your application’s...

  1. Send:
`"moderation":{"model": "omni-moderation-latest"}`
  2. Receive the moderation endpoint’s classification object which, according to the documentation, is at:
`response.moderation.input`
`response.moderation.output`
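A minimal sketch of step 1, assuming the parameter is simply merged into the JSON request body. Only the `"moderation"` entry comes from the documentation quoted above; the `with_moderation` helper and the `"model"` / `"input"` values in `base` are hypothetical placeholders for whatever request you already send.

```python
import json


def with_moderation(payload: dict, moderation_model: str = "omni-moderation-latest") -> dict:
    """Return a copy of a request body with the moderation parameter added."""
    body = dict(payload)  # shallow copy so the caller's dict is left untouched
    body["moderation"] = {"model": moderation_model}
    return body


# Hypothetical base request; substitute your real model name and input.
base = {"model": "example-model", "input": "Hello"}
print(json.dumps(with_moderation(base), indent=2))
```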

However, that documentation is wrong for the REST API itself.

output → moderation → output is where you’ll find the object

Or under “moderation” in the event with "type": "response.completed".

There you can see that your input was flagged, and that the model's output was flagged because it did not refuse.
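Since the documented location and the observed one disagree, one hedged approach is to not hard-code either path and instead search the parsed JSON for any `"moderation"` key. The sketch below does that on an already-parsed response. The `find_key` helper is hypothetical, and the `sample` dict is an invented shape for illustration only, not the real response schema.

```python
from typing import Any, Iterator, Tuple


def find_key(node: Any, key: str = "moderation", path: tuple = ()) -> Iterator[Tuple[tuple, Any]]:
    """Yield (path, value) for every occurrence of `key` in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,), v
            # Keep descending, so nested occurrences are reported too.
            yield from find_key(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key(item, key, path + (i,))


# Invented shape for demonstration; inspect your own raw response instead.
sample = {"id": "resp_example", "output": [{"type": "message", "moderation": {"output": {"flagged": True}}}]}

for found_path, value in find_key(sample):
    print(found_path, value)
```

Run it against your own raw response body or the `response.completed` event payload to see where the object actually lands.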


Concern

This only includes a score in the response.

There is no option to prevent flagged input from being run.
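If you want flagged input to never reach generation, you still have to gate it yourself, for example by classifying the text with the standalone moderation endpoint first. A minimal sketch of that gate follows. `run_if_clean` and the two stand-in callables are hypothetical so the example runs offline; in practice `is_flagged` would wrap a real moderation call and `generate` would wrap your generation request.

```python
from typing import Callable, Optional


def run_if_clean(user_text: str,
                 is_flagged: Callable[[str], bool],
                 generate: Callable[[str], str]) -> Optional[str]:
    """Call `generate` only when `is_flagged` returns False; otherwise return None."""
    if is_flagged(user_text):
        return None  # the input never reaches the generation call
    return generate(user_text)


# Stand-in callables for an offline demonstration (hypothetical).
def fake_is_flagged(text: str) -> bool:
    return "forbidden" in text.lower()


def fake_generate(text: str) -> str:
    return f"echo: {text}"


print(run_if_clean("hello", fake_is_flagged, fake_generate))
print(run_if_clean("Forbidden thing", fake_is_flagged, fake_generate))
```

This costs an extra round trip per request, and it only covers what the moderation model classifies.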

It only protects you from OpenAI generations. It doesn’t protect you from OpenAI.

Your API organization is still in jeopardy if you use unclassified, unfiltered user content, and it is in jeopardy anyway from many things that moderation does not classify and that stay undocumented until you are banned and your credits are taken: distillation, cyber research, biological, etc.

Perhaps a new API method, to instill false confidence and encourage you to get banned?
