Structured output generating meaningless data
Hey Rainer,
I think what’s happening is that Structured Outputs is making the model satisfy the schema shape, but it doesn’t by itself prove the values were actually present in the input.
If email is a required string, the model still has to put some string there. With meaningless input, that can lead to invented or default-looking values. I’d model the “nothing useful to extract” case directly, e.g. add a required status field with the values extracted | insufficient_info, and make the extracted fields nullable:
`"email": { "type": ["string", "null"], "format": "email" }`
Then tell the model to return status: "insufficient_info" and null fields when the input isn’t an invoice or doesn’t contain the field.
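To make that concrete, here is a rough sketch of what the schema and the handling on your side could look like. It's only an illustration: `INVOICE_SCHEMA` and `usable_fields` are names I made up, the schema has just the two fields discussed here, and the `required` / `additionalProperties` settings are my assumption about what strict mode expects, so check them against the current docs.

```python
import json

# Hypothetical schema for illustration; only "status" and "email" come from
# the discussion above, and a real invoice schema would have more fields.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        # Explicit outcome flag so "nothing to extract" is a valid answer.
        "status": {"type": "string", "enum": ["extracted", "insufficient_info"]},
        # Nullable, so the model is not forced to invent a string.
        "email": {"type": ["string", "null"], "format": "email"},
    },
    # Assumption: strict mode wants every key listed as required and
    # additionalProperties set to false.
    "required": ["status", "email"],
    "additionalProperties": False,
}


def usable_fields(raw_json: str) -> dict:
    """Return only the fields the model reports as extracted and non-null."""
    data = json.loads(raw_json)
    if data.get("status") != "extracted":
        return {}
    return {k: v for k, v in data.items() if k != "status" and v is not None}
```

With this, a response of `{"status": "insufficient_info", "email": null}` yields an empty dict, so downstream code never sees a made-up email.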
I’d still validate emails client-side before using them. The docs and the help article below are useful context here: Structured Outputs makes the output match the schema, but it doesn’t guarantee that the values are grounded in the source.
https://help.openai.com/en/articles/8555517-function-calling-in-the-openai-api
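For the client-side check, something along these lines might be enough as a starting point. It's a sketch under my own assumptions: `email_is_usable` is a hypothetical helper, the regex is deliberately loose rather than a full address validator, and the "must appear in the input" test is a crude grounding check that will reject emails the model legitimately normalized.

```python
import re

# Deliberately loose pattern: something@something.tld with no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def email_is_usable(email, source_text):
    """Accept an email only if it looks valid and occurs in the input text."""
    if email is None:
        return False
    if not EMAIL_RE.fullmatch(email):
        return False
    # Crude grounding check: case-insensitive substring match on the input.
    return email.lower() in source_text.lower()
```

The substring test is the part that addresses the original problem: an email that passes the format check but never appears in the input gets rejected instead of being stored.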