Responses API: strict json_schema returns malformed JSON when combined with file_search + `include: file_search_call.results`

OpenAI Developer Community May 10, 2026

Summary

When text.format is a json_schema with strict: true, tools includes both file_search (backed by a vector store) and web_search, and include contains file_search_call.results, the Responses API intermittently returns malformed JSON in output_text.text while reporting status: completed and incomplete_details: null. The failure reproduces in both streaming and non-streaming modes.

Failure rate measured against gpt-5.4-mini-2026-03-17:

| Configuration | Failure rate (n=20) |
| --- | --- |
| As below (baseline, the trigger config) | ~20% (3–4 / 20) |
| Same body without include: file_search_call.results | ~10% |
| Same body without tools | 0/20 |
| Same body without tool_choice: required (-> auto) | 0/20 (small N) |

The malformed output is not a truncation: strict mode would fail closed on truncation. It is a structurally invalid sequence: the model emits one valid "key":"value" pair, then a second value preceded only by a : (no comma and no key for the second field). Every failure I observed has the same shape.

Symptom (verbatim from output_text.text)

{"headline_summary":"Apple’s most recent transcript in the files is its Q3 FY2025 earnings call, where management leaned hard on record services revenue, strong iPhone demand, and confidence in China; the stock now sits at $415.12, down 1.3":"Cautiously constructive: the narrative is upbeat, but the price action reads as incremental validation, not a euphoric rerating."}

Token sequence:

  • { "headline_summary" : "<long string>" : "<value>" }

The middle : is the failure — it should be ,"overall_sentiment": per the schema.
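A quick way to confirm this shape locally is to feed the text to Python's standard json module and look at where decoding stops. The sketch below uses an abbreviated stand-in for the payload above (same structure, shortened strings, not the verbatim output), and the helper name first_json_error is hypothetical.

```python
import json


def first_json_error(text):
    """Return (message, offending_char) for the first decode error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the index in `text` where the decoder gave up.
        offending = text[err.pos] if err.pos < len(text) else ""
        return err.msg, offending
    return None


# Abbreviated stand-in with the same structure as the failing output:
# one valid pair, then ':' where ',"overall_sentiment":' should be.
broken = '{"headline_summary":"long string":"Cautiously constructive"}'
fixed = '{"headline_summary":"long string","overall_sentiment":"Cautiously constructive"}'

print(first_json_error(broken))
print(first_json_error(fixed))
```

For the broken string the decoder stops at the second colon, which is the point where a comma and the next key were expected.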

The full response.completed event reports:

  • status: "completed"
  • incomplete_details: null
  • text.format.strict: true
  • The output message.content[0].text carries the broken JSON
  • The annotations array on that OutputText has correct character indices into the (broken) text

Reproduction

Tested against https://api.openai.com/v1/responses with urllib.request (Python 3.11) on macOS. The script reproduces the bug on a fresh, throwaway vector store with a single 1-line markdown file. ~20% failure rate over 20 runs.

# Minimal request body that reproduces (extracted byte-for-byte from
# the call our app makes, then bisected). Replace VS_ID with a real
# vector store containing at least one indexed file.
{
  "model": "gpt-5.4-mini-2026-03-17",
  "stream": true,
  "input": [
    {"role": "developer", "type": "message",
     "content": "\n---\n\nYour entire response must be valid JSON matching this shape exactly. Use the field descriptions to decide what to put in each field.\n\nExample response:\n```json\n{\n  \"headline_summary\": \"\",\n  \"overall_sentiment\": \"\"\n}\n```\n\nField descriptions:\n- `headline_summary` (text)\n- `overall_sentiment` (text)\n"},
    {"role": "user", "type": "message",
     "content": "Use the file_search tool to find the most recent earnings call transcript matching the Ticker below. Pull out the most quotable management claim from the call. Then web-search the current stock price action since that call. Write a 4-sentence pithy take that contrasts narrative vs market reality. Cite the file_search source and one web URL.\n\nTicker: AAPL\n\nName: Apple Inc."}
  ],
  "include": ["file_search_call.results", "reasoning.encrypted_content"],
  "reasoning": {"effort": "none", "summary": "auto"},
  "text": {
    "format": {
      "type": "json_schema",
      "name": "ai_request_output",
      "strict": true,
      "schema": {
        "type": "object",
        "additionalProperties": false,
        "required": ["headline_summary", "overall_sentiment"],
        "properties": {
          "headline_summary":  {"type": "string"},
          "overall_sentiment": {"type": "string"}
        }
      }
    }
  },
  "tool_choice": "required",
  "tools": [
    {"type": "file_search",
     "vector_store_ids": ["VS_ID"]},
    {"type": "web_search", "search_context_size": "medium"}
  ]
}

Run it ~20 times and parse the final output_text.text as JSON. Any json.JSONDecodeError on a status: completed, incomplete_details: null response is the bug.
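The loop described above might look roughly like the following. This is a sketch, not the original script: it forces stream to false to keep parsing simple (the bug is reported in both modes), reads the key from an OPENAI_API_KEY environment variable (an assumption), and the function names are hypothetical. It assumes the response carries its text in output items of type message, in content parts of type output_text.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/responses"


def post_response(body):
    """Send one non-streaming request and return the decoded response object."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + os.environ["OPENAI_API_KEY"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


def output_text(response):
    """Concatenate the text of every output_text part in the message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


def is_bug(response):
    """True when a completed, non-incomplete response carries unparseable JSON."""
    if response.get("status") != "completed":
        return False
    if response.get("incomplete_details") is not None:
        return False
    try:
        json.loads(output_text(response))
    except json.JSONDecodeError:
        return True
    return False


def count_failures(body, runs=20, send=post_response):
    """Run the request `runs` times and count responses that match the bug."""
    body = dict(body, stream=False)  # non-streaming keeps parsing simple
    return sum(1 for _ in range(runs) if is_bug(send(body)))
```

Calling count_failures(body) with the request body above loaded as a dict (comment lines removed, VS_ID replaced) returns the number of failing runs out of 20. The send parameter exists so the counting logic can be exercised with canned responses without touching the network.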

Bisect (n=5–20 per row)

Starting from the body above, single-variable changes:

| Change | Failure rate |
| --- | --- |
| baseline | 3/15 |
| remove include: file_search_call.results | 0/15 ✓ (n=15); 2/20 (n=20 retest) |
| remove include: reasoning.encrypted_content | 1/5 (small N) |
| remove include entirely | 0/5 |
| reasoning.summary: auto -> detailed | 4/15 (no help) |
| remove reasoning entirely | 1/5 |
| remove store: null | 0/5 |
| tool_choice: required -> auto | 0/5 |
| remove web_search | 0/5 |
| remove file_search | 0/5 |
| stream: true -> false | did not test under same prompt; failure observed in both modes in separate runs |

The strongest single trigger is include: file_search_call.results, but removing it still leaves a residual ~10% rate (2/20 on retest), so the bug appears to depend on the combination of settings rather than on any single field.
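Because the sample sizes are small, it may help to put rough uncertainty bounds on these rates. The sketch below computes a Wilson score interval (a standard approximation for binomial proportions; using it here is my choice, not part of the original measurements) for a few of the counts reported above.

```python
import math


def wilson_interval(failures, runs, z=1.96):
    """Wilson score interval (z=1.96 is roughly 95%) for a failure proportion."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = failures / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


for label, failures, runs in [
    ("baseline", 3, 15),
    ("no file_search_call.results (retest)", 2, 20),
    ("typical 0/5 row", 0, 5),
]:
    low, high = wilson_interval(failures, runs)
    print(f"{label}: {failures}/{runs} -> {low:.0%} to {high:.0%}")
```

Under this approximation the baseline and retest intervals overlap widely, and a 0/5 row is still compatible with a true failure rate above 40%, so the n=5 rows are better read as hints than as evidence that a change removes the bug.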

Expected behaviour

Under strict: true, the model’s output grammar should be enforced such that any returned output_text.text for a status: completed / incomplete_details: null response parses as valid JSON conforming to the schema. Any failure to satisfy the schema should manifest as incomplete_details.reason (e.g. max_output_tokens, content_filter), not as malformed JSON in a “completed” response.
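For the two-field schema in the reproduction, that expectation can be checked client-side with a few lines of standard-library Python. This is a hand-rolled check for this specific schema, not a general JSON Schema validator, and the name conforms is hypothetical.

```python
import json

REQUIRED_FIELDS = ("headline_summary", "overall_sentiment")


def conforms(text):
    """True if text is a JSON object with exactly the schema's two string fields."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(value, dict):
        return False
    # Covers both `required` and `additionalProperties: false`.
    if set(value) != set(REQUIRED_FIELDS):
        return False
    return all(isinstance(value[name], str) for name in REQUIRED_FIELDS)
```

Any status: completed response whose text makes conforms return False would contradict the behaviour described above.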

Workaround

In our app we now (1) omit include: file_search_call.results whenever text.format is a strict json_schema, and (2) retry the call up to 3 times on a local JSON parse failure when strict structured output was requested. Combined, these drop residual failures below 0.1%; the trade-off is that we lose per-file match details whenever structured outputs are also configured.
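A minimal sketch of that workaround, under a few assumptions: send is a hypothetical callable that posts the body and returns the final output text, "up to 3 times" is read as three retries after the first attempt, and the caller only uses the wrapper for strict structured-output requests. It is not the code from our app.

```python
import copy
import json


def strip_results_include(body):
    """Copy body; drop file_search_call.results when strict json_schema is requested."""
    body = copy.deepcopy(body)
    fmt = body.get("text", {}).get("format", {})
    if fmt.get("type") == "json_schema" and fmt.get("strict") is True:
        kept = [i for i in body.get("include", []) if i != "file_search_call.results"]
        if kept:
            body["include"] = kept
        else:
            body.pop("include", None)
    return body


def call_with_retry(send, body, max_retries=3):
    """Call send(body) and parse its text; retry when the text is not valid JSON."""
    body = strip_results_include(body)
    last_error = None
    for _ in range(1 + max_retries):
        text = send(body)
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            last_error = err  # malformed output: try again
    raise last_error
```

If failures were independent at the ~10% residual rate, four attempts would all fail with probability 0.1^4 = 0.01%, which is in line with the sub-0.1% figure; independence between attempts is an assumption, not something measured here.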

Environment

  • API: Responses API (POST /v1/responses)
  • Model: gpt-5.4-mini-2026-03-17 (also seen on the unversioned gpt-5.4-mini)
  • Streaming: bug reproduces both with and without stream: true
  • Date observed: 2026-05-09
