External Publication
Visit Post

Batch API gets stuck `in_progress` how to debugg?

OpenAI Developer Community June 1, 2026
Source

After changing a range of code files, prompts, and QA rules in my local pipeline, this Batch API issue started appearing. I’m having trouble narrowing down whether it is caused by my request structure, Batch queue behavior, prompt size, structured outputs, prompt caching fields, or something else.

I used ChatGPT to help summarize the debugging evidence, but the issue is from my own local API workflow.

Title: Batch API validates and enters in_progress, but multi-request batch stays 0/N; one-request batch works

Hi, I’m trying to troubleshoot an OpenAI Batch API issue without sharing private product text.

Setup:

  • Endpoint: /v1/chat/completions

  • Model: gpt-5.4-mini

  • Structured output: response_format: json_schema, strict: true

  • reasoning_effort: low

  • max_completion_tokens: 1600

  • Request body also includes prompt_cache_key and prompt_cache_retention: "24h"

What works:

  • A direct synchronous request using the exact same request body works in ~7 seconds.

  • A one-request Batch using the exact same JSONL line completes successfully.

What does not work:

  • A 15-request Batch validates and moves to in_progress, but stays at 0/15 completed for over an hour.

  • No error_file_id.

  • No failed requests.

  • No validation error.

  • If I cancel it, the error file only contains batch_cancelled, so it does not reveal a request-level issue.

Batch/request size:

  • 15 requests in JSONL

  • Total JSONL size: ~292 KB

  • Per-request JSONL line size: ~15.8k–21.8k characters

  • Average line size: ~19.4k characters

  • Direct-one measured prompt tokens: 4,328

  • Direct-one output tokens: around 500–750

  • Rough total prompt tokens for 15 requests: ~65k

Account/model limits shown in Platform UI:

  • gpt-5.4-mini: 4,000,000 TPM, 5,000 RPM, 40,000,000 TPD

  • Budget is not near exhausted

  • The UI has a “Batch queue limits” column, but I do not see a clear copied value for this model

Questions:

  1. Can a multi-request Batch stay at in_progress 0/N because of Batch queue/enqueued-token behavior, even when a one-request Batch works?

  2. Are prompt_cache_key and prompt_cache_retention supported/safe inside Batch request bodies for /v1/chat/completions?

  3. Is there any way to get deeper diagnostics for a Batch that validates but never starts completing requests?

  4. Would the recommended next test be:

    • remove prompt cache fields,

    • split into smaller batches,

    • reduce prompt/request size,

    • reduce max_completion_tokens,

    • or switch to /v1/responses?

  5. Are there known cases where strict structured outputs can cause hidden retries/stalls without an error_file_id?

I can share a fully redacted JSONL line structure if needed, but not the private product text.

Discussion in the ATmosphere

Loading comments...