Batch API with o3-deep-research spawning duplicates
Looking for help understanding what we’re seeing on the Batch API with o3-deep-research-2025-06-26.
What we submitted
- One day of legitimate state-research work across 7 different state slugs
- Each state: 4 separate POST /v1/batches, each batch containing exactly 1 line targeting /v1/responses
- Total legitimate request lines submitted across the whole day: 27 (7 × 4 = 28, minus 1 because one slug had only 3 batches)
- Our server logs the route invocation each time and confirms 27 distinct submissions, no client-side retries
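For reference, here is a minimal sketch of how one such batch input line could be built. The envelope fields (custom_id, method, url, body) follow the documented batch input format. The helper name, the custom_id scheme and the metadata tags are hypothetical and are not what we actually submitted (our bodies had metadata: {}). The tool config is left out because it is not listed in this post. Whether metadata set in the body shows up on the resulting resp_… object is an assumption we have not verified.

```python
import json


def build_batch_line(custom_id: str, prompt: str, slug: str) -> str:
    """Return one JSONL line for a batch input file targeting /v1/responses."""
    request = {
        "custom_id": custom_id,  # hypothetical naming scheme, e.g. "wyoming-1"
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "o3-deep-research-2025-06-26",
            "input": prompt,
            "max_output_tokens": 100000,
            "max_tool_calls": 30,
            # Tool config omitted on purpose: it is not shown in the post.
            # Assumption: tagging the body might make the response attributable.
            "metadata": {"slug": slug, "custom_id": custom_id},
        },
    }
    # json.dumps escapes newlines inside strings, so the result is a single line.
    return json.dumps(request)


line = build_batch_line("wyoming-1", "Research question goes here", "wyoming")
```

Each batch file in our setup would contain exactly one such line.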
What we observed in the OpenAI dashboard
For the most recently affected slug alone, the Logs → Responses view shows 629 entries today, all with the JSONL body our 4 batches for that slug submitted (same prompt text, same max_output_tokens: 100000, same max_tool_calls: 30, same tool config, background: false, metadata: {}). Roughly 80 of them are full completions with returned content (the rest are from 429s after we hit the per-model TPD cap).
So 4 batch lines became ~80 billed Deep Research completions, plus several hundred additional /v1/responses log entries that didn’t return content. The same pattern played out earlier in the day across the other 6 state slugs.
Today’s spend on the model: $47.12, consistent with dozens of completed Deep Research runs, not the 27 we submitted.
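Since the logged responses carry no batch reference, the only way we found to match them to our submissions is by request body. Below is a rough sketch of that matching, assuming the logged request bodies can be exported as dicts (the export step is not shown, and the function names are made up for illustration). It hashes a canonical JSON form of each body and reports how many log entries exist per submitted line.

```python
import hashlib
import json
from collections import Counter


def fingerprint(body: dict) -> str:
    """Stable short hash of a request body, independent of key order."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def amplification(submitted_bodies, logged_bodies) -> dict:
    """Map each submitted fingerprint to logged entries per submitted line."""
    submitted = Counter(fingerprint(b) for b in submitted_bodies)
    logged = Counter(fingerprint(b) for b in logged_bodies)
    # Counter returns 0 for fingerprints that never appear in the logs.
    return {fp: logged[fp] / n for fp, n in submitted.items()}


# Tiny illustration with made-up bodies: one submitted line, three log entries.
body = {"input": "example prompt", "max_tool_calls": 30}
same_body_reordered = {"max_tool_calls": 30, "input": "example prompt"}
ratios = amplification([body], [body, same_body_reordered, body])
```

With the numbers above, 629 log entries for 4 submitted lines is about 157 entries per line, and ~80 completions is about 20 billed completions per line.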
What we have ruled out from our side
- We have exactly two /v1/responses submission paths in our code: a direct background: true call and the batch JSONL above. Every one of the mystery resp_… objects has background: false, so the direct path is not the source.
- GET /v1/batches?limit=100 returns 27 batches for the day total. No extra batches on the affected slug. No older Wyoming batches. So the fan-out is not “duplicate submissions we forgot about”.
- Server-side logs show exactly one route invocation per slug for the day. No client-side retry loop.
- Webhooks and crons in the codebase are read-only against OpenAI.
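The batch check above boils down to a set comparison. Here is a small sketch, assuming our own log keeps the batch id returned by each POST /v1/batches and that the ids from GET /v1/batches have been collected into a list (all pages, if the listing is paginated). The function name and the example ids are hypothetical.

```python
def reconcile(submitted_ids, listed_ids) -> dict:
    """Compare batch ids we logged locally against ids the API lists."""
    submitted, listed = set(submitted_ids), set(listed_ids)
    return {
        # Batches the API knows about that we have no record of creating.
        "unexpected": sorted(listed - submitted),
        # Batches we created that the API listing does not show.
        "missing": sorted(submitted - listed),
    }


# Made-up ids purely for illustration.
result = reconcile(
    ["batch_a", "batch_b", "batch_c"],
    ["batch_a", "batch_b", "batch_c"],
)
```

In our case both lists came back empty, which is why we don’t think forgotten duplicate submissions explain the fan-out.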
Other oddities
- Each batch’s request_counts reads {total: 1, completed: 0, failed: 0}, despite the dashboard showing many /v1/responses executions associated with that batch’s JSONL body. So the counters don’t reflect what actually ran.
- output_file_id and error_file_id are both null on all the batches in question, so the work isn’t surfacing through normal batch output channels even though we’re being billed for it.
- 4 batches for one slug have been stuck in cancelling for 11+ hours after POST /v1/batches/{id}/cancel returned HTTP 200. They never transitioned to cancelled. The other 23 batches show status completed despite having no output file.
- The spawned resp_… objects carry no link back to a parent batch (metadata: {} on every one of them), so there’s no way from a response in the logs to figure out which batch produced it.
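For anyone who wants to check their own batches for the same symptoms, here is a sketch of the checks we ran by hand. It works on a batch object as a plain dict and uses the field names from the batch object (status, request_counts, output_file_id, error_file_id, cancelling_at). The function name and the one-hour threshold are arbitrary choices for illustration, not anything official.

```python
import time
from typing import List, Optional


def batch_anomalies(batch: dict, now: Optional[float] = None,
                    stuck_after_s: int = 3600) -> List[str]:
    """Return human-readable flags for a batch that looks inconsistent."""
    now = time.time() if now is None else now
    flags = []
    status = batch.get("status")
    counts = batch.get("request_counts") or {}
    finished = counts.get("completed", 0) + counts.get("failed", 0)

    if status == "completed" and finished != counts.get("total", 0):
        flags.append("completed but request_counts do not add up")
    if (status == "completed" and not batch.get("output_file_id")
            and not batch.get("error_file_id")):
        flags.append("completed with no output or error file")

    cancelling_at = batch.get("cancelling_at")  # Unix seconds
    if (status == "cancelling" and cancelling_at is not None
            and now - cancelling_at > stuck_after_s):
        flags.append("stuck in cancelling")
    return flags


# Shaped like one of our "completed" batches (values taken from the post).
example = {
    "status": "completed",
    "request_counts": {"total": 1, "completed": 0, "failed": 0},
    "output_file_id": None,
    "error_file_id": None,
}
example_flags = batch_anomalies(example)
```

Every one of our 23 completed batches trips the first two checks, and the 4 cancelling ones trip the third.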
Questions for the community / OpenAI staff
- Under what circumstances does a single batch line with total: 1 produce many /v1/responses executions? Is there an internal retry / fan-out policy on the batch worker?
- Why do batch request_counts not reflect the actual number of /v1/responses executions performed for that batch?
- Is it expected that a completed batch with no output_file_id still ran billable executions whose outputs were not delivered through the batch API?
- Should per-line /v1/responses objects inherit the batch envelope’s metadata so they’re attributable? Right now the response object stands alone with no parent reference, which makes incidents like this very hard to diagnose.