Batch API with o3-deep-research spawning duplicates
Substantial new evidence after rotating the API key. The bug reproduces cleanly on the new key, with no 429s involved, and is much more serious than my original report suggested. The previous “429 retry loop” framing was wrong. Below is what just happened.
Setup
The old sk-proj-…0xIW2XgA key was rotated this afternoon, and all spend stopped briefly.
On the new key I submitted exactly one batch to verify the fix: batch_6a0334d99ce481908a3c9cc7e9a4399c, slug wyoming, call 1, 1 line, endpoint: /v1/responses, o3-deep-research-2025-06-26.
Fresh quota window. No 429s. No client retries.
What happened
That single batch line produced 3 separate, successful, billed Deep Research completions in under 30 minutes.
All three have status: completed, error: null, incomplete_details: null. All three have metadata: {} because the batch envelope’s metadata does not propagate into spawned response objects.
The smoking gun is the dispatch timing
Dispatch #2 was created at 14:23:11, 4 minutes 2 seconds before Dispatch #1 completed at 14:27:13. So #2 was not dispatched in response to #1 failing or timing out; #1 was still running successfully.
Dispatch #3 was created at 14:29:11, 1 minute 58 seconds after Dispatch #1 had already completed and presumably reported back. So #3 was not dispatched because the batch worker thought #1 was missing.
This is spontaneous duplication of a successful, in-flight or already-completed batch line.
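For anyone who wants to re-check the timing argument, here is a minimal sketch that recomputes the two gaps from the timestamps quoted above. The helper name gap_seconds is my own, and it assumes all three timestamps fall on the same day.

```python
from datetime import datetime

FMT = "%H:%M:%S"

def gap_seconds(earlier: str, later: str) -> int:
    """Signed number of seconds from `earlier` to `later` (same day, HH:MM:SS)."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return int(delta.total_seconds())

# Timestamps as quoted in this report.
dispatch_1_completed = "14:27:13"
dispatch_2_created = "14:23:11"
dispatch_3_created = "14:29:11"

# Positive value: #2 was created while #1 was still running.
overlap = gap_seconds(dispatch_2_created, dispatch_1_completed)
# Positive value: #3 was created after #1 had already completed.
late = gap_seconds(dispatch_1_completed, dispatch_3_created)

print(f"#2 created {overlap}s before #1 completed")
print(f"#3 created {late}s after #1 completed")
```

Both values coming out positive is the whole point: neither extra dispatch lines up with a failure or timeout of #1.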
Billing impact for this one batch
Token cost (50% batch discount applied): $2.75
Tool cost (107 web searches × $0.025, no discount): $2.68
Total billed: ~$5.43
Expected cost with a single dispatch: ~$1.81
~3x overspend on a single batch line
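The arithmetic behind those figures can be reproduced with a short sketch. It assumes the three dispatches cost roughly the same, so the single-dispatch estimate is simply the total divided by three; the variable names are mine.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

token_cost = Decimal("2.75")      # as billed, 50% batch discount already applied
web_searches = 107
search_price = Decimal("0.025")   # per search, no batch discount
dispatches = 3                    # completions produced by the single batch line

# Tool cost, rounded to the cent.
tool_cost = (web_searches * search_price).quantize(CENT, ROUND_HALF_UP)
total = token_cost + tool_cost

# Assumption: each dispatch cost about the same, so one dispatch is total / 3.
per_dispatch = (total / dispatches).quantize(CENT, ROUND_HALF_UP)
overspend = total - per_dispatch

print(f"tool cost: ${tool_cost}")
print(f"total billed: ${total}")
print(f"expected single dispatch: ${per_dispatch}")
print(f"overspend: ${overspend}")
```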
Multiplied across the 27 batches I submitted earlier today (which were also fanning out, then masked by a 429 retry loop on top), this explains today’s full $47.12 spend without any bug in my code.
Cancel behavior is still broken too
Cancel issued at 14:50:42 UTC, req_6d6d39b89ebf4f388d282ba21791081d, returned HTTP 200 status: cancelling.
Four more Arizona batches cancelled in the same minute (req_6e2b9f00c36d4746988bb300533083d2, req_a6de1175c3ed42dca7108a83ddf84c36, req_0cece57e96ac446ab6115df8418941a4, req_4823a5e3065e4261a4d59aa4f3e901be).
All 5 still in cancelling with no cancelled_at more than 20 minutes later. Same stuck-cancel pattern as the morning batches.
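A rough sketch of the check I am applying when I call a batch "stuck". It assumes the batch object carries status, cancelling_at and cancelled_at, with the timestamps in Unix seconds, which is how I read the Batch object documentation. The function name, the 20 minute threshold and the sample values are my own and purely illustrative.

```python
def is_stuck_cancelling(batch: dict, now: int, threshold_s: int = 20 * 60) -> bool:
    """True if the batch has sat in 'cancelling' longer than threshold_s seconds."""
    # Anything that already reached a cancelled state, or is in another status, is not stuck.
    if batch.get("status") != "cancelling" or batch.get("cancelled_at") is not None:
        return False
    started = batch.get("cancelling_at")
    # Without a cancelling_at timestamp the age cannot be measured.
    return started is not None and now - started > threshold_s

# Hypothetical values for illustration only, not taken from the real batches.
sample = {"status": "cancelling", "cancelling_at": 1_000_000, "cancelled_at": None}
print(is_stuck_cancelling(sample, now=1_000_000 + 21 * 60))
```

In practice the dict would come from retrieving each batch by ID and the current time from time.time().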
Other persistent oddities
The Wyoming batch’s request_counts reads {total: 1, completed: 0, failed: 0} despite 3 fully-billed completions executing under it. Counters are not tracking reality.
output_file_id is null. We were billed ~$5.43 for work whose outputs are not delivered through the documented batch retrieval path. To recover the markdown we have to pull each resp_… individually by ID.
All three resp_… objects show background: false and metadata: {}, with no field linking them back to the parent batch. The connection only exists in OpenAI’s internal logs.
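For reference, the manual recovery described above looks roughly like this. It is a sketch that assumes the official openai Python SDK, where client.responses.retrieve(id) returns a response object with an output_text convenience property; the helper name recover_outputs is mine, and the real resp_… IDs are left out.

```python
def recover_outputs(client, response_ids):
    """Fetch each response by ID and return {response_id: output text}."""
    recovered = {}
    for rid in response_ids:
        # One request per response: the batch output file is null, so there is
        # no bulk retrieval path for these completions.
        resp = client.responses.retrieve(rid)
        recovered[rid] = resp.output_text
    return recovered

# Usage sketch (assumes the openai package and OPENAI_API_KEY are set up):
#   from openai import OpenAI
#   texts = recover_outputs(OpenAI(), ["resp_…", "resp_…", "resp_…"])
```

Since nothing on the response links it back to the parent batch, the list of IDs has to be assembled by hand from the dashboard or logs.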
Requests
- Engineering escalation: a single batch line producing multiple billed completions, on a clean key with no 429s, is a serious correctness bug. Please escalate to the Batch API team.
- Force-cancel these 5 batches: batch_6a0334d99ce481908a3c9cc7e9a4399c, batch_6a033b6ac91c81909999c3a124aafc4b, batch_6a033b6ab20c8190aa2f6b5311d1a0ad, batch_6a033b6abfbc8190bb2803d7e629713c, batch_6a033b6a12c8819096db4a2036e5e5cd. They are stuck in cancelling.
- Credit today’s o3-deep-research-2025-06-26 spend down to the legitimate single-dispatch cost of the 28 lines submitted. Current spend is $47.12; legitimate cost is at most ~$50 if every line had succeeded once, but in reality many lines produced no usable output through the batch path, so the appropriate refund covers any execution beyond the first per submitted line.
- Bug to file: the spawned /v1/responses objects should inherit the batch envelope’s metadata. Right now a customer hitting this bug has no way to attribute spawned responses to their batch, which makes triage take hours.