External Publication

Regression: Double responses in apps

OpenAI Developer Community May 21, 2026

Adding corroborating evidence — we see the same thing on gpt-5.5 (thinking) at the raw API level (Responses API), and it’s consistent enough that we could A/B it.

It’s the model generating the second copy, not a client/render bug. On our first (“opening”) turn, gpt-5.5 returns the assistant’s reply concatenated with a near-verbatim second copy in a single output. Token billing confirms it — clean openings bill ~77–93 output tokens, doubled ones ~130–258.

It’s first-turn specific. Almost always the initiating turn (one short user trigger, no prior history); mid-conversation turns rarely double. ~12% of openings in production, up to ~87% for certain prompts.

It’s gpt-5.5-specific. Identical prompt + code path, 30 runs each:

Model	Baseline doubled	With mitigation
gpt-5.5 (thinking)	20/30 exact + 6 partial (~87%)	0 exact, 3 partial
gpt-4.1 (non-thinking)	0/30	0/30
claude-sonnet-4-6 (other vendor)	0/30	0/30

Prompt-side mitigations — gpt-5.5, 15 runs each. A system-prompt “don’t repeat” rule does nothing (and was actually worse); a trigger-message “say it once” instruction is the effective lever, but leaves residual partial repeats a dedupe guard can’t safely catch:

Variant	Exact-doubled	Partial repeat	Clean
baseline	8/15 (53%)	2	5
trigger: “write opening once, then stop”	1/15	5	9
system rule: “output reply exactly once”	10/15	1	4
system rule + trigger	0/15	3	12

Reasoning effort — gpt-5.5, 30 runs each. Lowering effort tracks the doubling rate but never cures it, and minimal returns no output at all (AI_NoOutputGeneratedError) through the Responses path:

Effort	Doubled	Partial	Avg reasoning tokens
default	18/30 (60%)	5	39
medium	20/30 (67%)	5	—
low	9/30 (30%)	5	3
minimal	all 30 errored (no output)	—	—

Happy to share a sanitized repro payload. +1 on prioritizing this — the partial-repeat variant in particular can’t be safely caught by a dedupe guard.

Discussion in the ATmosphere