{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreificqspfssi5wv2c3cmvoz4s4flgrkcvnxvcbonp7i5aovhjtrbhy",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mggah3jkpfn2"
},
"path": "/t/hosted-shell-continuations-require-missing-shell-call-output/1375917#post_1",
"publishedAt": "2026-03-06T20:30:56.000Z",
"site": "https://community.openai.com",
"textContent": "I found a reproducible issue with the Responses API when using the hosted `shell` tool together with `previous_response_id`.\n\nA first response can complete successfully with a hosted `shell` call, but a direct continuation using only `previous_response_id` fails with:\n\n\n Error code: 400 - {'error': {'message': 'No tool output found for shell call call_...', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}\n\n\nThe surprising part is that the first response itself is already marked `completed`, and the assistant message includes the shell result, but the response payload often does not include a `shell_call_output` item.\n\n## Why This Looks Like a Bug\n\nThe API appears to require a `shell_call_output` item for continuation, while also sometimes not returning that item in the first response payload.\n\nThis creates an inconsistent contract:\n\n 1. The hosted `shell` tool executes server-side.\n 2. The first response is `completed`.\n 3. The assistant can describe the shell output in natural language.\n 4. But continuing from that response via `previous_response_id` can fail because the server says the shell output is missing.\n\n\n\n## Environment\n\n * Model: `gpt-5.2`\n * Responses API\n * Hosted tool: `shell`\n * Tested with Python SDK versions:\n * `openai 2.24.0`\n * `openai 2.26.0`\n * Result was the same on both versions for the main repro.\n\n\n\n## Main Reproduction\n\n### Request 1\n\nCreate a response with hosted `shell`:\n\n\n from openai import AsyncOpenAI\n\n client = AsyncOpenAI()\n\n resp1 = await client.responses.create(\n model=\"gpt-5.2\",\n input=\"Use the shell tool once to run: printf first_turn. 
Then briefly report the output.\",\n tools=[{\"type\": \"shell\", \"environment\": {\"type\": \"container_auto\"}}],\n reasoning={\"effort\": \"medium\", \"summary\": \"detailed\"},\n include=[\"reasoning.encrypted_content\"],\n background=True,\n )\n\n\nPoll until terminal with `client.responses.retrieve(resp1.id)`.\n\n### Observed Response 1 Shape\n\nIn multiple runs, the first completed response looked like this structurally:\n\n\n {\n \"status\": \"completed\",\n \"output\": [\n {\"type\": \"reasoning\"},\n {\"type\": \"shell_call\", \"status\": \"completed\", \"call_id\": \"call_...\"},\n {\"type\": \"reasoning\"},\n {\"type\": \"message\"}\n ]\n }\n\n\nNotably absent:\n\n * no `shell_call_output`\n\n\n\nEven though the assistant message already described the shell result.\n\n### Request 2\n\nNow continue directly from the first response:\n\n\n resp2 = await client.responses.create(\n model=\"gpt-5.2\",\n previous_response_id=resp1.id,\n input=\"Now answer with exactly: second turn worked\",\n tools=[{\"type\": \"shell\", \"environment\": {\"type\": \"container_auto\"}}],\n reasoning={\"effort\": \"medium\", \"summary\": \"detailed\"},\n include=[\"reasoning.encrypted_content\"],\n background=True,\n )\n\n\n### Actual Result\n\nThis fails immediately with:\n\n\n Error code: 400 - {'error': {'message': 'No tool output found for shell call call_...', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}\n\n\n### Expected Result\n\nIf the hosted shell call completed server-side and the first response is already terminal, then either:\n\n 1. `previous_response_id` continuation should work without any extra client-side tool-output replay, or\n 2. 
the first response should always include the required `shell_call_output` item so the client can replay it deterministically.\n\n\n\n## Verified Workaround\n\nThe continuation works if I manually inject a `shell_call_output` input item in the next request:\n\n\n resp2 = await client.responses.create(\n model=\"gpt-5.2\",\n previous_response_id=resp1.id,\n input=[\n {\n \"type\": \"shell_call_output\",\n \"call_id\": \"call_from_resp1\",\n \"status\": \"completed\",\n \"output\": [\n {\n \"stdout\": \"first_turn\",\n \"stderr\": \"\",\n \"outcome\": {\"type\": \"exit\", \"exit_code\": 0},\n }\n ],\n },\n {\n \"role\": \"user\",\n \"content\": \"Now answer with exactly: second turn worked.\",\n },\n ],\n reasoning={\"effort\": \"medium\", \"summary\": \"detailed\"},\n background=True,\n )\n\n\nThis succeeds.\n\n## Additional Observation: Inconsistent `shell_call_output` Presence\n\nAfter introducing manual `shell_call_output` replay in a chain, later hosted `shell` responses sometimes started including `shell_call_output` items automatically in their returned `output`.\n\nSo there seem to be two inconsistent behaviors:\n\n 1. Some completed hosted-shell responses return only `shell_call` + `message`.\n 2. 
Other completed hosted-shell responses return both `shell_call` and `shell_call_output`.\n\n\n\nThat inconsistency makes it difficult to know whether the client is expected to replay tool output or whether the server should already be carrying it forward.\n\n## Includes Tested\n\nI also tested all documented `include` values that are compatible with reasoning models:\n\n\n [\n \"file_search_call.results\",\n \"web_search_call.results\",\n \"web_search_call.action.sources\",\n \"message.input_image.image_url\",\n \"computer_call_output.output.image_url\",\n \"code_interpreter_call.outputs\",\n \"reasoning.encrypted_content\",\n ]\n\n\nThis did not fix the issue.\n\n## Related But Separate Issue\n\nI am also investigating a separate 400 error in a larger workflow that mentions a missing reasoning item.\n\nAt the moment, I have **not** minimized that second issue to a standalone hosted-shell repro. In my local tests, once I manually replay `shell_call_output`, multi-turn hosted-shell chains can continue successfully and retain memory of earlier shell outputs.\n\nSo this report is specifically about the reproducible hosted `shell` continuation problem where:\n\n * the first response completes,\n * but continuation via `previous_response_id` fails unless the client manually reconstructs and submits `shell_call_output`.\n\n\n\n## Minimal Expected Contract\n\nFor hosted `shell` plus `previous_response_id`, one of these should be true consistently:\n\n 1. hosted shell execution state is fully preserved server-side, so direct continuation works, or\n 2. 
the API always returns the exact `shell_call_output` item needed for replay in the next request.\n\n\n\nRight now, neither holds consistently.\n\n## Local Artifacts Collected\n\nI collected raw response payloads during testing, including:\n\n * initial first-response payloads without `shell_call_output`\n * payloads requested with every compatible `include` value\n * successful manual-`shell_call_output` workaround payloads\n * longer manual replay chains\n\n\n\nIf useful, I can also provide raw JSON examples.",
"title": "Hosted `shell` Continuations Require Missing `shell_call_output`"
}