Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreificqspfssi5wv2c3cmvoz4s4flgrkcvnxvcbonp7i5aovhjtrbhy",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mggah3jkpfn2"
  },
  "path": "/t/hosted-shell-continuations-require-missing-shell-call-output/1375917#post_1",
  "publishedAt": "2026-03-06T20:30:56.000Z",
  "site": "https://community.openai.com",
  "textContent": "I found a reproducible issue with the Responses API when using the hosted `shell` tool together with `previous_response_id`.\n\nA first response can complete successfully with a hosted `shell` call, but a direct continuation using only `previous_response_id` fails with:\n\n\n    Error code: 400 - {'error': {'message': 'No tool output found for shell call call_...', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}\n\n\nThe surprising part is that the first response itself is already marked `completed`, and the assistant message includes the shell result, but the response payload often does not include a `shell_call_output` item.\n\n## Why This Looks Like a Bug\n\nThe API appears to require a `shell_call_output` item for continuation, while also sometimes not returning that item in the first response payload.\n\nThis creates an inconsistent contract:\n\n  1. The hosted `shell` tool executes server-side.\n  2. The first response is `completed`.\n  3. The assistant can describe the shell output in natural language.\n  4. But continuing from that response via `previous_response_id` can fail because the server says the shell output is missing.\n\n\n\n## Environment\n\n  * Model: `gpt-5.2`\n  * Responses API\n  * Hosted tool: `shell`\n  * Tested with Python SDK versions:\n    * `openai 2.24.0`\n    * `openai 2.26.0`\n  * Result was the same on both versions for the main repro.\n\n\n\n## Main Reproduction\n\n### Request 1\n\nCreate a response with hosted `shell`:\n\n\n    from openai import AsyncOpenAI\n\n    client = AsyncOpenAI()\n\n    resp1 = await client.responses.create(\n        model=\"gpt-5.2\",\n        input=\"Use the shell tool once to run: printf first_turn. Then briefly report the output.\",\n        tools=[{\"type\": \"shell\", \"environment\": {\"type\": \"container_auto\"}}],\n        reasoning={\"effort\": \"medium\", \"summary\": \"detailed\"},\n        include=[\"reasoning.encrypted_content\"],\n        background=True,\n    )\n\n\nPoll until terminal with `client.responses.retrieve(resp1.id)`.\n\n### Observed Response 1 Shape\n\nIn multiple runs, the first completed response looked like this structurally:\n\n\n    {\n      \"status\": \"completed\",\n      \"output\": [\n        {\"type\": \"reasoning\"},\n        {\"type\": \"shell_call\", \"status\": \"completed\", \"call_id\": \"call_...\"},\n        {\"type\": \"reasoning\"},\n        {\"type\": \"message\"}\n      ]\n    }\n\n\nNotably absent:\n\n  * no `shell_call_output`\n\n\n\nEven though the assistant message already described the shell result.\n\n### Request 2\n\nNow continue directly from the first response:\n\n\n    resp2 = await client.responses.create(\n        model=\"gpt-5.2\",\n        previous_response_id=resp1.id,\n        input=\"Now answer with exactly: second turn worked\",\n        tools=[{\"type\": \"shell\", \"environment\": {\"type\": \"container_auto\"}}],\n        reasoning={\"effort\": \"medium\", \"summary\": \"detailed\"},\n        include=[\"reasoning.encrypted_content\"],\n        background=True,\n    )\n\n\n### Actual Result\n\nThis fails immediately with:\n\n\n    Error code: 400 - {'error': {'message': 'No tool output found for shell call call_...', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}\n\n\n### Expected Result\n\nIf the hosted shell call completed server-side and the first response is already terminal, then either:\n\n  1. `previous_response_id` continuation should work without any extra client-side tool-output replay, or\n  2. the first response should always include the required `shell_call_output` item so the client can replay it deterministically.\n\n\n\n## Verified Workaround\n\nThe continuation works if I manually inject a `shell_call_output` input item in the next request:\n\n\n    resp2 = await client.responses.create(\n        model=\"gpt-5.2\",\n        previous_response_id=resp1.id,\n        input=[\n            {\n                \"type\": \"shell_call_output\",\n                \"call_id\": \"call_from_resp1\",\n                \"status\": \"completed\",\n                \"output\": [\n                    {\n                        \"stdout\": \"first_turn\",\n                        \"stderr\": \"\",\n                        \"outcome\": {\"type\": \"exit\", \"exit_code\": 0},\n                    }\n                ],\n            },\n            {\n                \"role\": \"user\",\n                \"content\": \"Now answer with exactly: second turn worked.\",\n            },\n        ],\n        reasoning={\"effort\": \"medium\", \"summary\": \"detailed\"},\n        background=True,\n    )\n\n\nThis succeeds.\n\n## Additional Observation: Inconsistent `shell_call_output` Presence\n\nAfter introducing manual `shell_call_output` replay in a chain, later hosted `shell` responses sometimes started including `shell_call_output` items automatically in their returned `output`.\n\nSo there seem to be two inconsistent behaviors:\n\n  1. Some completed hosted-shell responses return only `shell_call` + `message`.\n  2. Other completed hosted-shell responses return both `shell_call` and `shell_call_output`.\n\n\n\nThat inconsistency makes it difficult to know whether the client is expected to replay tool output or whether the server should already be carrying it forward.\n\n## Includes Tested\n\nI also tested all documented `include` values that are compatible with reasoning models:\n\n\n    [\n        \"file_search_call.results\",\n        \"web_search_call.results\",\n        \"web_search_call.action.sources\",\n        \"message.input_image.image_url\",\n        \"computer_call_output.output.image_url\",\n        \"code_interpreter_call.outputs\",\n        \"reasoning.encrypted_content\",\n    ]\n\n\nThis did not fix the issue.\n\n## Related But Separate Issue\n\nI am also investigating a separate 400 error in a larger workflow that mentions a missing reasoning item.\n\nAt the moment, I have **not** minimized that second issue to a standalone hosted-shell repro. In my local tests, once I manually replay `shell_call_output`, multi-turn hosted-shell chains can continue successfully and retain memory of earlier shell outputs.\n\nSo this report is specifically about the reproducible hosted `shell` continuation problem where:\n\n  * the first response completes,\n  * but continuation via `previous_response_id` fails unless the client manually reconstructs and submits `shell_call_output`.\n\n\n\n## Minimal Expected Contract\n\nFor hosted `shell` plus `previous_response_id`, one of these should be true consistently:\n\n  1. hosted shell execution state is fully preserved server-side, so direct continuation works, or\n  2. the API always returns the exact `shell_call_output` item needed for replay in the next request.\n\n\n\nRight now, neither appears reliable enough.\n\n## Local Artifacts Collected\n\nI collected raw response payloads during testing, including:\n\n  * initial first-response payloads without `shell_call_output`\n  * all-compatible-includes payloads\n  * successful manual-`shell_call_output` workaround payloads\n  * longer manual replay chains\n\n\n\nIf useful, I can also provide raw JSON examples.",
  "title": "Hosted `shell` Continuations Require Missing `shell_call_output`"
}