External Publication

Hosted `shell` Continuations Require Missing `shell_call_output`

OpenAI Developer Community March 6, 2026

I found a reproducible issue with the Responses API when using the hosted shell tool together with previous_response_id.

The first response can complete successfully with a hosted shell call, but a direct continuation that passes only `previous_response_id` fails with:

```
Error code: 400 - {'error': {'message': 'No tool output found for shell call call_...', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}
```

The surprising part is that the first response itself is already marked completed, and the assistant message includes the shell result, but the response payload often does not include a shell_call_output item.

Why This Looks Like a Bug

The API appears to require a shell_call_output item for continuation, while also sometimes not returning that item in the first response payload.

This creates an inconsistent contract:

- The hosted shell tool executes server-side.
- The first response is `completed`.
- The assistant can describe the shell output in natural language.
- But continuing from that response via `previous_response_id` can fail because the server says the shell output is missing.

Environment

Model: gpt-5.2
Responses API
Hosted tool: shell
Tested with Python SDK versions:
- openai 2.24.0
- openai 2.26.0
Result was the same on both versions for the main repro.

Main Reproduction

Request 1

Create a response with hosted shell:

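```python
from openai import AsyncOpenAI

client = AsyncOpenAI()

# These snippets use top-level `await`: run them inside an async function
# (for example via asyncio.run) or in an environment that supports it.
resp1 = await client.responses.create(
    model="gpt-5.2",
    input="Use the shell tool once to run: printf first_turn. Then briefly report the output.",
    # Hosted shell tool: the command is executed server-side.
    tools=[{"type": "shell", "environment": {"type": "container_auto"}}],
    reasoning={"effort": "medium", "summary": "detailed"},
    include=["reasoning.encrypted_content"],
    background=True,  # returns immediately; the response has to be polled
)
```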

Poll with `client.responses.retrieve(resp1.id)` until the response reaches a terminal status.
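The polling step can be sketched as follows. This is a minimal sketch, assuming that background responses pass through non-terminal statuses such as `queued` and `in_progress` before ending in one of `completed`, `failed`, `cancelled`, or `incomplete`. The helper name `wait_for_terminal` and its `poll_interval`/`max_polls` parameters are hypothetical. `retrieve` is meant to be something like `client.responses.retrieve`.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Assumed set of terminal statuses for background responses.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


async def wait_for_terminal(
    retrieve: Callable[[str], Awaitable[Any]],
    response_id: str,
    poll_interval: float = 1.0,
    max_polls: int = 600,
) -> Any:
    """Poll retrieve(response_id) until the response has a terminal status.

    `retrieve` is typically `client.responses.retrieve` on an AsyncOpenAI client.
    Raises TimeoutError if the response is still running after `max_polls` polls.
    """
    for _ in range(max_polls):
        resp = await retrieve(response_id)
        if resp.status in TERMINAL_STATUSES:
            return resp
        # Not finished yet: wait before asking again.
        await asyncio.sleep(poll_interval)
    raise TimeoutError(f"response {response_id} not terminal after {max_polls} polls")
```

In the repro this would be used as `resp1 = await wait_for_terminal(client.responses.retrieve, resp1.id)`. The `max_polls` cap keeps a stuck background response from hanging the test script forever.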

Observed Response 1 Shape

In multiple runs, the first completed response looked like this structurally:

```json
{
  "status": "completed",
  "output": [
    {"type": "reasoning"},
    {"type": "shell_call", "status": "completed", "call_id": "call_..."},
    {"type": "reasoning"},
    {"type": "message"}
  ]
}
```

Notably absent: any `shell_call_output` item, even though the assistant message already described the shell result.
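One way to detect this shape on the client side is to pair each `shell_call` in the output with a `shell_call_output` that carries the same `call_id`. The sketch below is minimal and works on plain dicts, for example `resp1.model_dump()["output"]`. The helper name `unpaired_shell_call_ids` is hypothetical.

```python
from typing import Any, Iterable


def unpaired_shell_call_ids(output_items: Iterable[dict[str, Any]]) -> list[str]:
    """Return call_ids of shell_call items that have no matching shell_call_output.

    The result keeps the order in which the shell calls appear in the output.
    """
    items = list(output_items)
    # call_ids that already have an output item in this payload
    answered = {
        item.get("call_id")
        for item in items
        if item.get("type") == "shell_call_output"
    }
    return [
        item["call_id"]
        for item in items
        if item.get("type") == "shell_call" and item.get("call_id") not in answered
    ]
```

For the output shape shown above, this returns the single `call_...` id. That id is the same call the continuation request later reports as missing.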

Request 2

Now continue directly from the first response:

```python
resp2 = await client.responses.create(
    model="gpt-5.2",
    previous_response_id=resp1.id,
    input="Now answer with exactly: second turn worked",
    tools=[{"type": "shell", "environment": {"type": "container_auto"}}],
    reasoning={"effort": "medium", "summary": "detailed"},
    include=["reasoning.encrypted_content"],
    background=True,
)
```

Actual Result

This fails immediately with:

```
Error code: 400 - {'error': {'message': 'No tool output found for shell call call_...', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}
```

Expected Result

If the hosted shell call completed server-side and the first response is already terminal, then either:

- `previous_response_id` continuation should work without any extra client-side replay of tool output, or
- the first response should always include the required `shell_call_output` item so the client can replay it deterministically.

Verified Workaround

The continuation works if I manually inject a shell_call_output input item in the next request:

```python
resp2 = await client.responses.create(
    model="gpt-5.2",
    previous_response_id=resp1.id,
    input=[
        # Manually replayed output for the hosted shell call made in resp1.
        # "call_from_resp1" stands for the call_id of that shell_call item.
        {
            "type": "shell_call_output",
            "call_id": "call_from_resp1",
            "status": "completed",
            "output": [
                {
                    "stdout": "first_turn",
                    "stderr": "",
                    "outcome": {"type": "exit", "exit_code": 0},
                }
            ],
        },
        {
            "role": "user",
            "content": "Now answer with exactly: second turn worked.",
        },
    ],
    reasoning={"effort": "medium", "summary": "detailed"},
    background=True,
)
```

This succeeds.
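The replay item above can be built by a small helper, so the `call_id` always comes from the first response instead of being typed by hand. This sketch mirrors the exact item shape used in the workaround. The helper name `make_shell_call_output` is hypothetical. The client still has to recover `stdout`, `stderr`, and the exit code itself, because the first response may not include them.

```python
from typing import Any


def make_shell_call_output(
    call_id: str, stdout: str, stderr: str = "", exit_code: int = 0
) -> dict[str, Any]:
    """Build a shell_call_output input item in the same shape as the workaround."""
    return {
        "type": "shell_call_output",
        "call_id": call_id,
        "status": "completed",
        "output": [
            {
                "stdout": stdout,
                "stderr": stderr,
                "outcome": {"type": "exit", "exit_code": exit_code},
            }
        ],
    }
```

In the workaround request, the first `input` element would become `make_shell_call_output(<call_id of the shell_call in resp1>, "first_turn")`.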

Additional Observation: Inconsistent `shell_call_output` Presence

After introducing manual shell_call_output replay in a chain, later hosted shell responses sometimes started including shell_call_output items automatically in their returned output.

So there seem to be two inconsistent behaviors:

- Some completed hosted-shell responses return only `shell_call` + `message`.
- Other completed hosted-shell responses return both `shell_call` and `shell_call_output`.

That inconsistency makes it difficult to know whether the client is expected to replay tool output or whether the server should already be carrying it forward.
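Until the contract is clarified, a client can hedge against both behaviors by replaying outputs only for shell calls that came back without one. This is a minimal sketch. It assumes, without verifying it here, that a `shell_call_output` already present in the previous response does not need to be sent again. `build_continuation_input` and the `recovered_stdout` mapping from `call_id` to stdout are hypothetical names.

```python
from typing import Any


def build_continuation_input(
    previous_output: list[dict[str, Any]],
    recovered_stdout: dict[str, str],
    user_text: str,
) -> list[dict[str, Any]]:
    """Build `input` for a previous_response_id continuation.

    A shell_call_output is replayed only for shell calls whose output is
    missing from `previous_output`; the user message is always appended last.
    """
    answered = {
        item.get("call_id")
        for item in previous_output
        if item.get("type") == "shell_call_output"
    }
    replay: list[dict[str, Any]] = []
    for item in previous_output:
        if item.get("type") != "shell_call" or item.get("call_id") in answered:
            continue  # not a shell call, or the server already returned its output
        call_id = item["call_id"]
        if call_id not in recovered_stdout:
            raise KeyError(f"no recovered stdout for unanswered shell call {call_id}")
        replay.append(
            {
                "type": "shell_call_output",
                "call_id": call_id,
                "status": "completed",
                "output": [
                    {
                        "stdout": recovered_stdout[call_id],
                        "stderr": "",
                        # Assumes a clean exit; adjust if the real outcome is known.
                        "outcome": {"type": "exit", "exit_code": 0},
                    }
                ],
            }
        )
    return replay + [{"role": "user", "content": user_text}]
```

Raising on a missing `recovered_stdout` entry is deliberate. Sending the continuation without the output would just reproduce the 400 error above.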

Includes Tested

I also tested all documented include values that are compatible with reasoning models:

```python
[
    "file_search_call.results",
    "web_search_call.results",
    "web_search_call.action.sources",
    "message.input_image.image_url",
    "computer_call_output.output.image_url",
    "code_interpreter_call.outputs",
    "reasoning.encrypted_content",
]
```

This did not fix the issue.

Related But Separate Issue

I am also investigating a separate 400 error in a larger workflow that mentions a missing reasoning item.

At the moment, I have not minimized that second issue to a standalone hosted-shell repro. In my local tests, once I manually replay shell_call_output, multi-turn hosted-shell chains can continue successfully and retain memory of earlier shell outputs.

So this report is specifically about the reproducible hosted shell continuation problem where:

- the first response completes,
- but continuation via `previous_response_id` fails unless the client manually reconstructs and submits `shell_call_output`.

Minimal Expected Contract

For hosted shell plus previous_response_id, one of these should be true consistently:

- hosted shell execution state is fully preserved server-side, so direct continuation works, or
- the API always returns the exact `shell_call_output` item needed for replay in the next request.

Right now, neither appears reliable enough.

Local Artifacts Collected

I collected raw response payloads during testing, including:

- first-response payloads without `shell_call_output`
- payloads from requests using all compatible `include` values
- payloads from successful runs of the manual `shell_call_output` workaround
- longer manual replay chains

If useful, I can also provide raw JSON examples.