{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreihrwtum53xmtq7m3ww52vhijxi4rgm7ks24g77oa7zoubft72leve",
"uri": "at://did:plc:ws6dhxzqnqxu5aqxt4kd27oc/app.bsky.feed.post/3mjorccfcbnj2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreigyfa574ryccgw2gb5v7lrfnm4kbrjwop57inygicsldf2okh6f4u"
},
"mimeType": "image/jpeg",
"size": 72299
},
"description": "A practical breakdown of Anthropic's adaptive reasoning mode, its effort controls, supported models, and the tradeoffs developers should know.",
"path": "/claude-adaptive-thinking-explained-how-it-works-and-when-to-use-it/",
"publishedAt": "2026-04-17T10:57:21.000Z",
"site": "https://allthings.how",
"tags": [
"Claude API documentation"
],
"textContent": "Adaptive thinking is Anthropic's newer approach to extended reasoning, letting Claude decide on its own whether a prompt needs deep thought and how much of it to apply. Instead of handing the model a fixed token budget for internal reasoning, you flip a single switch and let Claude scale effort to the complexity of the request. On the latest Opus-class model, it is the only thinking mode the API will accept.\n\n⚡\n\nQuick answer: Set thinking: {type: \"adaptive\"} in your Messages API request. Claude will decide when and how deeply to think. Use the effort parameter (low, medium, high, max, or xhigh on Opus 4.7) to nudge that behavior.\n\n* * *\n\n### What adaptive thinking actually does\n\nIn adaptive mode, reasoning becomes optional from the model's point of view. Claude evaluates each incoming request, judges its complexity, and chooses whether to produce a thinking block at all, plus how long to spend inside it. At the default `high` effort setting, the model almost always thinks. At lower effort levels, it will skip thinking on trivial queries like a quick factual lookup.\n\nThis replaces the older pattern of setting a fixed `budget_tokens` value and hoping it fit the task. Anthropic reports that adaptive thinking drives better performance than a fixed budget on many workloads, particularly bimodal tasks where some requests need minimal reasoning and others need a lot, and on long-horizon agentic workflows.\n\nAdaptive mode also automatically turns on interleaved thinking, which means Claude can think between tool calls rather than only at the start of a turn. For agent loops that chain search, code execution, and follow-up reasoning, this matters. No beta header is required.\n\n* * *\n\n### Supported models and availability\n\nAdaptive thinking is not available everywhere. It is specifically tied to the newer Claude 4-series reasoning models, and behavior differs meaningfully across them.\n\nModel| Adaptive thinking behavior\n---|---\nClaude Mythos Preview (`claude-mythos-preview`)| Default mode; applies automatically when `thinking` is unset. `type: \"disabled\"` is not supported.\nClaude Opus 4.7 (`claude-opus-4-7`)| Only supported thinking mode. Manual `type: \"enabled\"` with `budget_tokens` returns a 400 error. Thinking is off unless you explicitly set `type: \"adaptive\"`.\nClaude Opus 4.6 (`claude-opus-4-6`)| Recommended mode. Manual `budget_tokens` still works but is deprecated.\nClaude Sonnet 4.6 (`claude-sonnet-4-6`)| Recommended mode. Manual `budget_tokens` still works but is deprecated.\nOlder models (Sonnet 4.5, Opus 4.5, etc.)| Not supported. Require manual `type: \"enabled\"` with `budget_tokens`.\n\nIf you are still shipping on Opus 4.6 or Sonnet 4.6 with a fixed budget, your requests will keep working, but plan a migration. Anthropic has marked that configuration for removal in a future model release.\n\n* * *\n\n### The effort parameter\n\nEffort is how you steer adaptive thinking without micromanaging token counts. It acts as soft guidance to the model about how much reasoning to allocate, and it pairs directly with `max_tokens`, which remains a hard cap on total output (thinking plus visible response).\n\nEffort| Behavior| Availability\n---|---|---\n`max`| Always thinks, no constraints on depth.| Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6\n`xhigh`| Always thinks deeply with extended exploration.| Opus 4.7 only\n`high` (default)| Always thinks. Deep reasoning on complex tasks.| All adaptive-capable models\n`medium`| Moderate thinking. May skip for very simple queries.| All adaptive-capable models\n`low`| Minimal thinking. Skips for simple, speed-sensitive tasks.| All adaptive-capable models\n\nIf you see responses hitting `stop_reason: \"max_tokens\"` at high or max effort, either raise `max_tokens` or dial the effort down. The two controls work together: `max_tokens` for hard cost ceilings, `effort` for shaping the model's internal allocation.\n\n* * *\n\n### A minimal request\n\nThe API call itself is small. Here is the basic shape using the Python SDK:\n\n\n import anthropic\n\n client = anthropic.Anthropic()\n\n response = client.messages.create(\n model=\"claude-opus-4-7\",\n max_tokens=16000,\n thinking={\"type\": \"adaptive\"},\n output_config={\"effort\": \"medium\"},\n messages=[\n {\"role\": \"user\", \"content\": \"What is the capital of France?\"}\n ],\n )\n\n print(response.content[0].text)\n\n\nAdaptive thinking with effort\n\nStreaming works the same way. Thinking content arrives via `thinking_delta` events inside `content_block_delta`, exactly as with manual mode. You iterate the stream and handle thinking and text deltas separately.\n\n* * *\n\n### Summarized versus omitted thinking\n\nThinking output is controlled by a `display` field on the thinking configuration. It has two accepted values, and the defaults differ by model.\n\ndisplay value| What you get back| Default on\n---|---|---\n`\"summarized\"`| Thinking block contains a summary of Claude's internal reasoning.| Opus 4.6, Sonnet 4.6, earlier Claude 4 models\n`\"omitted\"`| Empty `thinking` field. The encrypted `signature` still carries the full reasoning for multi-turn continuity.| Opus 4.7, Mythos Preview\n\nThe switch to `\"omitted\"` on Opus 4.7 is a silent change from Opus 4.6. If you were relying on visible summarized thinking and suddenly see empty blocks, set it back explicitly:\n\n\n thinking = {\n \"type\": \"adaptive\",\n \"display\": \"summarized\",\n }\n\n\nRestore summarized thinking on Opus 4.7\n\nOmitting the summary is a latency optimization, not a cost one. You are still billed for the full internal thinking tokens either way. What changes is time-to-first-text-token when streaming, because the server skips streaming thinking tokens and delivers only the signature before the final text response.\n\n* * *\n\n### Billing realities\n\nExtended thinking is charged as output tokens for the full internal reasoning, not for what you see in the response. That creates a consistent gotcha: the billed output token count will not match the token count visible in the response body.\n\n * Input tokens: your original request, not thinking tokens from previous turns.\n * Billed output tokens: full internal thinking tokens generated by the model.\n * Visible output tokens: either the summary (with `\"summarized\"`) or zero (with `\"omitted\"`).\n * No charge for generating the summary itself.\n\n\n\nWhen you pass thinking blocks back into a multi-turn conversation, the thinking tokens from the last assistant turn count as input tokens. A specialized system prompt is also automatically included when extended thinking is enabled.\n\n* * *\n\n### Prompt caching behavior\n\nConsecutive requests using the same adaptive thinking configuration preserve prompt cache breakpoints as expected. The wrinkle is mode switching: flipping between `adaptive`, `enabled`, and `disabled` breaks cache breakpoints for messages. System prompts and tool definitions remain cached regardless of mode changes.\n\nIf cache hit rates suddenly collapse, check whether something in your pipeline is toggling thinking modes between turns of the same conversation.\n\n* * *\n\n### Validation and multi-turn rules\n\nAdaptive mode is more forgiving than manual mode on prior turns. Previous assistant turns do not need to start with thinking blocks, which relaxes the stricter requirement manual-thinking requests enforce. This simplifies round-tripping conversation history from mixed sources.\n\nWhen you do pass thinking blocks back, send them unchanged. The server decrypts the `signature` field to reconstruct the original reasoning for prompt construction. If you pass an omitted block back with custom text stuffed into the `thinking` field, that text is ignored. The `signature` is identical whether display is summarized or omitted, and switching display values between turns is supported.\n\nThinking blocks only strictly need to be echoed back when you are combining extended thinking with tool use. Otherwise you can drop them from prior turns, or let the API strip them for you.\n\n* * *\n\n### Interleaved thinking across modes\n\nInterleaved thinking, where Claude reasons between tool calls, is tied to both mode and model.\n\nMode / Model| Interleaved thinking\n---|---\nAdaptive mode on Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6| Automatically enabled. On Mythos Preview and Opus 4.7, inter-tool reasoning always lives inside thinking blocks.\nManual mode on Sonnet 4.6| Requires the `interleaved-thinking-2025-05-14` beta header.\nManual mode on Opus 4.6| Not available. Use adaptive mode if you need thinking between tool calls.\n\nFor any agent that chains multiple tool calls, adaptive mode is the simpler path on current Opus and Sonnet models.\n\n* * *\n\n### Steering when Claude thinks too much or too little\n\nAdaptive thinking's triggering behavior is promptable. If the model is over-thinking simple prompts or under-thinking nuanced ones, add explicit guidance to your system prompt. A common pattern looks like this:\n\n\n Extended thinking adds latency and should only be used when it\n will meaningfully improve answer quality — typically for problems\n that require multi-step reasoning. When in doubt, respond directly.\n\n\n\nSystem prompt nudge\n\nPushing Claude to think less can reduce quality on tasks that genuinely benefit from reasoning. Before rolling a prompt-level nudge into production, measure against your own evals, and try a lower effort level first.\n\n* * *\n\n### When to choose adaptive, manual, or disabled\n\nMode| Config| Best for\n---|---|---\nAdaptive| `thinking: {type: \"adaptive\"}`| Default choice on Opus 4.7, Opus 4.6, Sonnet 4.6, and Mythos Preview. Let Claude decide; steer with `effort`.\nManual| `thinking: {type: \"enabled\", budget_tokens: N}`| When you need predictable latency or precise control over thinking token spend. Not accepted on Opus 4.7. Deprecated on Opus 4.6 and Sonnet 4.6.\nDisabled| Omit `thinking` or pass `{type: \"disabled\"}`| Lowest latency when you do not need extended reasoning. Not supported on Mythos Preview.\n\nFor new projects on Opus 4.6, Sonnet 4.6, or anything newer, start with adaptive at default effort and tune from there. Reach for manual only if your workload has strict latency SLAs or needs exact per-request cost bounds that adaptive's soft guidance cannot guarantee.\n\n* * *\n\n### Practical notes on production use\n\nA few things tend to trip up teams moving from fixed budgets to adaptive:\n\n * At `high` and `max` effort, the model can run long and exhaust `max_tokens` before finishing a visible response. Leave headroom.\n * Billing dashboards will show higher output token counts than response bodies suggest. This is expected under summarized or omitted display.\n * If you simply want faster first-token streaming and do not surface reasoning to users, pass `display: \"omitted\"`. Cost does not change.\n * The `signature` field is opaque and cross-platform. Values generated on the Claude API work with Amazon Bedrock and Vertex AI.\n * On Bedrock, `max` effort is restricted to Opus 4.6. Requests using `max` on other Bedrock-hosted models return an error.\n\n\n\nIf you need a full specification or want to compare against the older manual-budget path, the Claude API documentation covers every field, including streaming event shapes and tool-use interactions.\n\nAdaptive thinking is the direction Anthropic is clearly pushing for its frontier models, and on Opus 4.7 the choice has already been made for you. For most workloads, handing that decision to the model, then shaping it with effort and display, produces cleaner code and better results than hand-tuning a token budget per request.",
"title": "Claude Adaptive Thinking Explained: How It Works and When to Use It",
"updatedAt": "2026-04-17T10:57:23.684Z"
}