Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibqvjizl4x3z5zeirjmi42j4maoucpf36xrk4tqefyc63dpedi3eq",
    "uri": "at://did:plc:ws6dhxzqnqxu5aqxt4kd27oc/app.bsky.feed.post/3mjnvawpuihu2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreigyfa574ryccgw2gb5v7lrfnm4kbrjwop57inygicsldf2okh6f4u"
    },
    "mimeType": "image/jpeg",
    "size": 72299
  },
  "description": "What adaptive thinking does in Opus 4.7, how to turn it on, and why requests now run without reasoning by default.",
  "path": "/claude-opus-4-7-adaptive-thinking-explained/",
  "publishedAt": "2026-04-17T02:35:30.000Z",
  "site": "https://allthings.how",
  "textContent": "Claude Opus 4.7 ships with a single thinking mode called adaptive thinking, and it changes how reasoning tokens get spent on every request. The model decides how much to think based on the difficulty of the prompt, instead of running against a fixed budget you set up front. It is also off by default on the Messages API, which trips up a lot of people migrating from Opus 4.6.\n\n💡\n\nQuick answer: Adaptive thinking is the only thinking-on mode in Opus 4.7. It is disabled by default. To enable it, set thinking: {\"type\": \"adaptive\"} on your request. The old budget_tokens parameter now returns a 400 error.\n\n* * *\n\n### What adaptive thinking does\n\nAdaptive thinking lets the model allocate reasoning tokens dynamically per request. Hard problems get more internal reasoning, simple prompts get less, and the decision happens at inference time rather than being capped by a number you pass in. Anthropic's internal evaluations showed this approach consistently outperforms the fixed-budget extended thinking used in Opus 4.6.\n\nThe tradeoff is control. You no longer tell the model to think for exactly 32,000 tokens. You tell it to think adaptively, and the model picks. If you need tighter control over spend across an entire agentic loop, that now lives in a separate feature called task budgets.\n\n* * *\n\n### Turning adaptive thinking on\n\nOn Opus 4.7, any request without a `thinking` field runs with thinking off. That is a behavior change from earlier Opus versions, and it catches teams who assumed thinking would carry over. You have to opt in explicitly.\n\n\n    response = client.messages.create(\n        model=\"claude-opus-4-7\",\n        max_tokens=32000,\n        thinking={\"type\": \"adaptive\"},\n        messages=[\n            {\"role\": \"user\", \"content\": \"Refactor this module for testability.\"}\n        ],\n    )\n\n\nEnabling adaptive thinking on Opus 4.7\n\nIf you also want the reasoning visible in responses, add a display flag. Thinking content is omitted from responses by default, which slightly reduces latency but hides the model's intermediate steps.\n\n\n    thinking = {\n        \"type\": \"adaptive\",\n        \"display\": \"summarized\",  # \"omitted\" is the default\n    }\n\n\nOpting back into visible reasoning\n\nIf your product streams reasoning to users, the omitted default will look like a long pause before the output starts. Setting `display` to `summarized` restores the streaming progress behavior users expect.\n\n* * *\n\n### What was removed from the Messages API\n\nOpus 4.7 drops a handful of parameters that worked on Opus 4.6. These are hard breaking changes on the Messages API, not soft deprecations. Claude Managed Agents users are not affected.\n\nParameter| Opus 4.6 behavior| Opus 4.7 behavior\n---|---|---\nExtended thinking budget| `thinking: {\"type\": \"enabled\", \"budget_tokens\": N}`| Returns 400 error; use adaptive instead\nSampling: temperature, top_p, top_k| Accepted at any value| Non-default values return 400 error\nThinking content in response| Included by default| Omitted by default; opt in with `display: \"summarized\"`\nTokenizer| Previous tokenizer| New tokenizer, up to 1.35x token count on the same text\n\nIf you were using `temperature = 0` for determinism, it never guaranteed identical outputs anyway. The cleanest migration path is to strip sampling parameters entirely and steer behavior through prompting.\n\n* * *\n\n### Effort levels and the new xhigh tier\n\nAdaptive thinking works alongside the effort parameter, which controls the intelligence-versus-cost tradeoff more coarsely. Opus 4.7 adds an `xhigh` level that sits between `high` and `max`, giving you a middle option for hard coding and agentic work where you want more reasoning than `high` but not the full latency of `max`.\n\nAnthropic recommends starting at `high` or `xhigh` for coding and agentic use cases, and keeping at least `high` for any intelligence-sensitive work. In Claude Code, the default effort level is now `xhigh` for all plans. Effort is a Messages API feature; Claude Managed Agents handles it automatically.\n\n* * *\n\n### Task budgets versus max_tokens\n\nBecause adaptive thinking removed the fine-grained thinking budget, Anthropic shipped task budgets as the new way to cap spend on long-running agents. Task budgets are in public beta and require the header `task-budgets-2026-03-13`.\n\nA task budget is advisory. It tells the model to target a token total across an entire agentic loop, including thinking, tool calls, tool results, and the final output. The model sees a running countdown and uses it to pace itself, cut low-value steps, and finish cleanly. Minimum is 20,000 tokens. It is not a hard cap — the model can overshoot.\n\n\n    response = client.beta.messages.create(\n        model=\"claude-opus-4-7\",\n        max_tokens=128000,\n        output_config={\n            \"effort\": \"high\",\n            \"task_budget\": {\"type\": \"tokens\", \"total\": 128000},\n        },\n        messages=[\n            {\"role\": \"user\", \"content\": \"Review the codebase and propose a refactor plan.\"}\n        ],\n        betas=[\"task-budgets-2026-03-13\"],\n    )\n\n\nTask budget with adaptive thinking\n\nThe difference from `max_tokens` matters. `max_tokens` is a hard ceiling per request, invisible to the model. `task_budget` is a suggestion across the whole loop that the model is aware of. Use `task_budget` for self-moderation, and `max_tokens` as the per-request fence. For open-ended agentic work where you want the best answer, skip the task budget.\n\n* * *\n\n### Behavior shifts that come with adaptive thinking\n\nAdaptive thinking is not just an on/off switch. It pairs with a set of behavior changes in Opus 4.7 that affect how prompts land. If you copy old prompts over without adjustments, you will see differences.\n\n  * **Literal instruction following.** The model no longer silently generalizes instructions from one item to another, especially at lower effort levels.\n  * **Response length scales with task complexity** rather than defaulting to a fixed verbosity.\n  * **Fewer tool calls by default.** The model prefers reasoning over action. Raising effort increases tool usage.\n  * **More direct tone** with less validation-forward phrasing and fewer emoji than Opus 4.6.\n  * **More regular progress updates** during long agentic traces. Remove scaffolding that forced status messages.\n  * **Fewer subagents spawned by default.** Steerable through prompting if you need more.\n\n\n\nPrompts that previously contained mitigations like \"double-check the slide layout before returning\" or \"think step by step before answering\" can often be simplified. Opus 4.7 handles those patterns natively, and leaving the scaffolding in can cause overcorrection.\n\n* * *\n\n### Tokenizer change and cost implications\n\nOpus 4.7 uses a new tokenizer. The same text may map to roughly 1.0 to 1.35 times as many tokens compared to Opus 4.6, depending on content type. Per-token pricing is unchanged at $5 per million input tokens and $25 per million output tokens, but effective cost per request can rise.\n\nTwo practical effects matter. First, `/v1/messages/count_tokens` returns different numbers than it did on Opus 4.6, so any compaction triggers tuned to specific thresholds need adjustment. Second, you should raise `max_tokens` to give the model headroom, especially for responses that were close to the limit before.\n\nThe 1M token context window remains available at standard pricing with no long-context premium.\n\n* * *\n\n### Verifying your migration worked\n\n**Step 1:** Switch your model ID from `claude-opus-4-6` to `claude-opus-4-7` and send a test request without a thinking field. Confirm the response succeeds and note that no reasoning is produced.\n\n**Step 2:** Add `thinking: {\"type\": \"adaptive\"}` to the request. The response should now include thinking blocks in the stream, even though their content is empty unless you opt into display.\n\n**Step 3:** Remove any `temperature`, `top_p`, or `top_k` parameters from your client code. Send a request with one set to a non-default value and confirm you get a 400 error, which verifies you've caught all code paths.\n\n**Step 4:** Run `/v1/messages/count_tokens` against representative prompts from your workload and compare against the Opus 4.6 counts. Adjust `max_tokens` and any compaction thresholds accordingly.\n\nYou'll know adaptive thinking is working correctly when simple prompts return quickly with minimal reasoning overhead, and complex prompts show visibly longer thinking phases — without you changing any parameters between the two requests.\n\n* * *\n\nAdaptive thinking is a meaningful shift in how you reason about cost and latency on Opus 4.7. The automatic allocation removes a tuning lever that some developers relied on, but replaces it with behavior that, in practice, tracks task difficulty more closely than fixed budgets ever did. If you need hard cost control, reach for task budgets and `max_tokens` together. If you need peak quality, leave the task budget off and let the model decide.",
  "title": "Claude Opus 4.7 adaptive thinking, explained",
  "updatedAt": "2026-04-17T02:35:31.839Z"
}