Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiald2bmtnt4f2o75szfffk2snwsnqgaccgnskhggj7ag66p5me2xm",
    "uri": "at://did:plc:ws6dhxzqnqxu5aqxt4kd27oc/app.bsky.feed.post/3mkno7d45tp52"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreido4xhbhcupiixplukbmrfjnfuwk6rhitvmh7pwedllicexsapcjm"
    },
    "mimeType": "image/png",
    "size": 2587209
  },
  "description": "How the open-weight 1.6T-parameter challenger stacks up against Anthropic's flagship on cost, context, and coding performance.",
  "path": "/deepseek-v4-vs-claude-opus-4-7-pricing-benchmarks-and-tradeoffs/",
  "publishedAt": "2026-04-29T17:54:33.000Z",
  "site": "https://allthings.how",
  "textContent": "Two very different bets sit at the top of the AI model market right now. Anthropic's Claude Opus 4.7, released April 16, 2026, is a closed, premium-priced flagship aimed at the hardest reasoning and coding work. DeepSeek V4, released April 24, 2026, is a 1.6-trillion-parameter Mixture-of-Experts model with open weights under the MIT License and pricing roughly one-sixth of Opus on a blended basis.\n\n**Quick answer:** Claude Opus 4.7 leads on most shared benchmarks (GPQA Diamond, SWE-Bench Pro, Humanity's Last Exam, MCP Atlas). DeepSeek V4-Pro trails by single-digit points on several of those tests, ties or wins on BrowseComp, and costs about $1.74 input / $3.48 output per million tokens versus Opus 4.7's $5.00 / $25.00. Pick Opus for top-tier coding and reasoning quality; pick V4 for near-frontier performance at a fraction of the cost or for self-hosting.\n\n* * *\n\n### Pricing and access\n\nThe economic gap is the headline. DeepSeek publishes V4 pricing in two tiers, Pro and Flash, both with a 1M-token context window and 384K maximum output. Opus 4.7 is API-only through Anthropic.\n\nModel| Input ($/M tokens)| Cached input| Output ($/M tokens)| Open weights\n---|---|---|---|---\nClaude Opus 4.7| $5.00| Tier-dependent| $25.00| No\nDeepSeek V4-Pro| $1.74| $0.145| $3.48| Yes (MIT)\nDeepSeek V4-Flash| $0.14| $0.028| $0.28| Yes (MIT)\n\nOn a simple one-million-input plus one-million-output blend, V4-Pro lands at $5.22 versus $30.00 for Opus 4.7. With cached input, V4-Pro drops to roughly $3.63, widening the gap to about one-eighth of Opus pricing. V4-Flash is the budget extreme at $0.42 blended, which sits below nearly every commercial model on the market.\n\nV4 weights are downloadable from Hugging Face, with V4-Pro at 865GB and V4-Flash at 160GB. Self-hosting V4-Pro at usable throughput typically requires 8×H100-class infrastructure or equivalent. 
Opus 4.7 has no self-host path.\n\n* * *\n\n### Benchmark head-to-head\n\nOn directly comparable evaluations published by both companies, Opus 4.7 holds the lead on academic reasoning and software engineering, while V4-Pro-Max gets close on agentic tasks and edges ahead on web-browsing benchmarks.\n\nBenchmark| DeepSeek V4-Pro-Max| Claude Opus 4.7| Lead\n---|---|---|---\nGPQA Diamond| 90.1%| 94.2%| Opus 4.7\nHumanity's Last Exam (no tools)| 37.7%| 46.9%| Opus 4.7\nHumanity's Last Exam (with tools)| 48.2%| 54.7%| Opus 4.7\nSWE-Bench Pro| 55.4%| 64.3%| Opus 4.7\nTerminal-Bench 2.0| 67.9%| 69.4%| Opus 4.7 (narrow)\nMCP Atlas| 73.6%| 79.1%| Opus 4.7\nBrowseComp| 83.4%| 79.3%| DeepSeek V4\n\nThe pattern is consistent: Opus wins academic and structured coding evaluations by 4 to 9 points, ties on terminal-style agent work, and loses on agentic web browsing. V4-Pro's published claims against older models such as Opus 4.6 and GPT-5.4 xHigh are stronger, but those don't reflect the current Anthropic flagship.\n\n* * *\n\n### Architecture and context\n\nBoth models offer a 1-million-token context window, but they get there differently. V4-Pro is a Mixture-of-Experts model with roughly 1.6 trillion total parameters and around 49 billion active per forward pass. It uses a hybrid attention design combining Compressed Sparse Attention and Heavily Compressed Attention, which DeepSeek says cuts single-token inference FLOPs to about 27% of V3.2 and KV cache to 10% at 1M-token scale.\n\nOpus 4.7's architecture is undisclosed. Anthropic exposes adaptive and extended thinking controls, image input, and a Claude tokenizer. 
V4-Pro is text-only on the base model; a separate vision variant exists but lags Opus on multimodal breadth.\n\nCapability| Claude Opus 4.7| DeepSeek V4-Pro\n---|---|---\nContext window| 1M tokens| 1M tokens\nMax output| 128K tokens| 384K tokens\nImage input| Yes| No (separate vision variant)\nReasoning modes| Adaptive / extended thinking| Non-think, Think High, Think Max\nFunction calling| Yes| Yes\nLicense| Proprietary| MIT\nAPI compatibility| Anthropic SDK| OpenAI ChatCompletions and Anthropic formats\n\nV4's 384K maximum output is a meaningful practical edge over Opus 4.7's 128K cap for tasks that need long single-response generations, such as full-document drafts, large refactors, or extended agent traces.\n\n* * *\n\n### Coding and agentic work\n\nFor pure coding quality on the hardest benchmarks, Opus 4.7 is the better tool. Its 64.3% on SWE-Bench Pro is the higher verified result of the two, and Anthropic's models have a track record of stronger multi-file reasoning and tighter instruction adherence on complex constraint sets.\n\nV4-Pro is competitive in the same tier as older Opus releases and handles agentic coding pipelines well, particularly with its larger output budget and MIT-licensed weights. It integrates with common agent harnesses including Claude Code and OpenCode, and DeepSeek runs its own internal coding agents on it.\n\nFor long-horizon agent loops with 50+ tool calls, Opus 4.7 still shows less drift in practice. 
For high-volume code review, repository indexing, or CI-attached automation where cost dominates, V4 changes the math meaningfully.\n\n* * *\n\n### When to pick which\n\nWorkload| Better fit| Why\n---|---|---\nHardest reasoning, math, science Q&A| Claude Opus 4.7| Leads GPQA Diamond and HLE by 4–9 points\nMulti-file refactoring, complex SWE tasks| Claude Opus 4.7| SWE-Bench Pro lead, stronger constraint adherence\nImage-heavy or multimodal pipelines| Claude Opus 4.7| Native image input on the flagship model\nHigh-volume inference at scale| DeepSeek V4-Pro or Flash| Roughly 1/6 (Pro) to 1/70 (Flash) the cost per blended token\nSelf-hosting for privacy or compliance| DeepSeek V4-Pro| MIT-licensed open weights, on-prem deployment\nLong-output generation (200K+ tokens)| DeepSeek V4-Pro| 384K output cap vs. Opus 128K\nAgentic web browsing| DeepSeek V4-Pro| Beats Opus 4.7 on BrowseComp (83.4% vs. 79.3%)\nDomain fine-tuning| DeepSeek V4-Pro| Open weights allow custom training; Opus does not\n\n💡\n\nIf you don't know which to pick, route by task class: send hard reasoning, multi-file coding, and multimodal inputs to Opus 4.7, and route everything else (bulk processing, long context, agentic browsing, fine-tuned domain agents) to V4-Pro. The cost savings on the second bucket usually pay for the Opus calls in the first.\n\n* * *\n\nMost published V4 benchmark scores come from DeepSeek's own technical report. Independent third-party evaluations of V4 are still accumulating, and on tests less prone to gaming, such as ARC-AGI, V4 sits notably below the latest closed frontier models. Treat single-digit benchmark gaps as directional rather than decisive.\n\nPricing and availability change. Both vendors publish current rates on their official documentation, and V4 is currently in preview status with model IDs `deepseek-v4-pro` and `deepseek-v4-flash`. 
Legacy `deepseek-chat` and `deepseek-reasoner` endpoints are scheduled for retirement on July 24, 2026.\n\nFor most teams the practical answer isn't picking one. Opus 4.7 remains the model to reach for when correctness on a hard task is worth $25 per million output tokens. V4 makes a wider class of automation economically viable, and the open weights give it a deployment surface Anthropic can't match. The competitive pressure that creates is the real story of this release cycle.",
  "title": "DeepSeek V4 vs Claude Opus 4.7: Pricing, Benchmarks, and Tradeoffs",
  "updatedAt": "2026-04-29T17:54:34.887Z"
}