{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreih4vz5hkvyswoxjxd3jtxqcedqdzlo3dv67gqcbxqu4jvakwpooh4",
"uri": "at://did:plc:dz7fbvkxedbwlm4sroohfpee/app.bsky.feed.post/3mfoashjcex62"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreiguropeix2inazdicg6s3hlcfmzaquyqufzsreaulou7gitljgp34"
},
"mimeType": "image/jpeg",
"size": 32971
},
"description": "Inception Labs introduces Mercury 2, a diffusion-based LLM designed for high-speed, multi-step reasoning tasks, with a 128K context window.",
"path": "/inception-labs-unveils-mercury-2-diffusion-llm-with-reasoning/",
"publishedAt": "2026-02-25T08:21:26.000Z",
"site": "https://www.testingcatalog.com",
"tags": [
"Inception Chat",
"@StefanoErmon"
],
"textContent": "Inception Labs is positioning Mercury 2 as a reasoning-focused model for production systems in which latency accumulates across multi-step agent loops, retrieval pipelines, and large-scale extraction jobs. The company argues that modern AI work is no longer a single prompt and response, so left-to-right token generation becomes the bottleneck that users notice.\n\n> Mercury 2 is live 🚀🚀\n>\n> The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs.\n>\n> Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built.\n>\n> We’re just getting…\n>\n> — Stefano Ermon (@StefanoErmon) February 24, 2026\n\nInception states that Mercury 2 uses diffusion-style text generation instead of autoregressive decoding, which emits one token at a time from left to right. According to the company, the model generates and refines many tokens in parallel over a small number of steps and then converges on the final output. Inception argues that this approach changes the usual tradeoff in which stronger reasoning requires more test-time compute, which in turn increases latency and cost.\n\nMercury 2 can be tested on Inception Chat.\n\nIn the announcement, Inception lists Mercury 2 at 1,009 tokens per second on NVIDIA Blackwell GPUs, with a 128K context window, tunable reasoning, native tool use, and schema-aligned JSON output. Pricing is listed at $0.25 per million input tokens and $0.75 per million output tokens. The company also claims OpenAI API compatibility, so existing integrations can adopt it as a drop-in replacement without major rewrites.\n\nThe post also includes throughput comparisons and benchmark-style figures, along with partner quotes highlighting lower latency for transcript cleanup and faster automation-style workloads. Inception Labs is building its lineup around diffusion LLMs and says its team has contributed to widely used ML techniques and systems work.",
"title": "Inception Labs unveils Mercury 2 diffusion LLM with reasoning",
"updatedAt": "2026-02-25T08:21:26.000Z"
}