Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreih4vz5hkvyswoxjxd3jtxqcedqdzlo3dv67gqcbxqu4jvakwpooh4",
    "uri": "at://did:plc:dz7fbvkxedbwlm4sroohfpee/app.bsky.feed.post/3mfoashjcex62"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiguropeix2inazdicg6s3hlcfmzaquyqufzsreaulou7gitljgp34"
    },
    "mimeType": "image/jpeg",
    "size": 32971
  },
  "description": "Inception Labs introduces Mercury 2, a diffusion-based LLM designed for high-speed, multi-step reasoning tasks with a 128K context window.",
  "path": "/inception-labs-unveils-mercury-2-diffusion-llm-with-reasoning/",
  "publishedAt": "2026-02-25T08:21:26.000Z",
  "site": "https://www.testingcatalog.com",
  "tags": [
    "pic.twitter.com/McrQG4PFLZ",
    "February 24, 2026",
    "Inception Chat",
    "Source",
    "@StefanoErmon"
  ],
  "textContent": "Inception Labs is positioning Mercury 2 as a reasoning-focused model for production systems in which latency accumulates across multi-step agent loops, retrieval pipelines, and large-scale extraction jobs. The company argues that modern AI work is no longer a single prompt and response, which makes left-to-right token generation the bottleneck users notice.\n\n> Mercury 2 is live 🚀🚀\n>\n> The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs.\n>\n> Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built.\n>\n> We’re just getting… pic.twitter.com/McrQG4PFLZ\n>\n> — Stefano Ermon (@StefanoErmon) February 24, 2026\n\nInception states that Mercury 2 uses diffusion-style text generation instead of autoregressive decoding. According to the company, the model generates and refines many tokens in parallel over a small number of steps before converging on the final output. Inception argues that this approach changes the usual tradeoff in which stronger reasoning requires more test-time compute, and therefore more latency and cost.\n\nMercury 2 is available to try on Inception Chat.\n\nIn the announcement, Inception lists Mercury 2 at 1,009 tokens per second on NVIDIA Blackwell GPUs, with a 128K context window, tunable reasoning, native tool use, and schema-aligned JSON output. Pricing is listed at $0.25 per million input tokens and $0.75 per million output tokens. The company also claims OpenAI API compatibility, intended to allow drop-in adoption without major rewrites.\n\nThe post also includes throughput comparisons and benchmark-style figures, along with partner quotes about lower latency for transcript cleanup and faster automation-style workloads. Inception Labs is building its lineup around diffusion LLMs and presents its team as having contributed to widely used ML techniques and systems work.",
  "title": "Inception Labs unveils Mercury 2 diffusion LLM with reasoning",
  "updatedAt": "2026-02-25T08:21:26.000Z"
}
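The per-token prices and the 1,009 tokens-per-second figure quoted in the record lend themselves to a quick back-of-envelope estimate. The Python sketch below is illustrative only: the workload shape (10 steps, 8,000 input and 1,000 output tokens per step) is hypothetical, and decode time is idealized as output tokens divided by the quoted throughput, ignoring prefill, queuing, and network latency.

```python
# Back-of-envelope estimate using the figures quoted in the announcement.
# Assumptions (not from the source): the hypothetical workload in __main__,
# and decode time = output_tokens / throughput (no prefill, queuing, or network latency).

INPUT_PRICE_PER_M = 0.25   # USD per million input tokens (as announced)
OUTPUT_PRICE_PER_M = 0.75  # USD per million output tokens (as announced)
THROUGHPUT_TPS = 1009      # tokens per second on NVIDIA Blackwell GPUs (as announced)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def decode_seconds(output_tokens: int, tps: float = THROUGHPUT_TPS) -> float:
    """Idealized generation time: output tokens divided by the quoted throughput."""
    return output_tokens / tps


def agent_loop(steps: int, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Sum cost and idealized decode time over a hypothetical multi-step agent loop."""
    cost = steps * request_cost(input_tokens, output_tokens)
    seconds = steps * decode_seconds(output_tokens)
    return cost, seconds


if __name__ == "__main__":
    cost, seconds = agent_loop(steps=10, input_tokens=8_000, output_tokens=1_000)
    print(f"cost ${cost:.4f}, decode time {seconds:.2f} s")  # prints: cost $0.0275, decode time 9.91 s
```

Under this idealized model, total decode time grows linearly with the number of loop steps, which is why per-call generation speed matters for the multi-step workloads the article describes.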