
What the Back of a Wafer Tells Us About NVIDIA's Next Fifteen Years

Jason with his AI analysts, February 25, 2026

A16 and Feynman: Why NVIDIA Might Return to the Bleeding Edge After 15 Years

In the semiconductor industry, there's an unwritten rule: big chips stick to mature process nodes.

The logic is intuitive. A massive GPU die, north of 800 square millimeters, is far more exposed to defects than a compact mobile SoC: the larger the die, the higher the odds that a killer defect lands on it, and yield falls off roughly exponentially with area. So for over a decade, from Fermi through Hopper, NVIDIA hewed closely to an "n-1" strategy: each generation shipped on a TSMC process that had already been broken in for a full node cycle. When Apple sprinted ahead to N3, Blackwell was still sitting comfortably on 4NP, a member of the N5 family.
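To put rough numbers on that yield sensitivity, here is a minimal sketch using the textbook Poisson yield model, Y = exp(-A × D0). The defect densities are illustrative assumptions of mine, not TSMC data, and real foundry yield models are more elaborate.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero killer defects under the Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# Assumed defect densities (defects per cm^2): illustrative only.
for d0 in (0.05, 0.10, 0.20):
    small = poisson_yield(100, d0)  # mobile-SoC-sized die
    large = poisson_yield(800, d0)  # reticle-limit-class GPU die
    print(f"D0={d0:.2f}/cm^2  100 mm^2: {small:.1%}  800 mm^2: {large:.1%}")
```

At an assumed 0.10 defects per cm², the 100 mm² die yields about 90% while the 800 mm² die yields about 45%. A maturing process that halves its defect density helps the big die far more than the small one.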

I held a similar view in a previous article — BNP Paribas analysts had pointed out that NVIDIA's published roadmap showed Rubin on N3 and Feynman on N2, which conflicted with rumors of a "first-to-A16" leap. It looked like supply chain noise.

But new information over the past few months has made me seriously reconsider.

Not Your Typical Node Shrink

To understand why A16 is special, you first need to understand what it actually is. It is not a routine iteration within TSMC's 2nm family.

TSMC A16 PPA Performance Gains (vs. N2P)

source: TSMC 2025 Technology Symposium

N2P and A16 both belong to the 2nm family, but they diverge sharply in technical approach. N2P continues with conventional frontside power delivery (FSPDN). A16, by contrast, is TSMC's first node to incorporate a backside power delivery network (BSPDN), which TSMC brands as SPR, or Super Power Rail.

Here's why SPR matters. In a traditional chip design, signal routing and power delivery lines are crammed together on the front side of the wafer. Think of it as a two-way single-lane road: signals carry data, power lines deliver electricity, and both compete for the same metal interconnect resources. The result? Routing congestion, signal interference, and — most critically for GPUs — IR-drop.

When high current travels long distances across the front side, resistive voltage drop constrains the chip's maximum stable operating frequency. The larger the die, the longer the path, the worse the problem gets. And NVIDIA's data center GPUs happen to be the largest, most power-hungry chips in the entire semiconductor industry. A single Blackwell compute die exceeds 800 mm² — that's reticle-limit territory.
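The scaling behind IR-drop can be sketched with Ohm's law and the resistance of a rectangular wire, R = ρL / (W × T). The dimensions and current below are hypothetical values chosen only to show the proportionality. Real power grids are meshes, and nanoscale copper has a higher effective resistivity than the bulk value used here.

```python
COPPER_RESISTIVITY_OHM_M = 1.68e-8  # bulk copper; thin on-chip wires are worse

def wire_resistance(length_m: float, width_m: float, thickness_m: float,
                    resistivity: float = COPPER_RESISTIVITY_OHM_M) -> float:
    """Resistance of a uniform rectangular wire: R = rho * L / (W * T)."""
    return resistivity * length_m / (width_m * thickness_m)

def ir_drop(current_a: float, length_m: float, width_m: float,
            thickness_m: float) -> float:
    """Voltage lost along the wire: V = I * R."""
    return current_a * wire_resistance(length_m, width_m, thickness_m)

# Hypothetical rail segment: 1 mA through 100 um of 1 um x 0.5 um wire.
base = ir_drop(1e-3, 100e-6, 1e-6, 0.5e-6)
longer = ir_drop(1e-3, 200e-6, 1e-6, 0.5e-6)  # twice the path length
wider = ir_drop(1e-3, 100e-6, 2e-6, 0.5e-6)   # twice the wire width

print(f"baseline:   {base * 1e3:.2f} mV")
print(f"2x length:  {longer * 1e3:.2f} mV")
print(f"2x width:   {wider * 1e3:.2f} mV")
```

Doubling the path doubles the drop, and doubling the width halves it. Those are the two levers a backside network pulls: shorter delivery paths and wider wires.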

SPR takes a fundamentally different approach: it moves the entire power delivery layer to the back of the wafer. As Bernstein's report details, once signals and power are physically separated, the front side's interconnect budget is freed up entirely for signal routing, while the backside power interconnects can be made wider (lower resistance), with shorter, more direct delivery paths.

TSMC A16 SPR Technology Overview

source: TSMC 2025 Technology Symposium

On paper, the gains are clear. According to TSMC's official figures, A16 offers an 8–10% speed increase at the same voltage, a 15–20% power reduction at the same speed, and a 1.07–1.10x density improvement over N2P. BNP Paribas's analysis further estimates that, compared to NVIDIA's current N5 GPUs, A16 could deliver a 1.5x frequency uplift or a 60% power reduction at equivalent voltage.
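One way to read the power figures is as performance per watt. As a simplified sketch, if speed is held constant and power falls by a fraction p, performance per watt rises by 1 / (1 - p). This ignores everything outside the logic process (memory, interconnect, cooling), so treat it as an upper-level approximation rather than a system-level forecast.

```python
def perf_per_watt_gain(power_reduction: float) -> float:
    """Perf/W multiplier when power drops by `power_reduction` at equal speed."""
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in [0, 1)")
    return 1.0 / (1.0 - power_reduction)

# Figures quoted above: TSMC's 15-20% (vs. N2P), BNP Paribas's 60% (vs. N5).
for label, reduction in (("TSMC low", 0.15), ("TSMC high", 0.20),
                         ("BNP vs N5", 0.60)):
    print(f"{label}: {reduction:.0%} less power -> "
          f"{perf_per_watt_gain(reduction):.2f}x perf/W")
```

A 20% power cut at equal speed works out to 1.25x performance per watt, and the 60% estimate against N5 works out to 2.5x.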

Frequency, Power, and Density Comparison Across TSMC Process Nodes

source: BNP Paribas

These numbers are compelling on their own. But the real strategic significance of A16 for NVIDIA isn't just incremental PPA improvement — it's the natural fit between SPR technology and large-die GPUs.

Why SPR Is "Born" for GPUs

This is the crux of the entire argument.

Traditional frontside power delivery works perfectly well on small mobile SoCs — the die area is modest, current paths are short, and IR-drop isn't a binding constraint. Scale up to data center GPUs, though, and the picture changes entirely.

A flagship NVIDIA GPU draws over 700W TDP (Rubin Ultra may approach 1,800W at the system level), with thousands of CUDA cores and Tensor cores running simultaneously at full tilt. Under a frontside power architecture, the IR-drop from high current traversing long on-die distances significantly depresses the maximum stable frequency ceiling.

Supply chain reports from December 2025 shed further light — NVIDIA's strong interest in A16 stems precisely from how SPR relocates high-current metal layers to the wafer's backside, making logic regions more independent and power delivery paths shorter. This directly boosts peak frequency and energy efficiency for AI accelerators under FP8 and FP4 high-density compute workloads.

Put simply: the problem SPR solves is exactly the problem that plagues large GPUs the most. This transforms A16 from a "general-purpose process upgrade" into a purpose-built node for HPC and AI — a positioning confirmed by TSMC's own product roadmap.

The Node Apple Doesn't Want, NVIDIA Will Take

One telling signal comes from Apple's decision.

Supply chain sources indicate that Apple will skip A16 entirely, transitioning from N2 straight to A14, slated for volume production in 2028. The initial A14 variant won't even support SPR backside power delivery; the SPR-enabled version (A14P) won't arrive until 2029.

Global Advanced Logic Process Roadmap (IMEC / Various Foundries)

source: BNP Paribas

Apple's reasoning isn't hard to follow. iPhone SoC die areas typically fall in the 100–150 mm² range — IR-drop isn't their primary bottleneck, and SPR's incremental benefits don't justify the added process cost and risk. What Apple cares about is A14's 1.2x logic density improvement and 25–30% power reduction — gains that translate more directly into battery life and user experience.

The implication? During A16's early production ramp, NVIDIA may be the sole customer.

That's unusual. TSMC generally avoids letting a single customer monopolize a node. But A16's technical profile — HPC-oriented, optimized for large, high-current designs — naturally narrows the customer base to a handful of players. And the company that can simultaneously meet all three of the following criteria appears to be NVIDIA alone:

  • Pre-booked EUV tool time
  • Willingness to absorb the yield risk of an early-stage node
  • Sufficient wafer volume to amortize costs

Reports from December 2025 went so far as to use the word "exclusive" — NVIDIA is no longer queuing for capacity. It is driving TSMC's process node evolution according to its own product roadmap.

The $6,000 "Minor" Issue

Cost, of course, remains an inescapable consideration.

A16 wafer prices are expected to exceed $30,000 per wafer, roughly 10–15% above N2's approximately $27,000. BNP Paribas estimates that switching from N3P to A16 would add roughly $300–400 per compute die. For a Rubin Ultra-class GPU containing four compute dies, that translates to approximately $1,200–1,600 in incremental silicon cost.

To maintain an 80% gross margin, every dollar of added cost must be matched by five dollars of added price. At the low end of that range, the $1,200 ultimately passes through to end products as roughly $6,000 in price increase.
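The pass-through arithmetic is worth making explicit. Holding gross margin constant means the added price equals the added cost divided by (1 - margin). The sketch below runs both ends of the $300–400 per-die estimate; the function and variable names are mine, and the four-die count and 80% margin come from the figures above.

```python
def price_increase_for_margin(added_cost: float, gross_margin: float) -> float:
    """Price increase needed to keep `gross_margin` after `added_cost`."""
    if not 0.0 <= gross_margin < 1.0:
        raise ValueError("gross_margin must be in [0, 1)")
    return added_cost / (1.0 - gross_margin)

DIES_PER_GPU = 4      # Rubin Ultra-class package, per the text
GROSS_MARGIN = 0.80   # margin target, per the text

for per_die in (300, 400):  # BNP Paribas's estimated range, USD per die
    added_cost = per_die * DIES_PER_GPU
    added_price = price_increase_for_margin(added_cost, GROSS_MARGIN)
    print(f"${per_die}/die -> ${added_cost:,} cost -> ${added_price:,.0f} price")
```

The low end reproduces the $6,000 figure. The high end of the same estimate would imply roughly $8,000.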

Measured against Blackwell's current per-GPU ASP of roughly $30,000–35,000, that represents a price premium of roughly 17–20%, a heavier markup than the process upgrade surcharges faced by smaller SoCs, to be sure. But the economics of AI accelerators operate on fundamentally different logic than consumer chips. For hyperscaler customers, the decisive metric isn't the absolute price of a GPU. It's total cost of ownership (TCO) per PFLOP, per token. If A16's 8–10% frequency uplift and 15–20% power reduction translate into measurable TCO improvements, in a market where a single 1 GW AI factory costs upward of $35 billion, paying a few thousand dollars more per GPU becomes eminently digestible. And a supply-constrained seller's market already grants NVIDIA the pricing power to pass costs downstream.
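For reference, the premium as a share of ASP can be checked in a few lines. This is only a sketch of the ratio using the figures quoted above; it says nothing about whether the TCO gains actually offset it, which depends on deployment details not covered here.

```python
def premium_share(price_increase: float, asp: float) -> float:
    """Price increase expressed as a fraction of the current selling price."""
    return price_increase / asp

PRICE_INCREASE = 6_000  # USD, from the margin pass-through above

for asp in (30_000, 35_000):  # quoted Blackwell ASP range, USD
    share = premium_share(PRICE_INCREASE, asp)
    print(f"ASP ${asp:,}: premium = {share:.1%}")
```

The premium lands near 17% at the top of the ASP range and at 20% at the bottom.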

It's rare in semiconductors for the economic logic and technical logic of a process node to point so consistently in the same direction. For NVIDIA, A16 may be precisely that node.

Feynman: NVIDIA's Next Decade

GTC 2026 is just around the corner. From March 16 to 19, NVIDIA will host its annual developer conference in California, where Jensen Huang has teased the reveal of "a chip unlike anything the world has seen." The supply chain broadly expects this chip to be Feynman — NVIDIA's next-generation data center GPU, targeted for a 2028 launch.

NVIDIA Rubin Platform Architecture

source: NVIDIA

On NVIDIA's roadmap, Feynman sits after Rubin (H2 2026) and Rubin Ultra (H2 2027). Korean media reports from February 2026 indicate that Feynman will be the first to adopt TSMC's 1.6nm-class process, aligning closely with TSMC's A16 capacity expansion timeline: the Kaohsiung F22 fab (dedicated to N2 and A16 advanced processes) has its P1 and P2 phases completed, with P4 under construction.

And even with TSMC's entire 2nm family projected to exceed 200,000 wafer starts per month by 2028, demand is still expected to outstrip supply.

If Feynman does adopt A16, it would mark the first time since the 0.11-micron node that NVIDIA has manufactured its flagship GPU on the most advanced process available. This wouldn't merely be a technical decision — it would represent a structural rewriting of traditional chip manufacturing strategy, driven by the economic imperatives of the AI accelerator market.

Some Uncertainties Worth Exploring

Caution, naturally, is warranted. Several open questions deserve attention:

Yield risk remains real. A16 is TSMC's first backside power delivery node, and its manufacturing flow is more complex than traditional FSPDN — wafers must be flipped after front-side BEOL completion, bonded to a carrier wafer, thinned from the back, and then processed to build the backside power interconnect layers. Each step introduces potential new defect sources. For reticle-limit-scale dies, any yield degradation is amplified.
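To see why extra process steps hurt big dies disproportionately, consider a sketch in which the backside flow adds some incremental defect density on top of the baseline. Under a Poisson model the relative yield hit is exp(-A × ΔD), independent of the baseline. The ΔD values below are assumptions for illustration, not measured A16 data.

```python
import math

def relative_yield_after_added_defects(die_area_mm2: float,
                                       added_defects_per_cm2: float) -> float:
    """Yield multiplier caused by extra defect density (Poisson model)."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * added_defects_per_cm2)

# Hypothetical extra defect density from flip, bond, thin, and backside metal.
for added in (0.01, 0.02, 0.05):
    small = relative_yield_after_added_defects(100, added)
    large = relative_yield_after_added_defects(800, added)
    print(f"+{added:.2f}/cm^2  100 mm^2 keeps {small:.1%}  "
          f"800 mm^2 keeps {large:.1%}")
```

With an assumed extra 0.02 defects per cm², a 100 mm² die keeps about 98% of its previous yield, while an 800 mm² die keeps only about 85%.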

NVIDIA's public roadmap is still ambiguous. The 2024-vintage roadmap labeled Rubin as N3 and Feynman as N2 — but didn't distinguish between N2 and A16. Within TSMC's naming taxonomy, A16 belongs to the 2nm family but constitutes a distinct branch node. Whether NVIDIA ultimately selects A16, N2P, or even both (using heterogeneous integration across different dies) cannot be definitively confirmed at this point.

Intel is a wildcard. Reports suggest NVIDIA is considering allocating part of Feynman's GPU production to Intel, along with exploring packaging collaboration. If NVIDIA needs to diversify capacity risk across multiple foundries, process selection becomes more complex: Intel's 14A node also features backside power delivery (PowerDirect, the successor to the PowerVia architecture introduced on 18A) and is slated for volume production in 2028.

Before GTC

Return to the original question: will NVIDIA adopt A16 — the most advanced logic process of its era — for a massive data center GPU like Feynman?

Six months ago, I would have leaned toward unlikely. Now, my assessment demands more nuance.

A16 is not an ordinary process upgrade. Its SPR backside power delivery technology addresses the precise physical bottleneck that torments large GPUs — IR-drop and power delivery efficiency. TSMC has positioned it as an HPC/AI-dedicated node. Apple has stepped aside, freeing capacity to serve NVIDIA. And the high ASP of AI accelerators makes the wafer cost premium palatable. All of these factors converge into an unprecedented combination of conditions that may be sufficient to break NVIDIA's 15-year "n-1" tradition.

GTC 2026 is less than three weeks away. The chip Jensen Huang has promised — one "unlike anything the world has seen" — may well be the answer.
