{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifgr3wsbvgii3an62xil2xg5qrhyzzguocioxcqtuxfqpr3rlqrga",
    "uri": "at://did:plc:j4nmy4ymoeorm3j6hzbijapg/app.bsky.feed.post/3m5gx72n3xg22"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreia7qtzlkdzbklkvj33pnpgvomh576phj6ekm7jvxjplmsh3elcjqu"
    },
    "mimeType": "image/jpeg",
    "size": 838107
  },
  "description": "A short video about NPUs and TPUs led to a deeper look at the physical side of AI. From the Neural Engine in your iPhone to the massive processors powering data-centre models.",
  "path": "/cpu-gpu-tpu-npu-explained/",
  "publishedAt": "2025-11-12T15:19:20.000Z",
  "site": "https://hoeijmakers.net",
  "tags": [
    "FinninTech",
    "Tiffany Janzen (@tiffintech) on ThreadsWhat is the difference between NPUs and TPUs?! Here is a simple explanation! You’re going to start hearing about NPUs everywhere so it is good to understand why tech companies have become so obsessed with them 💡 #tech #stem #techexplainedThreads",
    "https://t.co/InSyvjrrSi",
    "pic.twitter.com/dc5Y3tQSDh",
    "November 16, 2025",
    "Infinite scale: The architecture behind the Azure AI superfactory - The Official Microsoft Blog",
    "From Sand to Software: A Whistle-Stop Tour of the AI Value Chain",
    "Compute: A New Measure of the World",
    "The Neural Engine Does Not Run Your LLM",
    "Running Gemma 4 on Your iPhone",
    "Running a Local LLM on Your iPhone",
    "@rohanpaul_ai"
  ],
  "textContent": "It began with a short video. FinninTech explained the difference between TPUs and NPUs. A brief clip that suddenly made the invisible world of AI hardware tangible.\n\nTiffany Janzen (@tiffintech) on ThreadsWhat is the difference between NPUs and TPUs?! Here is a simple explanation! You’re going to start hearing about NPUs everywhere so it is good to understand why tech companies have become so obsessed with them 💡 #tech #stem #techexplainedThreads\n\nThat curiosity sent me down a path connecting the chip in my phone to the massive processors that train models like ChatGPT.\n\n### Quick takeaways\n\n  * CPUs, GPUs, TPUs and NPUs form a spectrum of _specialisation_ : from flexible generalists to highly efficient AI specialists.\n  * The iPhone’s **Neural Engine** is Apple’s name for its NPU, a miniature AI processor for local tasks.\n  * **FLOPS** and **TOPS** measure different kinds of computing power: precision versus speed.\n  * Export limits on chips such as NVIDIA’s H100 show how computing power has become a geopolitical factor.\n\n\n\n## The spectrum of specialisation\n\nArtificial intelligence may feel abstract, but it’s built on physical hardware — billions of transistors arranged for specific kinds of work.\n\nAt one end stands the **CPU** , a flexible all-rounder that handles logic and control. Then come **GPUs** , vast grids of simple cores designed for parallel maths. 
Beyond those lie **TPUs** and **NPUs**, processors made specifically for neural networks.\n\nYou can picture it as a line:\n\n> **CPU → GPU → TPU / NPU**\n> As you move right, flexibility decreases, but efficiency for AI tasks rises sharply.\n\nA CPU handles general tasks, a GPU multiplies matrices in parallel, a TPU accelerates training and inference in Google’s data centres, and an NPU performs small-scale AI tasks efficiently on your device.\n\n## The chip in your pocket\n\nApple’s **A17 Pro** chip, used in the iPhone 15 Pro and iPhone 15 Pro Max, combines three types of processor: a CPU for everyday applications, a GPU for graphics, and a **Neural Engine** for machine learning.\n\nThis Neural Engine performs around **35 trillion operations per second**, powering on-device features such as transcription, photo recognition, and real-time translation. It consumes only a few watts, on the order of a hundred times less power than a data-centre GPU, yet it is fast enough for personal AI.\n\nThe A17 Pro includes the Neural Engine, Apple's NPU.\n\n## FLOPS and TOPS: the language of compute\n\n**FLOPS** (_floating-point operations per second_) measure the ability to handle precise arithmetic, which is needed for **_training_** large models.\n\n**TOPS** (_tera-operations per second_) describe simpler, lower-precision calculations, which are ideal for **_running_** those models efficiently.\n\nTraining requires floating-point accuracy and immense power; inference, which is what happens on your phone, can use integer maths to save energy.\n\nIn short: GPUs and TPUs are usually rated in FLOPS, NPUs in TOPS.\n\n## TPU vs GPU: same idea, different philosophy\n\nA **GPU** is a programmable engine for parallel work, originally built for graphics and later adopted for AI.\n\nA **TPU** is Google’s own design: a _tensor processing unit_ built from the ground up for machine learning.\n\nIt’s not a GPU, but it draws on the same principle: performing many operations in parallel.\n\nWhile GPUs remain flexible, TPUs are hard-wired for the 
algebra behind neural networks, making them faster and more efficient for that single purpose.\n\n💡\n\nComing soon: Microsoft’s Maia. Maia is Microsoft’s own AI accelerator, optimised for transformer workloads. Functionally it resembles Google’s TPU family, but it has its own architecture, software stack, and integration into Azure.\n\n## The far end of the spectrum\n\nIn data centres, processors such as NVIDIA’s **H100** or **B100** dominate.\nEach consumes hundreds of watts and delivers up to several **petaflops** of low-precision performance. These chips now sit at the centre of export restrictions, because such computing capacity determines who can train the next generation of large models.\n\nTo comply with U.S. limits, NVIDIA built slower versions (A800, H800) for the Chinese market. They are essentially the same hardware, with reduced interconnect speed.\nThe boundaries of computing power have become geopolitical borders.\n\n💡\n\nThe NVIDIA H100 and Google’s TPU both power today’s AI revolution, but they aren’t one-to-one rivals. The H100 is a flexible, general-purpose GPU evolved for deep learning; the TPU is a purpose-built tensor processor optimised for Google’s own ecosystem. They meet at the same goal, **accelerating neural computation**, from opposite ends of the design spectrum.\n\n## From abstraction to atoms\n\nOnce you see AI through its hardware, it feels less ethereal.\nEvery neural network, from the model in your phone to the ones shaping global research, depends on physical constraints: heat, energy, and silicon.\n\nUnderstanding this spectrum, _from the Neural Engine in your pocket to the Tensor Processor in Google’s data halls_, brings AI down to earth.\n\nIt reminds us that intelligence, however artificial, still runs on very real machinery.\n\n> Wow. Classic Jensen style, he ended the Nvidia vs. custom ASIC competition for good. 🫡\n>\n> The level of confidence with which he explains. 
🎯\n>\n> He was answering to UBS research analyst question on how custom ASICs will affect NVIDIA or how they are going to compete with custom ASIC.… https://t.co/InSyvjrrSi pic.twitter.com/dc5Y3tQSDh\n>\n> — Rohan Paul (@rohanpaul_ai) November 16, 2025\n\n### Further reading\n\n  * Infinite scale: The architecture behind the Azure AI superfactory - The Official Microsoft Blog\n  * From Sand to Software: A Whistle-Stop Tour of the AI Value Chain\n  * Compute: A New Measure of the World\n  * The Neural Engine Does Not Run Your LLM\n  * Running Gemma 4 on Your iPhone\n  * Running a Local LLM on Your iPhone\n\n",
  "title": "From Silicon to Intelligence: Understanding the Hardware Behind AI",
  "updatedAt": "2026-05-10T08:53:47.548Z"
}