{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreif5qc2x3b5dz3cy4pw3vg3cquqdchpprjc6teyv43np4nddcyhsgu",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mld4ydpejkt2"
  },
  "path": "/t/they-said-unquantized-local-ai-was-impossible-on-budget-phones-we-got-a-2-3gb-fp32-model-running-locally-on-a-120-galaxy-a25-cpu-no-gpu-no-npu-uses-less-ram-than-chrome/175739#post_3",
  "publishedAt": "2026-05-08T05:47:16.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
  ],
  "textContent": "Interesting experiment with FP32, but 0.17 t/s is a computational dead end.\n\nWe solved the “edge logic” problem differently. Why force a mobile CPU to do 32-bit tensor math when you can use **Local-First Orchestration**?\n\nMy mobile node (**ChatVTX**) is only **6MB**. It doesn’t heat up the battery, it doesn’t crash, and it gives me full access to an 8B-parameter model with sub-second latency via a direct Nitro-link.\n\n**Full project details, screenshots, and community discussion here (4PDA):** [Link to your 4PDA thread]\n\n_(Note: The forum is in Russian, but the architecture and results speak for themselves. You can use a translator to check the technical logs.)_\n\nArchitecture beats brute force every time. While you’re celebrating 0.17 t/s, we are running full-scale OSINT swarms on the go.\n\n[OPEN] NovBase OSINT Report (Swarm Sync: 6522 chars)",
  "title": "They said unquantized local AI was impossible on budget phones. We got a 2.3GB FP32 model running locally on a €120 Galaxy A25 CPU. No GPU, no NPU, uses less RAM than Chrome"
}