{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreif5qc2x3b5dz3cy4pw3vg3cquqdchpprjc6teyv43np4nddcyhsgu",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mld4ydpejkt2"
},
"path": "/t/they-said-unquantized-local-ai-was-impossible-on-budget-phones-we-got-a-2-3gb-fp32-model-running-locally-on-a-120-galaxy-a25-cpu-no-gpu-no-npu-uses-less-ram-than-chrome/175739#post_3",
"publishedAt": "2026-05-08T05:47:16.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
],
"textContent": "Interesting experiment with FP32, but 0.17 t/s is a computational dead end.\n\nWe solved the “edge logic” problem differently. Why force a mobile CPU to do 32-bit tensor math when you can use **Local-First Orchestration**?\n\nMy mobile node (**ChatVTX**) is only **6MB**. It doesn’t heat the battery, it doesn’t crash, and it gives me full access to an 8B-parameter model with sub-second latency via a direct Nitro-link.\n\n**Full project details, screenshots, and community discussion here (4PDA):** [Link to your 4PDA thread]\n\n_(Note: The forum is in Russian, but the architecture and results speak for themselves. You can use a translator to check the technical logs.)_\n\nArchitecture beats brute force every time. While you’re celebrating 0.17 t/s, we are running full-scale OSINT swarms on the go.\n\nNovBase OSINT Report (Swarm Sync: 6522 chars)",
"title": "They said unquantized local AI was impossible on budget phones. We got a 2.3GB FP32 model running locally on a €120 Galaxy A25 CPU. No GPU, no NPU, uses less RAM than Chrome"
}