{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreigkpvvx52zkgr4nx2jkmeml74lovqxqeuevmyuzck64dc24kr7pke",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mldr5rirpwe2"
},
"path": "/t/they-said-unquantized-local-ai-was-impossible-on-budget-phones-we-got-a-2-3gb-fp32-model-running-locally-on-a-120-galaxy-a25-cpu-no-gpu-no-npu-uses-less-ram-than-chrome/175739#post_7",
"publishedAt": "2026-05-08T12:12:01.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "### Engineering Insight: The Reality of On-Device Intelligence\n\n\"It is inspiring to see so many developers chasing the dream of a truly ‘living’ AI on a handheld device. However, as someone who has pushed mobile hardware to its absolute thermal and computational limits, I must share a hard-earned truth: **Intelligence requires room to breathe.**\n\nTrying to run a high-reasoning, ‘living’ model directly on a mobile CPU is a bit like trying to run a data center on a smartphone battery. You might get it to work, but you’ll face three immediate walls:\n\n 1. **The Thermal Wall:** Sustained high-logic tasks will throttle your CPU in minutes, turning your ‘intelligence’ into a slow, stuttering script.\n\n 2. **The Memory Bottleneck:** To fit a model into 4GB–8GB of RAM, you have to compress it (quantize) so heavily that you lose the ‘soul’ of the reasoning—the very ‘living’ quality you are seeking.\n\n 3. **The Energy Tax:** You can’t have autonomous agent loops if your device dies in two hours.\n\n\n\n\nThe real breakthrough isn’t in **compression** , it’s in **orchestration**.\n\nMy approach with **NovBase** was to stop fighting the hardware. Instead of forcing the phone to be the ‘brain,’ I turned it into the ‘eyes and ears’ (a lean 6MB interface), while the actual ‘living’ intelligence—a full-scale, uncompromised 8B model—runs on a dedicated local Nitro-node.\n\nThis is the only way to get sub-second responses and deep reasoning without sacrifices. Don’t let the marketing hype fool you: true edge intelligence isn’t about making the model smaller; it’s about making the architecture smarter.\"",
"title": "They said unquantized local AI was impossible on budget phones. We got a 2.3GB FP32 model running locally on a €120 Galaxy A25 CPU. No GPU, no NPU, uses less RAM than Chrome"
}