They said unquantized local AI was impossible on budget phones. We got a 2.3GB FP32 model running locally on a €120 Galaxy A25 CPU. No GPU, no NPU, and it uses less RAM than Chrome.
Engineering Insight: The Reality of On-Device Intelligence
"It is inspiring to see so many developers chasing the dream of a truly ‘living’ AI on a handheld device. However, as someone who has pushed mobile hardware to its absolute thermal and computational limits, I must share a hard-earned truth: Intelligence requires room to breathe.
Trying to run a high-reasoning, ‘living’ model directly on a mobile CPU is a bit like trying to run a data center on a smartphone battery. You might get it to work, but you’ll face three immediate walls:
The Thermal Wall: Sustained heavy inference will make your CPU throttle within minutes, turning your ‘intelligence’ into a slow, stuttering script.
The Memory Bottleneck: To fit a model into 4GB–8GB of RAM, you have to quantize (compress) it so heavily that you lose the ‘soul’ of the reasoning, the very ‘living’ quality you are seeking.
The Energy Tax: You can’t have autonomous agent loops if your device dies in two hours.
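As a rough check on the memory figures above, the weights of a model take about (number of parameters) x (bytes per parameter). The sketch below is a back-of-envelope estimate only: it counts weights and ignores activations, KV cache, and runtime overhead. The function names and the use of decimal gigabytes are assumptions made for illustration.

```python
# Back-of-envelope weight memory: parameters x bytes per parameter.
# Counts weights only; activations, KV cache, and runtime overhead are ignored.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}
GB = 1e9  # decimal gigabytes (assumption; the post does not say GB vs GiB)


def weight_gb(n_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    return n_params * BYTES_PER_PARAM[dtype] / GB


def params_from_size(size_gb: float, dtype: str) -> float:
    """Invert the estimate: parameter count that fills size_gb at this dtype."""
    return size_gb * GB / BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"8B model, {dtype}: {weight_gb(8e9, dtype):.1f} GB")
    n = params_from_size(2.3, "fp32")
    print(f"2.3 GB of fp32 weights: about {n / 1e6:.0f}M parameters")
```

Under these assumptions an 8B model needs about 32 GB of weights at FP32 and about 4 GB at 4-bit, which is why it only fits a 4GB–8GB phone after heavy quantization. By the same arithmetic, 2.3GB of FP32 weights corresponds to roughly 575M parameters.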
The real breakthrough isn’t in compression; it’s in orchestration.
My approach with NovBase was to stop fighting the hardware. Instead of forcing the phone to be the ‘brain,’ I turned it into the ‘eyes and ears’ (a lean 6MB interface), while the actual ‘living’ intelligence—a full-scale, uncompromised 8B model—runs on a dedicated local Nitro-node.
This is the only way to get sub-second responses and deep reasoning without sacrifices. Don’t let the marketing hype fool you: true edge intelligence isn’t about making the model smaller; it’s about making the architecture smarter."
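The split described in the quote (phone as a thin interface, model on a separate local machine) can be sketched as a small client that only packages a prompt and forwards it over the local network. This is a minimal illustration, not NovBase's actual protocol: the node address, the `/generate` path, and the JSON shapes `{"prompt": ...}` and `{"text": ...}` are all hypothetical.

```python
import json
import urllib.request

# Hypothetical address of the local inference node; not taken from the post.
NODE_URL = "http://192.168.1.50:8080/generate"


def build_request(prompt: str, url: str = NODE_URL) -> urllib.request.Request:
    """Package a prompt as a JSON POST; the phone does no inference itself."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_node(prompt: str, url: str = NODE_URL, timeout_s: float = 5.0) -> str:
    """Send the prompt to the node and return its reply text.

    Assumes the node answers with JSON shaped like {"text": "..."}.
    """
    request = build_request(prompt, url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["text"]
```

With a node listening at that address, `ask_node("Summarize my notes")` would return the node's reply. The trade-off in this design is that the phone's memory and thermal budget stop mattering, while network latency and the node's availability start to.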