They said unquantized local AI was impossible on budget phones. We got a 2.3GB FP32 model running locally on a €120 Galaxy A25 CPU. No GPU, no NPU, uses less RAM than Chrome
That is a solid experiment! It actually brings back memories: about a year ago, I was running a similar setup with a full LLM deployed natively on the handset. I spent about a month optimizing it, and like you mentioned, it's impressive to see it functioning in total isolation (Airplane mode).
However, after pushing that architecture to its limits, hitting sustained CPU loads of 80-90% and dealing with the inevitable thermal throttling, I realized that for high-stakes OSINT and autonomous agent loops, I needed a different paradigm.
That’s why I eventually moved toward the Local-First Orchestration I’m using now. By offloading the heavy lifting to a dedicated Nitro-node and keeping the mobile interface at a lean 6MB, I get the best of both worlds: the raw power of an 8B model and sub-second latency, without turning the phone into a pocket heater.
Always good to see others experimenting with true edge independence, though. Keep pushing the boundaries!