
They said unquantized local AI was impossible on budget phones. We got a 2.3GB FP32 model running locally on a €120 Galaxy A25 CPU. No GPU, no NPU, uses less RAM than Chrome

Hugging Face Forums [Unofficial] May 8, 2026

That is a solid experiment! It actually brings back memories: about a year ago, I was running a similar setup with a full LLM deployed natively on the handset. I spent about a month optimizing it, and like you mentioned, it’s impressive to see it functioning in total isolation (Airplane mode).

However, after pushing that architecture to its limits, with sustained CPU loads of 80-90% and the inevitable thermal throttling, I realized that for high-stakes OSINT and autonomous agent loops, I needed a different paradigm.

That’s why I eventually moved toward the Local-First Orchestration I’m using now. By offloading the heavy lifting to a dedicated Nitro-node and keeping the mobile interface at a lean 6MB, I get the best of both worlds: the raw power of an 8B model and sub-second latency, without turning the phone into a pocket heater.

Always good to see others experimenting with true edge independence, though. Keep pushing the boundaries!
