They said unquantized local AI was impossible on budget phones. We got a 2.3GB FP32 model running locally on the CPU of a €120 Galaxy A25: no GPU, no NPU, and it uses less RAM than Chrome.
I completely agree that standard dense transformers hit a Thermal Wall and an Energy Tax. That is exactly why we had to throw away the vanilla transformer architecture and build the Synthetic Neural Engine.
And you are absolutely right that to fit a standard model into RAM, you have to quantize it to death. But that’s exactly what my post addresses: we didn’t quantize. The video proves the model is running in pure, uncompressed FP32 while using only ~4.4GB of RAM in total, with the CPU sitting at a cool 33.3°C and zero battery drain.
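For anyone checking the numbers: FP32 stores each weight in 4 bytes, so a 2.3GB FP32 file works out to roughly 575M parameters. The same weights would take about half, a quarter, or an eighth of that space at FP16, 8-bit, or 4-bit. The sketch below just does that arithmetic. It assumes "GB" means 10^9 bytes and that the file holds only uniformly typed weights; the helper names are hypothetical and not part of our engine.

```python
# Back-of-envelope weight-memory arithmetic for a 2.3GB FP32 model.
# Assumptions (not stated in the post): "GB" = 10**9 bytes, the file
# contains only weights, and all weights share one storage dtype.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def param_count(size_bytes: int, dtype: str) -> int:
    """Estimate the number of parameters from file size and storage dtype."""
    return int(size_bytes / BYTES_PER_PARAM[dtype])


def weight_size_gb(n_params: int, dtype: str) -> float:
    """Size of n_params weights in decimal gigabytes for the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


model_bytes = 2_300_000_000  # 2.3GB, assumed decimal
n = param_count(model_bytes, "fp32")
print(f"~{n / 1e6:.0f}M parameters")
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_size_gb(n, dtype):.2f} GB")
```

The gap between the ~2.3GB of weights and the ~4.4GB total is everything else running on the phone (runtime buffers, the app, the OS); the sketch does not try to break that down.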
We bypassed the thermal and memory walls not by offloading the compute to a desktop GPU over Wi-Fi, but by changing the fundamental math of the matrix multiplications so the mobile CPU only computes the pure signal.
Client-server orchestration (‘Nitro-nodes’) is a cool workaround for standard models! But our goal isn’t to work around the mobile hardware; it’s to write better math so the mobile hardware can actually do the thinking itself.