They said unquantized local AI was impossible on budget phones. We got a 2.3 GB FP32 model running locally on the CPU of a €120 Galaxy A25. No GPU, no NPU, and it uses less RAM than Chrome.
"That is a very ambitious and impressive approach! It’s rare to see someone tackling the fundamental math of matrix multiplications to bypass the thermal wall on mobile hardware.
We have been working on a similar challenge with our project, but from a different architectural angle. Instead of pushing a heavy model to its limits, we’ve shifted toward a ‘Lean Core + Knowledge Swarm’ architecture. Our current setup uses a lightweight model (around 400 MB) acting as an intelligent orchestrator, backed by a robust, pre-tuned framework that handles deep data extraction and synthesis from external sources in real time.
This way, we keep the mobile CPU cool while maintaining high-level intelligence through efficient ‘Nitro-node’ logic rather than raw compute. It would be fascinating to compare notes on how your ‘Signal Math’ handles long-context reasoning compared to our ‘Swarm Search’ retrieval.
If you’d like to discuss these architectures or exchange ideas on mobile AI optimization, feel free to reach out here: @obn777bot (Telegram).
Keep pushing the boundaries — the world needs more ‘out-of-the-box’ thinking!"