They said unquantized local AI was impossible on budget phones. We got a 2.3GB FP32 model running locally on a €120 Galaxy A25 CPU. No GPU, no NPU, uses less RAM than Chrome

Hugging Face Forums [Unofficial] May 8, 2026

Interesting experiment with FP32, but 0.17 t/s is a computational dead end.
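To put that number in wall-clock terms, here is a rough back-of-the-envelope sketch. The 0.17 t/s rate and the 2.3GB FP32 size come from the thread title and the line above; the 256-token reply length is an assumption chosen for illustration, and the parameter estimate assumes the whole file is FP32 weights at 4 bytes each with 1 GB = 10^9 bytes.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds needed to decode num_tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def fp32_param_count(model_bytes: float) -> float:
    """Approximate parameter count if every byte is an FP32 weight (4 bytes each)."""
    return model_bytes / 4


RATE = 0.17          # tokens per second, the figure quoted in the thread
REPLY_TOKENS = 256   # assumed reply length, not a figure from the thread

seconds = seconds_to_generate(REPLY_TOKENS, RATE)
print(f"{REPLY_TOKENS} tokens at {RATE} t/s: {seconds / 60:.1f} minutes")
print(f"2.3 GB of FP32 weights: about {fp32_param_count(2.3e9) / 1e6:.0f}M parameters")
```

Under those assumptions a single 256-token reply takes about 25 minutes, and the model works out to roughly 575M parameters.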

We solved the “edge logic” problem differently. Why force a mobile CPU to do 32-bit tensor math when you can use Local-First Orchestration?

My mobile node (ChatVTX) is only 6MB. It doesn’t heat the battery, it doesn’t crash, and it gives me full access to an 8B-parameter model with sub-second latency via a direct Nitro-link.
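The general thin-client pattern described above can be sketched as follows. This is not ChatVTX's code and says nothing about how Nitro-link actually works, since neither is documented in this post. `ThinClient`, `fake_remote`, the JSON message shape, and the `remote-8b` model name are all hypothetical placeholders, and the transport is an injected callable so the sketch runs without a network.

```python
import json
from typing import Callable

# A transport is any callable that takes request bytes and returns response bytes.
Transport = Callable[[bytes], bytes]


class ThinClient:
    """Hypothetical on-device node: packages prompts, runs no inference itself."""

    def __init__(self, transport: Transport, model: str = "remote-8b") -> None:
        self._transport = transport
        self._model = model  # placeholder model name, not from the post

    def ask(self, prompt: str) -> str:
        # Serialize the prompt, hand it to the transport, and decode the reply.
        request = json.dumps({"model": self._model, "prompt": prompt}).encode("utf-8")
        response = json.loads(self._transport(request).decode("utf-8"))
        return response["text"]


def fake_remote(request: bytes) -> bytes:
    """Stand-in for the remote host; echoes the prompt instead of running a model."""
    payload = json.loads(request.decode("utf-8"))
    reply = {"text": f"[{payload['model']}] echo: {payload['prompt']}"}
    return json.dumps(reply).encode("utf-8")


client = ThinClient(fake_remote)
print(client.ask("ping"))
```

In this sketch the phone only pays for serialization and one round trip, and the model runs wherever the transport points.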

Full project details, screenshots, and community discussion here (4PDA): [Link to your 4PDA thread]

(Note: The forum is in Russian, but the architecture and results speak for themselves. You can use a translator to check the technical logs.)

Architecture beats brute force every time. While you’re celebrating 0.17 t/s, we are running full-scale OSINT swarms on the go.

NovBase OSINT Report (Swarm Sync: 6522 chars)