{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreicpc7xasfoooszpjjh56bxy4pch2tnl6t2wisy46c4dk452wlqhbe",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mlezghz4nli2"
  },
  "path": "/t/they-said-unquantized-local-ai-was-impossible-on-budget-phones-we-got-a-2-3gb-fp32-model-running-locally-on-a-120-galaxy-a25-cpu-no-gpu-no-npu-uses-less-ram-than-chrome/175739#post_9",
  "publishedAt": "2026-05-08T19:21:54.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "@obn777bot"
  ],
  "textContent": "\"That is a very ambitious and impressive approach! It’s rare to see someone tackling the fundamental math of matrix multiplications to bypass the thermal wall on mobile hardware.\n\nWe have been working on a similar challenge with our project, but from a different architectural angle. Instead of pushing a heavy model to its limits, we’ve shifted toward a **‘Lean Core + Knowledge Swarm’** architecture. Our current setup uses a lightweight model (around 400MB) acting as an intelligent orchestrator, backed by a robust, pre-tuned framework that handles deep data extraction and synthesis from external sources in real-time.\n\nThis way, we keep the mobile CPU cool while maintaining high-level intelligence through efficient ‘Nitro-node’ logic rather than raw compute. It would be fascinating to compare notes on how your ‘Signal Math’ handles long-context reasoning compared to our ‘Swarm Search’ retrieval.\n\nIf you’d like to discuss these architectures or exchange ideas on mobile AI optimization, feel free to reach out here: **@obn777bot** (Telegram).\n\nKeep pushing the boundaries — the world needs more ‘out-of-the-box’ thinking!\"",
  "title": "They said unquantized local AI was impossible on budget phones. We got a 2.3GB FP32 model running locally on a €120 Galaxy A25 CPU. No GPU, no NPU, uses less RAM than Chrome"
}