{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreif64gqgkmhir2q53u3ssktdrw3ylnirdtrwp3u3kwwkdsyc56g5mm",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mldr6e3btf32"
  },
  "path": "/t/they-said-unquantized-local-ai-was-impossible-on-budget-phones-we-got-a-2-3gb-fp32-model-running-locally-on-a-120-galaxy-a25-cpu-no-gpu-no-npu-uses-less-ram-than-chrome/175739#post_4",
  "publishedAt": "2026-05-08T11:01:13.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "Uzer-namo-2024:\n\n> NovBase OSINT Report (Swarm Sync: 6522 chars)\n\nHey Uzer-namo, I checked out your 4PDA thread.\n\nYour post explicitly states your model is: _“работающая на моем сервере с RTX 3050”_ (running on my server with an RTX 3050).\n\nYour 6MB app is a thin client/API wrapper sending network requests to a desktop GPU. That is cloud/remote hosting, not Edge AI. If you put your phone in Airplane Mode, your app stops working.\n\nMy video shows a 2.3GB uncompressed FP32 model running **locally, on-device, in Airplane Mode, using zero network** , processing exclusively on a budget mobile CPU. We are solving the physical memory bandwidth wall of mobile silicon without relying on external servers or quantization.\n\nThere is no Quantization that is the point, the model is 2.3GB running on the phone literally. Not wrapper of some other model or similar. The model and the APK are one not separated thing, when you install APK you get the model as well. Turn of internet or anything it still works. The slow part is because we are working with heavy python loop and unoptimized code. Its POC not the final product.\n\nAPI wrappers are great, but bridging a network ping to a desktop GPU is not the same sport as running raw floating-point math directly on mobile silicon.\n\nIts a nice project tho…I love it!",
  "title": "They said unquantized local AI was impossible on budget phones. We got a 2.3GB FP32 model running locally on a €120 Galaxy A25 CPU. No GPU, no NPU, uses less RAM than Chrome"
}