What is the best architecture for integrating local LLM inference and RAG on mobile devices?

Hugging Face Forums [Unofficial] March 15, 2026

Hi everyone,

I’m currently exploring a mobile AI architecture and would love to hear technical opinions from others working in this area.

The goal is to support the following on a mobile app:

The technical directions I’m considering include:

My main questions are:

What is currently the most reliable architecture for local LLM + RAG on mobile?
If the frontend is Flutter, would you recommend Platform Channels or FFI?
What are good approaches for local knowledge retrieval on mobile devices?
How do you usually balance performance, memory usage, and model size in production or prototype setups?

I’d be very interested in hearing any real-world experience or recommendations related to mobile AI / edge AI systems.