{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreidrllomtau2dsjffyyhwwuuh4wznl3wuvgkddhb75d2jdfpeljtie",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjpny4iqawh2"
},
"path": "/t/server-nexe-local-ai-server-with-rag-memory-multi-backend-inference-and-plugins/175348#post_1",
"publishedAt": "2026-04-17T18:29:33.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"https://github.com/jgoy-labs/server-nexe",
"https://server-nexe.org"
],
"textContent": "Hi everyone! I’d like to share **server-nexe** , an open-source, fully local AI server I’ve been building.\n\n**What it does** server-nexe is a FastAPI server that runs 100% on your machine. No cloud, no telemetry. It combines LLM inference with persistent RAG memory seamlessly.\n\n**Key features**\n\n * **Multi-backend inference:** Switch backends with a single config parameter!\n\n * **MLX** (Native Apple Silicon) → Loads models directly from `mlx-community`.\n\n * **llama.cpp (GGUF)** → Compatible with any GGUF model from the Hugging Face Hub.\n\n * **Ollama** support included.\n\n * **Persistent RAG memory:** Uses Qdrant + 768-dim embeddings across 3 specialized collections. The AI actually _remembers_ context between sessions.\n\n * **Automatic Plugin System:** Easily extendable.\n\n * **OpenAI-compatible API:** Acts as a drop-in replacement for your existing apps.\n\n * **macOS Installer (DMG with wizard):** Automatically detects your hardware and recommends the best backend and models based on your RAM.\n\n\n\n\n**v1.0.0-beta is out!** This is the first public MVP, and I am very open to community feedback.\n\n**Why I built it** I wanted an AI server that actually remembered things across sessions. One question led to another, and my “learning by doing” approach got a bit out of hand! I’m not trying to compete with cloud services; I want to provide infrastructure for those who want to truly own their AI stack.\n\nDocumentation: 39 docs across 3 languages (CA/EN/ES) — structured to be read by both humans and AIs\n\nBuilt in Barcelona · Apache 2.0 · Feedback highly welcomed!\n\n**Links**\n\n * **GitHub:** https://github.com/jgoy-labs/server-nexe\n\n * **Docs:** https://server-nexe.org\n\n\n",
"title": "Server-nexe: Local AI server with RAG memory, multi-backend inference, and plugins"
}