{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidrllomtau2dsjffyyhwwuuh4wznl3wuvgkddhb75d2jdfpeljtie",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjpny4iqawh2"
  },
  "path": "/t/server-nexe-local-ai-server-with-rag-memory-multi-backend-inference-and-plugins/175348#post_1",
  "publishedAt": "2026-04-17T18:29:33.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "https://github.com/jgoy-labs/server-nexe",
    "https://server-nexe.org"
  ],
  "textContent": "Hi everyone! I’d like to share **server-nexe** , an open-source, fully local AI server I’ve been building.\n\n**What it does** server-nexe is a FastAPI server that runs 100% on your machine. No cloud, no telemetry. It combines LLM inference with persistent RAG memory seamlessly.\n\n**Key features**\n\n  * **Multi-backend inference:** Switch backends with a single config parameter!\n\n    * **MLX** (Native Apple Silicon) → Loads models directly from  `mlx-community`.\n\n    * **llama.cpp (GGUF)** → Compatible with any GGUF model from the Hugging Face Hub.\n\n    * **Ollama** support included.\n\n  * **Persistent RAG memory:** Uses Qdrant + 768-dim embeddings across 3 specialized collections. The AI actually _remembers_ context between sessions.\n\n  * **Automatic Plugin System:** Easily extendable.\n\n  * **OpenAI-compatible API:** Acts as a drop-in replacement for your existing apps.\n\n  * **macOS Installer (DMG with wizard):** Automatically detects your hardware and recommends the best backend and models based on your RAM.\n\n\n\n\n**v1.0.0-beta is out!** This is the first public MVP, and I am very open to community feedback.\n\n**Why I built it** I wanted an AI server that actually remembered things across sessions. One question led to another, and my “learning by doing” approach got a bit out of hand! I’m not trying to compete with cloud services; I want to provide infrastructure for those who want to truly own their AI stack.\n\nDocumentation: 39 docs across 3 languages (CA/EN/ES) — structured to be read by both humans and AIs\n\nBuilt in Barcelona · Apache 2.0 · Feedback highly welcomed!\n\n**Links**\n\n  * **GitHub:** https://github.com/jgoy-labs/server-nexe\n\n  * **Docs:** https://server-nexe.org\n\n\n",
  "title": "Server-nexe: Local AI server with RAG memory, multi-backend inference, and plugins"
}