{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreia67nwfqabrqiwtx7xfcdvmz5uosiibcik56zfhubspxu2tbiuc2e",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjpuojslqwb2"
},
"path": "/t/building-local-my-2026-headless-ai-server-journey/175243#post_3",
"publishedAt": "2026-04-17T21:24:51.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"if you use MoE LLMs via GGUF on platforms like Ollama or LM Studio, they run smoothly even with just within 32GB of RAM (not VRAM)"
],
"textContent": "> What are you all currently running on your local setups?\n\nHmm… The only models I run locally every day are small embedding models, which I’ve integrated into my work scripts. Since my GPU isn’t very powerful (a 3060 Ti with 8 GB of VRAM), I rarely run very large models locally…\n\nThat said, I’ve heard that if you use MoE LLMs via GGUF on platforms like Ollama or LM Studio, they run smoothly even within 32 GB of system RAM (not VRAM)…\n\nPersonally, since most of my current use cases don’t require confidentiality, I just use cloud services for my LLMs.\nOf course, I also often try out models (LLM, T2I, etc.) hosted on HF via Spaces.",
"title": "Building Local: My 2026 Headless AI Server Journey"
}