{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreihkhkonuxczmxoiybvbjg4v37z2dvzz6fy7rkiuthbqirekhsfuoq",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmybqsh4pgv2"
},
"path": "/t/what-should-i-change-to-optimize-local-hosted-ai/176339#post_1",
"publishedAt": "2026-05-29T09:25:01.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "I have a server with the following hardware:\n\nIntel Core Ultra 7 270K Plus\n\n64 GB RAM\n\n2x Intel Arc B70, 32 GB VRAM\n\nI'm running Ubuntu Server with llama.cpp.\n\nI'm using it for local agentic coding with the continue.dev plugin for VS Code.\n\nMy startllm.sh file looks like this:\n\n #!/bin/bash\n source /opt/intel/oneapi/setvars.sh --force\n export ZES_ENABLE_SYSMAN=1\n\n cd ~/llama.cpp\n\n ZES_ENABLE_SYSMAN=1 ./build/bin/llama-server \\\n -m ~/models/Qwen3.6-27B-Q5_K_M.gguf \\\n -a Roboto \\\n -c 32768 \\\n --cache-type-k q8_0 \\\n --cache-type-v q8_0 \\\n --n-gpu-layers 999 \\\n -b 2048 \\\n -ub 512 \\\n --threads 24 \\\n --host 0.0.0.0 \\\n --port 8080 \\\n --split-mode layer \\\n --tensor-split 1,1 \\\n --numa distribute\n\nIt still feels like it's responding slowly. Which parameters should I change?",
"title": "What should I change to optimize locally hosted AI?"
}