{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihkhkonuxczmxoiybvbjg4v37z2dvzz6fy7rkiuthbqirekhsfuoq",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmybqsh4pgv2"
  },
  "path": "/t/what-should-i-change-to-optimize-local-hosted-ai/176339#post_1",
  "publishedAt": "2026-05-29T09:25:01.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I have a server with the following hardware:\n\nItel Ultra 7 270K Plus\n\n64gb RAM\n\n2x Intel ARC B70 32gb VRAM\n\nIm running Ubuntu server with llama.cpp.\n\nIm using it to do local Agentic coding with continue.dev plugin for VScode.\n\nMy startllm.sh file looks like this:\n\n\n    #!/bin/bash\n    source /opt/intel/oneapi/setvars.sh --force\n    export ZES_ENABLE_SYSMAN=1\n\n    cd ~/llama.cpp\n\n    ZES_ENABLE_SYSMAN=1 ./build/bin/llama-server \\\n        -m ~/models/Qwen3.6-27B-Q5_K_M.gguf \\\n        -a Roboto \\\n        -c 32768 \\\n        --cache-type-k q8_0 \\\n        --cache-type-v q8_0 \\\n        --n-gpu-layers 999 \\\n        -b 2048 \\\n        -ub 512 \\\n        --threads 24 \\\n        --host 0.0.0.0 \\\n        --port 8080 \\\n        --split-mode layer \\\n        --tensor-split 1,1 \\\n        --numa distribute\n\n\n\nI still feel like its responding slow, which parameters should I change?",
  "title": "What should i change to optimize local hosted AI"
}