What should I change to optimize locally hosted AI?
Hugging Face Forums [Unofficial]
May 29, 2026
I have a server with the following hardware:
Intel Ultra 7 270K Plus
64 GB RAM
2x Intel Arc B70 32 GB VRAM
I'm running Ubuntu Server with llama.cpp.
I'm using it for local agentic coding with the continue.dev plugin for VS Code.
My startllm.sh file looks like this:
#!/bin/bash
source /opt/intel/oneapi/setvars.sh --force
export ZES_ENABLE_SYSMAN=1
cd ~/llama.cpp
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-server \
-m ~/models/Qwen3.6-27B-Q5_K_M.gguf \
-a Roboto \
-c 32768 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--n-gpu-layers 999 \
-b 2048 \
-ub 512 \
--threads 24 \
--host 0.0.0.0 \
--port 8080 \
--split-mode layer \
--tensor-split 1,1 \
--numa distribute
It still feels slow to respond. Which parameters should I change?