What should I change to optimize locally hosted AI?

Hugging Face Forums [Unofficial] May 29, 2026

I have a server with the following hardware:

Intel Ultra 7 270K Plus

64 GB RAM

2x Intel Arc B70 32 GB VRAM

I'm running Ubuntu Server with llama.cpp.

I'm using it for local agentic coding with the continue.dev plugin for VS Code.

My startllm.sh file looks like this:

#!/bin/bash
source /opt/intel/oneapi/setvars.sh --force
export ZES_ENABLE_SYSMAN=1

cd ~/llama.cpp

ZES_ENABLE_SYSMAN=1 ./build/bin/llama-server \
    -m ~/models/Qwen3.6-27B-Q5_K_M.gguf \
    -a Roboto \
    -c 32768 \
    --cache-type-k q8_0 \
    --cache-type-v q8_0 \
    --n-gpu-layers 999 \
    -b 2048 \
    -ub 512 \
    --threads 24 \
    --host 0.0.0.0 \
    --port 8080 \
    --split-mode layer \
    --tensor-split 1,1 \
    --numa distribute

It still feels slow to respond. Which parameters should I change?
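To put a number on "slow", a rough sketch like the one below could help separate prompt processing speed from generation speed. It assumes the server from startllm.sh is reachable on 127.0.0.1:8080 and that the /completion reply includes a timings object with token counts and millisecond durations; that field layout is an assumption and may differ between llama.cpp builds. The helper names tokens_per_second and measure_server are made up for this example.

```shell
#!/bin/bash
# Sketch only: turn "feels slow" into tokens-per-second figures.
# Assumption: llama-server listens on 127.0.0.1:8080, as in startllm.sh.

# tokens_per_second COUNT MILLISECONDS
# Prints tokens per second with two decimals; prints 0.00 if the time is not positive.
tokens_per_second() {
    awk -v n="$1" -v ms="$2" 'BEGIN {
        if (ms <= 0) { print "0.00"; exit }
        printf "%.2f\n", n * 1000 / ms
    }'
}

# measure_server [URL]
# Sends one short request and prints the raw JSON reply.
# Assumption: the reply carries a "timings" object; check your build's output.
measure_server() {
    local url="${1:-http://127.0.0.1:8080/completion}"
    curl -s "$url" \
        -H "Content-Type: application/json" \
        -d '{"prompt": "Write a bubble sort in Python.", "n_predict": 128}'
}
```

Running measure_server and reading the token counts and durations out of the reply, then passing them to tokens_per_second (for example, tokens_per_second 128 4000), gives a figure that can be compared before and after each parameter change.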
