Practical match for a 128GB Strix Halo with 2x3090s? (inference for coding)
Hugging Face Forums [Unofficial]
May 19, 2026
So I rented a server with two 3090s and tried to run some models. I picked a MoE model that doesn't fit in VRAM and gets partially offloaded, and a dense one that fits entirely.
Results (output tokens per second):
Qwen3.6-27B-Q8_0 (fits in 3090s):
Halo: 7.8 t/s
2x3090: 24 t/s
gpt-oss-120b-Q4_K_M (does not fit in 3090s, gets offloaded):
Halo: 56 t/s
2x3090: 8.8 t/s
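As a rough sanity check on these numbers: single-stream decode is usually memory-bandwidth bound, so tokens/s can't exceed bandwidth divided by the bytes of weights read per token. The sketch below rests on spec-sheet assumptions, not measurements: about 256 GB/s for the Halo (256-bit LPDDR5X-8000), about 936 GB/s per 3090, 8.5 bits/weight for Q8_0, Qwen3.6-27B being a 27B dense model as its name suggests, and gpt-oss-120b reading about 5.1B active parameters per token at an assumed ~4.5 bits/weight. The `ceiling` helper is hypothetical, and its results are upper bounds, not predictions.

```shell
#!/usr/bin/env bash
# Back-of-envelope decode ceiling: tokens/s <= bandwidth / GB of weights read per token.
# Every input here is an ASSUMPTION taken from spec sheets, not a measurement.

# ceiling BW_GB_PER_S ACTIVE_PARAMS_BILLIONS BITS_PER_WEIGHT
# Prints the upper bound on tokens/s, rounded to one decimal.
ceiling() {
  awk -v bw="$1" -v p="$2" -v bpw="$3" \
    'BEGIN { gb_per_token = p * bpw / 8; printf "%.1f\n", bw / gb_per_token }'
}

# Dense 27B at Q8_0 (8.5 bits/weight): every weight is read for every token.
echo "Halo + 27B Q8_0     : <= $(ceiling 256 27 8.5) t/s (measured 7.8)"
# Assuming a layer split, the two cards work one after the other,
# so roughly one card's bandwidth applies to the whole model.
echo "3090 + 27B Q8_0     : <= $(ceiling 936 27 8.5) t/s (measured 24 on 2x3090)"
# MoE: only the ~5.1B active parameters are read per token (4.5 bits/weight assumed).
echo "Halo + gpt-oss-120b : <= $(ceiling 256 5.1 4.5) t/s (measured 56)"
```

Under these assumptions the three fully-resident runs land at roughly 63-87% of their ceilings, and the dense gap (about 3x) tracks the bandwidth gap (936 vs 256 GB/s, about 3.7x). The offloaded gpt-oss run on the 3090s can't be bounded this simply without knowing how much per-token traffic goes through the VM's system RAM.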
Somehow this experiment didn't make the choice any clearer. I've seen people online post much better gpt-oss results on 2x3090s, so maybe I didn't run it well.
I ran it with:

```shell
./llama.cpp/build2/bin/llama-cli \
  -m /root/gpt-oss-120b-Q4_K_M-00001-of-00002.gguf \
  -c 128000 \
  -fa on \
  -ngl 23 \
  -sm row \
  -ts 1,1
```
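For comparison, a commonly suggested way to run a MoE model that doesn't fit in VRAM with llama.cpp is to put all layers on the GPUs (`-ngl 999`) and keep only the expert tensors of some layers in system RAM with `--n-cpu-moe N`, so attention weights and the KV cache stay in VRAM. This is an untested sketch: it assumes a llama.cpp build recent enough to have `--n-cpu-moe`, `N_CPU_MOE=16` is a hypothetical starting value (raise it on out-of-memory, lower it while VRAM allows), and it switches to the default `-sm layer` split, which is worth A/B testing against `-sm row`. The other flags are kept from the original command.

```shell
#!/usr/bin/env bash
# Untested sketch: all layers on the GPUs, only the MoE expert tensors of the
# first N_CPU_MOE layers kept in system RAM.
# ASSUMPTION: the llama.cpp build supports --n-cpu-moe.
LLAMA_CLI=./llama.cpp/build2/bin/llama-cli
MODEL=/root/gpt-oss-120b-Q4_K_M-00001-of-00002.gguf
N_CPU_MOE=${N_CPU_MOE:-16}   # hypothetical starting value; tune for 2x24 GB

args=(
  -m "$MODEL"
  -c 128000
  -fa on
  -ngl 999                   # offload every layer (more than the model has)
  --n-cpu-moe "$N_CPU_MOE"   # ...but keep these layers' experts on the CPU
  -sm layer                  # default split; compare against -sm row
  -ts 1,1
)

if [ -x "$LLAMA_CLI" ]; then
  "$LLAMA_CLI" "${args[@]}"
else
  echo "llama-cli not found at $LLAMA_CLI; would run: ${args[*]}"
fi
```

If it runs, comparing t/s at a few `N_CPU_MOE` values shows how sensitive the result is to the VM's memory: the fewer expert layers left in system RAM, the less it matters.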
Also, since the rental was a VM, I couldn't see the motherboard or the memory channel count, only the CPU: a Xeon Gold 6246.
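The channel count matters because it sets the system-RAM bandwidth that offloaded layers are limited by. Theoretical peak bandwidth is the bus width in bytes times the transfer rate. The sketch below assumes Intel's listed spec for the Xeon Gold 6246 (up to 6 channels of 64-bit DDR4-2933 per socket) and a 256-bit LPDDR5X-8000 bus for the Halo; the `peak_bw` helper is hypothetical, and the figures are peaks, not what the VM actually delivers. A VM may expose fewer channels or span NUMA nodes, and `lscpu` at least shows the NUMA layout the guest sees.

```shell
#!/usr/bin/env bash
# Theoretical peak DRAM bandwidth = (bus width in bytes) x (mega-transfers/s).
# Spec-sheet ASSUMPTIONS, not measurements of the rented VM.

# peak_bw BUS_BITS MT_PER_S -> GB/s, one decimal
peak_bw() {
  awk -v bits="$1" -v mts="$2" 'BEGIN { printf "%.1f\n", bits / 8 * mts / 1000 }'
}

# Xeon Gold 6246: up to 6 x 64-bit DDR4-2933 channels per socket.
for ch in 2 4 6; do
  echo "Xeon, $ch channels: $(peak_bw $((ch * 64)) 2933) GB/s"
done
# Strix Halo: 256-bit LPDDR5X-8000.
echo "Strix Halo:        $(peak_bw 256 8000) GB/s"

# What the guest can see (may not reflect the physical DIMM population):
if command -v lscpu >/dev/null; then
  lscpu | grep -i 'numa' || true
fi
```

Even with all six channels populated, the Xeon's peak (about 141 GB/s) is only about 55% of the Halo's, so the share of expert weights left in system RAM likely dominates the offloaded result.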
I have a feeling I could replace the Halo with 2x3090s given the right tweaking. Am I right?