Mellum2-12B-A2.5B-Instruct Q4_K_M on Jetson Orin Nano 8GB

Hugging Face Forums [Unofficial] June 2, 2026


I tested Mellum2-12B-A2.5B-Instruct Q4_K_M on a Jetson Orin Nano 8GB.

System:

Jetson Orin Nano 8GB
Ubuntu 22.04.5
25W power mode
NVMe storage

Testing performed:

Built current official llama.cpp successfully.
Built the Mellum2 branch of llama.cpp successfully.
Downloaded the community GGUF successfully.
Verified model download and local file loading.
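For anyone repeating the download check, a quick sanity test is to confirm the file starts with the GGUF magic bytes before trying to load it. This is only a minimal sketch, not what I ran: the function name `read_gguf_version` is made up for illustration, and it only inspects the first 8 bytes (the 4-byte `GGUF` magic followed by a little-endian uint32 version), so it will not catch truncation or corruption further into the file.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def read_gguf_version(path):
    """Return the GGUF format version stored in the header.

    Hypothetical helper for illustration. Raises ValueError if the
    file is too short or does not begin with the GGUF magic bytes.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # Bytes 4..7 hold the format version as a little-endian uint32.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

A full integrity check would still need a checksum comparison against the one published with the file, if the uploader provides one.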

Results:

CUDA-enabled Mellum2 build consistently failed with CUDA allocation errors during model loading.
CPU-only Mellum2 build loaded successfully.
CPU-only inference worked, but generation was too slow for practical use.
No usable configuration was found on my Jetson Orin Nano 8GB during testing.

This does not appear to be a download issue or a GGUF corruption issue, since the same file loads in CPU-only mode. I was simply unable to achieve practical GPU-accelerated inference on this hardware.
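One possible explanation, which I have not confirmed, is plain memory pressure: the Jetson's 8GB is shared between CPU and GPU, and a mixture-of-experts model needs all of its weights resident even though only some are active per token. The back-of-the-envelope sketch below estimates the weight footprint. The numbers are assumptions: 12e9 total parameters is read off the model name, and 4.8 bits per weight is a rough average for Q4_K_M, not a measured value for this GGUF. The overhead and budget figures are placeholders to adjust.

```python
def estimate_weights_gib(n_params, bits_per_weight):
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def fits(n_params, bits_per_weight, budget_gib, overhead_gib):
    """True if weights plus a fixed overhead fit within the memory budget."""
    needed = estimate_weights_gib(n_params, bits_per_weight) + overhead_gib
    return needed <= budget_gib


if __name__ == "__main__":
    TOTAL_PARAMS = 12e9     # assumption: taken from "12B" in the model name
    BITS_PER_WEIGHT = 4.8   # assumption: rough Q4_K_M average
    OVERHEAD_GIB = 1.0      # assumption: KV cache and compute buffers
    BUDGET_GIB = 6.5        # assumption: memory left after the OS on an 8GB board

    weights = estimate_weights_gib(TOTAL_PARAMS, BITS_PER_WEIGHT)
    print(f"estimated weights: {weights:.1f} GiB")
    print("fits in budget:", fits(TOTAL_PARAMS, BITS_PER_WEIGHT,
                                  BUDGET_GIB, OVERHEAD_GIB))
```

Under these assumptions the weights alone come to roughly 6.7 GiB, which leaves very little headroom on an 8GB shared-memory board. That would be consistent with the CUDA allocation failures, but it is an estimate, not a diagnosis.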
