Mellum2-12B-A2.5B-Instruct Q4_K_M on Jetson Orin Nano 8GB
I tested Mellum2-12B-A2.5B-Instruct Q4_K_M on a Jetson Orin Nano 8GB.
System:
Jetson Orin Nano 8GB
Ubuntu 22.04.5
25W power mode
NVMe storage
Testing performed:
Built current official llama.cpp successfully.
Built the Mellum2 branch of llama.cpp successfully.
Downloaded the community GGUF successfully.
Verified model download and local file loading.
Results:
CUDA-enabled Mellum2 build consistently failed with CUDA allocation errors during model loading.
CPU-only Mellum2 build loaded successfully.
CPU-only inference worked, but generation was too slow for practical use.
No usable configuration was found on my Jetson Orin Nano 8GB during testing.
This does not appear to be a download issue or a GGUF corruption issue, since the same file loads in CPU-only mode. The failure is specific to the CUDA-enabled build, and I was unable to achieve practical GPU-accelerated inference on this hardware.
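One possible factor, which I did not measure, is simply memory headroom: the Jetson Orin Nano shares its 8GB between CPU and GPU. The sketch below is a back-of-the-envelope estimate only. It assumes "12B" in the model name means roughly 12 billion total parameters and that Q4_K_M averages about 4.85 bits per weight; both figures are assumptions, not values read from the GGUF file, and the estimate ignores KV cache, compute buffers, and OS usage.

```python
# Rough estimate of quantized weight size; not a measurement.

def estimate_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

TOTAL_PARAMS = 12e9      # assumption: "12B" = total parameter count
ASSUMED_BPW = 4.85       # assumption: typical Q4_K_M average bits per weight
DEVICE_RAM_GIB = 8.0     # nominal RAM, shared by CPU and GPU on this board

weights_gib = estimate_weight_gib(TOTAL_PARAMS, ASSUMED_BPW)
headroom_gib = DEVICE_RAM_GIB - weights_gib

print(f"estimated weights: {weights_gib:.1f} GiB")
print(f"nominal headroom:  {headroom_gib:.1f} GiB before KV cache, buffers, and OS")
```

If those assumptions are in the right range, the weights alone take most of the nominal 8GB, which would be consistent with allocation failures at load time. It does not prove that memory is the cause.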