
Mellum2-12B-A2.5B-Instruct Q4_K_M on Jetson Orin Nano 8GB

Hugging Face Forums [Unofficial] June 2, 2026

I tested Mellum2-12B-A2.5B-Instruct Q4_K_M on a Jetson Orin Nano 8GB.

System:

  • Jetson Orin Nano 8GB

  • Ubuntu 22.04.5

  • 25W power mode

  • NVMe storage

Testing performed:

  • Built the current upstream llama.cpp successfully.

  • Built the Mellum2 branch of llama.cpp successfully.

  • Downloaded the community GGUF successfully.

  • Verified model download and local file loading.

Results:

  • CUDA-enabled Mellum2 build consistently failed with CUDA allocation errors during model loading.

  • CPU-only Mellum2 build loaded successfully.

  • CPU-only inference worked, but generation was too slow for practical use.

  • No usable configuration was found on my Jetson Orin Nano 8GB during testing.
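
Because the Orin Nano's CPU and GPU share the same physical memory, it may be worth checking how much memory is actually free before attempting a CUDA load. The sketch below was not part of the testing above. It reads the standard Linux `/proc/meminfo` file; the helper names `parse_meminfo` and `available_gib` are made up for illustration.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: numeric value}."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        # Most fields look like "MemAvailable:  2097152 kB"
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def available_gib(text: str) -> float:
    """Return MemAvailable (reported in kB) converted to GiB."""
    return parse_meminfo(text)["MemAvailable"] / 2**20


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(f"MemAvailable: {available_gib(f.read()):.2f} GiB")
```

If the reported value is well below the size of the GGUF file, a full GPU offload is unlikely to succeed regardless of build options.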

This appears to be neither a download issue nor a GGUF corruption issue. The model can load in CPU-only mode, but I was unable to achieve practical GPU-accelerated inference on this hardware.
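
For a rough sense of why this is tight on 8GB, the weight footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch, not a measurement: the 12B figure is taken from the model name, the 4.85 bits per weight for Q4_K_M is an assumed average that varies by model, and the function name `gguf_weight_gib` is hypothetical.

```python
def gguf_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


TOTAL_PARAMS = 12e9      # assumption: "12B" in the model name
BITS_PER_WEIGHT = 4.85   # assumption: rough Q4_K_M average, model dependent
BOARD_RAM_GIB = 8.0      # nominal; memory usable by the model is lower

weights = gguf_weight_gib(TOTAL_PARAMS, BITS_PER_WEIGHT)
headroom = BOARD_RAM_GIB - weights

print(f"Estimated weights: {weights:.2f} GiB")
print(f"Nominal headroom:  {headroom:.2f} GiB (before OS, KV cache, buffers)")
```

Under these assumptions the weights alone take most of the board's memory before the OS, KV cache, and compute buffers are counted. That would be consistent with the allocation failures during loading, though it is not confirmed here as the cause.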
