{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreifv2yb2qm6iupuuta3g4jc6ni3q5sczgxfs7cijg3tvo6gq25cpni",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mndrflv6k2j2"
},
"path": "/t/mellum2-12b-a2-5b-instruct-q4-k-m-on-jetson-orin-nano-8gb/176480#post_1",
"publishedAt": "2026-06-02T23:36:21.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "I tested Mellum2-12B-A2.5B-Instruct Q4_K_M on a Jetson Orin Nano 8GB.\n\nSystem:\n\n * Jetson Orin Nano 8GB\n\n * Ubuntu 22.04.5\n\n * 25W power mode\n\n * NVMe storage\n\n\n\n\nTesting performed:\n\n * Built current official llama.cpp successfully.\n\n * Built the Mellum2 branch of llama.cpp successfully.\n\n * Downloaded the community GGUF successfully.\n\n * Verified model download and local file loading.\n\n\n\n\nResults:\n\n * CUDA-enabled Mellum2 build consistently failed with CUDA allocation errors during model loading.\n\n * CPU-only Mellum2 build loaded successfully.\n\n * CPU-only inference technically worked, but generation speed was extremely slow and not practical for real-world use.\n\n * No usable configuration was found on my Jetson Orin Nano 8GB during testing.\n\n\n\n\nThis appears to be neither a download issue nor a GGUF corruption issue. The model can load in CPU-only mode, but I was unable to achieve practical GPU-accelerated inference on this hardware.",
"title": "Mellum2-12B-A2.5B-Instruct Q4_K_M on Jetson Orin Nano 8GB"
}