{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifv2yb2qm6iupuuta3g4jc6ni3q5sczgxfs7cijg3tvo6gq25cpni",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mndrflv6k2j2"
  },
  "path": "/t/mellum2-12b-a2-5b-instruct-q4-k-m-on-jetson-orin-nano-8gb/176480#post_1",
  "publishedAt": "2026-06-02T23:36:21.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I tested Mellum2-12B-A2.5B-Instruct Q4_K_M on a Jetson Orin Nano 8GB.\n\nSystem:\n\n  * Jetson Orin Nano 8GB\n\n  * Ubuntu 22.04.5\n\n  * 25W power mode\n\n  * NVMe storage\n\nTesting performed:\n\n  * Built current official llama.cpp successfully.\n\n  * Built the Mellum2 branch of llama.cpp successfully.\n\n  * Downloaded the community GGUF successfully.\n\n  * Verified model download and local file loading.\n\nResults:\n\n  * CUDA-enabled Mellum2 build consistently failed with CUDA allocation errors during model loading.\n\n  * CPU-only Mellum2 build loaded successfully.\n\n  * CPU-only inference technically worked, but generation was too slow for real-world use.\n\n  * No usable configuration was found on my Jetson Orin Nano 8GB during testing.\n\nThis does not appear to be a download issue or a GGUF corruption issue, since the same file loads successfully in CPU-only mode. However, I was unable to achieve practical GPU-accelerated inference on this hardware.",
  "title": "Mellum2-12B-A2.5B-Instruct Q4_K_M on Jetson Orin Nano 8GB"
}