Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiejqpk5l4pvlix2okyslnqhardyr6ghchnr754r4o7ew3cq3uxtwu",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjvdcahvtki2"
  },
  "path": "/t/failed-to-load-model/175391#post_3",
  "publishedAt": "2026-04-19T23:55:57.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "GitHub",
    "Hugging Face",
    "blog.google",
    "LM Studio",
    "Qiita"
  ],
  "textContent": "If this is a Gemma 4 family model, the failure could simply be because Gemma 4 was released only recently and software support isn’t yet fully established; if it is not a Gemma 4 model, the cause lies elsewhere.\n\nIn any case, the simplest solution is to update the software (if that doesn’t resolve the issue, you’ll need to look for another solution or wait):\n\n* * *\n\nFor the **Gemma 4 JANG MLX model**, the most likely reason it will not load is this:\n\n**the app/runtime that is reading the model does not agree with the model file format about how one of the weight tensors is stored.** In other words, this is most likely a **compatibility problem**, not a “your Mac is too weak” problem. The strongest clue is the exact error itself, and there is already a public LM Studio bug report showing the **same shape mismatch** on the **same parameter** while loading `gemma-4-31b-jang_4m-crack` on Mac with LM Studio MLX. (GitHub)\n\n## What the error is saying\n\nThis part matters most:\n\n`Expected shape (8192, 672) but received shape (8192, 1344)`\n\nA model is a collection of tensors. Each tensor has a fixed shape. When the loader reaches `language_model.model.layers.0.self_attn.q_proj.weight`, it expects one layout, but the checkpoint contains another. Because the mismatch is **exact and structural**, the loader stops immediately. This kind of failure happens **before inference even begins**, which is why it points to a format/layout mismatch rather than a normal runtime slowdown or memory pressure issue. (GitHub)\n\nThe fact that `1344` is exactly **2×** `672` is also a strong clue. That pattern usually suggests the tensor is being interpreted with the **wrong packing/layout assumption** rather than being randomly corrupted. 
I cannot prove the exact internal packing rule from the error alone, but the clean 2× difference strongly suggests “loader and checkpoint disagree on representation,” not “file is slightly damaged.” The recent `mlx-lm` release notes make that interpretation more plausible because they include a Gemma 4–specific fix for **quantized per-layer projection loading**, and your error is on a projection tensor (`q_proj.weight`). (GitHub)\n\n## Why your case is especially prone to this\n\nYour model is not just a plain stock MLX conversion. The model card identifies it as **JANG v2 (MLX-native safetensors)**, with **actual average 5.1 bits**, a **dense hybrid sliding/global attention** Gemma 4 architecture, and a recommendation to use **vMLX** for the best experience. The same page also says standard `mlx_lm` / `mlx_vlm` do not support this setup at the versions it lists. (Hugging Face)\n\nA related conversion page for the GGUF version explains the compatibility point even more directly: it says the original model uses **JANG v2 mixed-precision MLX quantization** and that standard tools such as **LM Studio, llama.cpp, oMLX, and mlx-lm** cannot load that original format because of its **mixed per-layer bit widths**. That lines up very well with your shape-mismatch error. (Hugging Face)\n\nSo the likely story is:\n\n  1. the runtime knows enough to recognize the model as Gemma 4,\n  2. it starts building the attention layer,\n  3. then it reaches a quantized projection tensor whose stored layout matches **JANG/vMLX expectations**,\n  4. but the loader is assuming a different MLX layout,\n  5. and load fails with the shape mismatch. (GitHub)\n\n## Why this is probably **not** a memory problem\n\nIf this were mostly a RAM or unified-memory issue, the usual symptoms would be different: out-of-memory messages, Metal allocation failures, crashing later in loading, or trouble once generation starts. 
Your failure happens earlier and more cleanly: the loader names a specific tensor and says its shape is wrong. That is a **schema/format mismatch** type of error. Also, a recent user writeup shows that at least some official Gemma 4 MLX models can run in LM Studio on a **32 GB Mac** after runtime updates, which further suggests that the main blocker here is not simply Mac memory. (GitHub)\n\n## Why Gemma 4 makes this easier to trip over\n\nGemma 4 is a newer and more specialized model family than older plain text-only local models. Google describes Gemma 4 as a four-size family built for reasoning and agentic workflows, and Gemma 4 support only landed recently in the MLX ecosystem. The `mlx-lm` release notes show that the initial support was immediately followed by Gemma 4–specific fixes, including the projection-loading fix mentioned earlier. That is the pattern you see when support is still stabilizing: some models load, some do not, and custom formats are more likely to break first. (blog.google)\n\nLM Studio’s timeline shows the same thing. On **April 3, 2026**, users were still hitting `Model type gemma4 not supported` with `No module named 'mlx_vlm.models.gemma4'`. On **April 13, 2026**, there was still a public issue showing `Gemma 4 support is not ready yet`. LM Studio’s changelog also shows Gemma 4-related updates on **April 2**, **April 9**, and **April 10**, but those entries are about tool-call reliability and the updated Gemma 4 chat template, not a fix for this exact JANG tensor-layout mismatch. (GitHub)\n\nThat context matters because it means your experience is not strange. 
It fits a broader pattern: **Gemma 4 support on Mac/MLX was moving quickly, and custom JANG MLX checkpoints sit near the edge of compatibility.** (GitHub)\n\n## What I think is happening in your case\n\nMy best explanation is this:\n\n**The model file is probably okay, but your current loader path is not the right one for this checkpoint format.**\n\nMore specifically, I think you are trying to load a **JANG v2 mixed-precision MLX checkpoint** in a runtime path that can handle **some Gemma 4 MLX models**, but not this particular weight layout. That interpretation is strongly supported by:\n\n  * the public bug report showing the exact same error for the same family of model, (GitHub)\n  * the model card’s vMLX-first guidance, (Hugging Face)\n  * the GGUF conversion page explicitly saying the original JANG format is not for standard tools, (Hugging Face)\n  * and the recent Gemma 4 projection-loading fixes in `mlx-lm`. (GitHub)\n\n## What it is **less likely** to be\n\nIt is **less likely** that:\n\n  * your Mac is simply too weak, because the error is structural rather than resource-related, (GitHub)\n  * the model is randomly corrupted, because the mismatch is clean and reproducible rather than chaotic, and the same error exists publicly on another machine, (GitHub)\n  * or your prompt/settings are wrong, because those matter **after** loading, not during tensor-shape validation. (GitHub)\n\nA mixed or stale local snapshot is still possible, though. If your local cache combines `config.json`, shard files, or index files from different revisions, that can also create “expected A, got B” errors. It is not my top guess here, but it is worth cleaning up because it is easy to test. (GitHub)\n\n## The safest way to fix it\n\n### 1. If you are in LM Studio, update both the app and the runtimes\n\nIn LM Studio, check **Settings → Runtime** and update **LM Studio MLX** and **Metal llama.cpp**. 
The changelog shows Gemma 4-related updates in early April, and a recent user writeup says updating the runtime was what allowed an official MLX Gemma 4 model to run in LM Studio. (LM Studio)\n\nThis is worth doing even though it may not fully fix loading of the JANG model, because older LM Studio builds had explicit Gemma 4 support gaps. (GitHub)\n\n### 2. Test a **standard** Gemma 4 MLX model\n\nThis is the most informative next step.\n\nTry a more standard Gemma 4 MLX model, such as `mlx-community/gemma-4-26b-a4b-it-4bit`, which a recent writeup says now runs in LM Studio after runtime updates. (Qiita)\n\nThis gives you a clean diagnostic split:\n\n  * **If a standard Gemma 4 MLX model loads, but your JANG model fails**, then your problem is almost certainly **checkpoint-format compatibility**. (Qiita)\n  * **If standard Gemma 4 MLX models also fail**, then your problem is broader: app/runtime versions, environment mismatch, or incomplete Gemma 4 support on your current stack. (GitHub)\n\n### 3. For this JANG model, use the runtime it was built around: **vMLX**\n\nThis is the most likely actual solution for **this** model family.\n\nThe model page explicitly recommends **vMLX**, and the GGUF conversion page says the original JANG mixed-precision MLX format is only compatible with vMLX while standard tools cannot load it. That makes vMLX the natural first choice for the original JANG checkpoint. (Hugging Face)\n\n### 4. If you want to stay in LM Studio, use the **GGUF** conversion instead of the original JANG MLX checkpoint\n\nThe GGUF conversion exists specifically because the original JANG MLX format is not broadly compatible. The conversion page says it provides standard GGUF quantizations for use with **llama.cpp, LM Studio, Ollama, and other GGUF-compatible engines**. So if LM Studio is your preferred app, the GGUF path is likely smoother than trying to force the original JANG MLX checkpoint to work there. (Hugging Face)\n\n### 5. 
Delete the local model folder and re-download it cleanly\n\nThis is a good hygiene step.\n\nIf your local copy is stale or mixed, redownloading fixes that. It may not be the root cause, but it is easy to rule out and worth doing before deeper debugging. The model repo has had recent updates, including README and capability metadata changes, so a clean snapshot is safer than relying on an older local cache. (Hugging Face)\n\n### 6. If you are loading from Python, update MLX packages\n\nIf you are not in LM Studio and instead use Python directly, make sure `mlx-lm` is recent enough to include **Gemma 4 support** and the **Gemma 4 quantized per-layer projection loading** fix. Those fixes are in the release notes, so older installations are a real risk. (GitHub)\n\n## A simple decision tree\n\nHere is the beginner-safe version:\n\n**Case A: official MLX Gemma 4 works, JANG MLX fails**\nThat means your app can handle Gemma 4 in general, but **not this custom checkpoint format**. Use **vMLX** for the JANG model, or use the **GGUF** version in LM Studio. (Qiita)\n\n**Case B: official MLX Gemma 4 also fails**\nThat means your Gemma 4 support is still not correct at the runtime/app level. Update LM Studio + runtimes, or update your MLX Python packages. (GitHub)\n\n**Case C: everything still fails after updating**\nThen do a clean re-download of the model files and retest. If the JANG model still fails but official models load, the answer is still “wrong runtime for this checkpoint.” (Hugging Face)\n\n## My bottom line\n\nThe clearest explanation is:\n\n**Your MLX model will not load because the checkpoint format and the loader are mismatched, and this is especially likely because you are using a JANG v2 mixed-precision Gemma 4 checkpoint that is meant for vMLX rather than a standard MLX loader path.** (GitHub)\n\nSo the practical fix order is:\n\n  1. update LM Studio and runtimes if you use LM Studio, (LM Studio)\n  2. 
test a standard `mlx-community` Gemma 4 MLX model, (Qiita)\n  3. use **vMLX** for the original JANG MLX checkpoint, (Hugging Face)\n  4. or use the **GGUF** conversion if you want LM Studio compatibility, (Hugging Face)\n  5. and re-download the model cleanly to rule out cache issues. (Hugging Face)\n\n",
  "title": "Failed to load model"
}