{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreib5wpsqwlqyesvbqdmh64hfr6awjm5vfatyxxhk7hqh3j3pmbmu24",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjxnct7vzyz2"
},
"path": "/t/asahi-linux-on-an-m2-studio/175412#post_2",
"publishedAt": "2026-04-20T23:29:58.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"asahilinux.org",
"github.com",
"docs.ollama.com",
"docs.vllm.ai"
],
"textContent": "Depending on the settings, it looks like Ollama itself might actually work?\n\n* * *\n\nYour M2 Studio running Asahi Linux is in a **real but awkward middle ground**: the OS and GPU stack are now good enough that GPU-accelerated inference is possible, but the most popular Apple-Silicon inference ecosystem is still much more mature on **macOS + Metal** than on **Linux + Vulkan**. That is the core reason Ollama can run on your machine yet still fall back to CPU. (asahilinux.org)\n\n## The big picture\n\nThe old answer used to be “Apple GPUs on Linux are not ready.” That answer is now outdated. Fedora Asahi Remix supports **Mac Studio** hardware and ships **OpenCL 3.0** and **Vulkan 1.4** on Apple Silicon. So your machine is no longer blocked at the operating-system level. The bottleneck is now the **runtime/backend layer**: which inference engine actually knows how to use that Vulkan path well. (asahilinux.org)\n\nThat distinction matters because many people mix up three different things:\n\n 1. **Can Asahi access the GPU at all?**\n 2. **Can this inference engine use Vulkan on Linux?**\n 3. **Can this specific model be placed on GPU reliably on this backend?**\n\n\n\nOn your machine, the answer to the first one is now basically yes. The second and third are where the real trouble starts. (asahilinux.org)\n\n## Which inference engine family fits Asahi Linux best\n\nFor **Asahi Linux specifically**, the best fit today is the **`llama.cpp` / GGML / Vulkan** family. `llama.cpp` has official Vulkan build instructions for Linux, device inspection, and GPU offload controls. That makes it the best “reference engine” for your platform. (github.com)\n\n**Ollama** is compatible only in a more limited sense. Its docs say Apple GPUs are accelerated through **Metal**, while **Vulkan support on Linux/Windows is experimental** and must be enabled with `OLLAMA_VULKAN=1` for the Ollama server. 
So on Asahi Linux, Ollama is not using the comfortable Apple-native path; it is using a newer, rougher Linux Vulkan path. (docs.ollama.com)\n\nBy contrast, things like **vLLM Metal** are built for **Apple Silicon Macs using MLX**, and their install docs explicitly require **macOS on Apple Silicon**. That makes them interesting for Apple hardware in general, but they are **not** the right answer for Apple GPU inference on Asahi Linux. (docs.vllm.ai)\n\n## So why is Ollama using CPU?\n\nThere are a few likely causes.\n\n### 1. Ollama’s Vulkan path may not actually be enabled for the running server\n\nOn Linux, Ollama is often started as a **systemd service**. Its FAQ says environment variables must be set with `systemctl edit ollama.service`, then `daemon-reload` and `restart`. So if `OLLAMA_VULKAN=1` was only exported in your terminal, the actual service may still be running without Vulkan. (docs.ollama.com)\n\n### 2. Vulkan can be available, but the model may still not land on GPU\n\nThis is not just theory. There is a public Ollama issue showing **Vulkan specified** while the model still **does not load to GPU**. So “Vulkan exists on the machine” and “Ollama really placed this model on GPU” are two different things. (github.com)\n\n### 3. You are using the right app family, but the wrong validation order\n\nOn your machine, **Ollama should not be the first proof that GPU inference works**. `llama.cpp` should. If `llama.cpp` with Vulkan can see the GPU and offload layers, then the platform is basically working and the remaining problem is Ollama’s wrapper/integration behavior. If `llama.cpp` cannot do it, no higher-level wrapper is going to save you. (github.com)\n\n### 4. Even when it works, Linux Vulkan is currently behind macOS Metal\n\nThere is an upstream `llama.cpp` issue opened by one of the **Asahi GPU driver developers** that directly compares macOS Metal and Linux Vulkan on M2-class hardware and says **macOS is significantly faster** in their test. 
That means “GPU is working, but this still feels worse than expected” is a completely believable outcome on Asahi today. (github.com)\n\n### 5. Some model/backend combinations are still unstable\n\nThere are also recent Vulkan-side bug reports in `llama.cpp` where a model produced **garbage outputs** under the Vulkan backend on an Apple M2 Pro running Fedora Asahi. So even a technically working GPU path may still have correctness or stability problems depending on the model and backend revision. (github.com)\n\n## My recommendation for your exact case\n\nI would treat your case as a **three-layer diagnosis**:\n\n 1. **Prove the OS Vulkan stack is healthy**\n 2. **Prove raw `llama.cpp` Vulkan offload works**\n 3. **Only then try to make Ollama behave**\n\n\n\nThat order matters because it prevents you from debugging model packaging, wrapper behavior, and GPU backend problems all at the same time. (github.com)\n\n## Step 1: Check that the host Vulkan stack is alive\n\nBefore doing anything else, make sure the system itself sees Vulkan correctly.\n\nRun:\n\n\n vulkaninfo | head\n\n\n`llama.cpp`’s Vulkan build docs explicitly tell you to verify Vulkan before building and testing. If `vulkaninfo` does not work, the problem is not Ollama and not the model. It is lower in the system stack. (github.com)\n\n## Step 2: Use `llama.cpp` as your reference engine\n\nBuild `llama.cpp` with Vulkan support:\n\n\n cmake -B build -DGGML_VULKAN=1\n cmake --build build --config Release\n\n\nThen test what devices it can see and try a simple run:\n\n\n ./build/bin/llama-cli --list-devices\n ./build/bin/llama-cli -m /path/to/model.gguf -p \"Hello\" -ngl 99\n\n\nThose options are straight from the official build/runtime docs. `--list-devices` shows what `llama.cpp` can use, and `-ngl 99` is the standard “offload as much as possible” test. (github.com)\n\nIf this works, then your Asahi system **can** do GPU inference. 
At that point, CPU fallback in Ollama becomes a wrapper/runtime problem, not proof that your platform is incompatible. (github.com)\n\n## Step 3: Configure Ollama the Linux way\n\nIf `llama.cpp` works, then set Vulkan for the actual Ollama service:\n\n\n sudo systemctl edit ollama.service\n\n\nAdd:\n\n\n [Service]\n Environment=\"OLLAMA_VULKAN=1\"\n\n\nThen reload and restart:\n\n\n sudo systemctl daemon-reload\n sudo systemctl restart ollama\n\n\nThat is the method Ollama’s FAQ documents for Linux service installs. (docs.ollama.com)\n\nThen check model placement:\n\n\n ollama ps\n\n\nOllama’s FAQ says the `PROCESSOR` column will tell you whether the model is on **GPU**, **CPU**, or split across them. That is the right way to verify progress. (docs.ollama.com)\n\n## Which engines make sense for you\n\n### Best current match: `llama.cpp`\n\nThis is the most compatible engine family for Asahi Linux because it has a real Linux Vulkan path and it is the best way to validate the platform directly. (github.com)\n\n### Secondary option: Ollama\n\nThis is worth using after the raw Vulkan path is proven. It is convenient, but on your machine it rides on top of an **experimental Vulkan backend**, so it is not the right first diagnostic tool. (docs.ollama.com)\n\n### Later option: `llama.cpp` server or container-based serving\n\nThis can work, but only after native CLI inference works. There are public Asahi-side attempts to run `server-vulkan` in containers and get stuck on GPU visibility/device exposure, so containers are not a beginner-friendly first step here. (github.com)\n\n### Poor fit on Asahi Linux: MLX / vLLM Metal style stacks\n\nThese are tied to **macOS Apple-Silicon compute paths**, not the Vulkan path you need on Linux. (docs.vllm.ai)\n\n## What to expect in practice\n\nThe good news is that GPU inference on Asahi is **real** now.\n\nThe bad news is that it is not yet the nicest way to use Apple Silicon for local LLMs. 
The public benchmark discussion from the Asahi side shows the Vulkan path working, but still trailing macOS Metal noticeably. So your realistic target today is **working GPU offload and usable inference**, not “best possible Apple-Silicon experience.” (github.com)\n\nThat means your machine is best thought of as:\n\n * **capable**\n * **Vulkan-first**\n * **experimental around wrappers**\n * **not yet as polished as macOS for this workload** (asahilinux.org)\n\n\n\n## What I think is most likely true in your case\n\nBased on what you described, my strongest guess is this:\n\n * your **Asahi system is probably capable of Vulkan GPU inference**,\n * **Ollama is either not running with Vulkan enabled at the service level**, or\n * **its experimental Vulkan path is not successfully placing the model on GPU**, even though the machine could support it in principle. (docs.ollama.com)\n\n\n\nSo the best solution is **not** “find some other random app first.” The best solution is:\n\n 1. verify `vulkaninfo`,\n 2. prove GPU offload in raw `llama.cpp`,\n 3. then fix Ollama’s service configuration. (github.com)\n\n\n\n## Final take\n\nFor **Asahi Linux on an M2 Studio**, the most compatible inference engine path today is:\n\n * **`llama.cpp` + Vulkan** as the main path\n * **Ollama + Vulkan** as a convenience layer after that\n * **not** the Metal/MLX-centered Apple-macOS stack for this OS (docs.ollama.com)\n\n\n\nThe cause of your CPU fallback is most likely **backend configuration or backend maturity**, not “Asahi Linux cannot use the GPU.” The cleanest fix is to **treat `llama.cpp` as the truth source**, get that working first, and only then bring Ollama back into the picture. (asahilinux.org)",
"title": "Asahi Linux on an M2 Studio"
}