Working with LLMs on AMDGPUs

David Gasquez November 2, 2023
This might only work for a few months (or even days), but after spending a few hours trying to get open source LLMs to work on AMDGPUs inside Docker, I thought I'd share my findings. My GPU is an AMD 7900 XTX, and I was only able to make it work with the llama-cpp Python bindings. This should work for any ROCm-supported AMDGPU.

The first thing is to build and set up our Docker image. This is what I ended up with:

You might need to change gfx1100 to your GPU's family/target.

Next, we need to build the image:

Now we can run the image with this ~complex~ precise command:

This will mount the current directory to /models inside the container and drop you into a bash shell.

Now it's time to check that the PyTorch installation is working and able to detect the GPU. These commands should work:

If everything is working, you should see something like this:

Now, let's do some LLMing and put those graphics processing units to work with one of the latest models, Mistral! Download the model:

And with that, we should be ready to run the model with llama-cpp-python:

For me, it printed the following:

🎉 🎉 🎉

If you, like me, are wondering whether the GPU was actually being used, you can install nvtop and run it.

Finally, after a few hours and a bunch of tweaks, the GPU was being used and Mistral 7B worked on my machine!
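The last two steps (checking that PyTorch can see the GPU, then running a prompt through llama-cpp-python) can be sketched roughly as follows. This is a minimal sketch, not the exact code from the setup above: it assumes both packages are installed in the container with ROCm support, the model filename and the helper names gpu_report and run_prompt are placeholders for illustration, and n_gpu_layers=-1 (offload every layer) is an assumption that may depend on your llama-cpp-python version.

```python
"""Sketch: confirm PyTorch sees the GPU, then run a prompt with llama-cpp-python."""
from pathlib import Path

# Hypothetical filename: point this at whatever GGUF file you put in the
# directory mounted at /models.
MODEL_PATH = "/models/mistral-7b-instruct.Q4_K_M.gguf"


def gpu_report() -> dict:
    """Summarize what PyTorch can see. Safe to call even without torch installed."""
    report = {
        "torch_installed": False,
        "gpu_available": False,
        "hip_version": None,
        "device_name": None,
    }
    try:
        import torch
    except ImportError:
        return report

    report["torch_installed"] = True
    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda namespace.
    report["gpu_available"] = bool(torch.cuda.is_available())
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    report["hip_version"] = getattr(torch.version, "hip", None)
    if report["gpu_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


def run_prompt(model_path: str, prompt: str, max_tokens: int = 64) -> str:
    """Load a GGUF model and return the generated completion text."""
    if not Path(model_path).is_file():
        raise FileNotFoundError(f"No model file at {model_path}")

    # Imported lazily so the GPU check above works without llama-cpp-python.
    from llama_cpp import Llama

    # Assumption: -1 asks llama.cpp to offload all layers to the GPU.
    llm = Llama(model_path=model_path, n_gpu_layers=-1)
    result = llm(prompt, max_tokens=max_tokens)
    return result["choices"][0]["text"]


if __name__ == "__main__":
    print(gpu_report())
    # Only try the model when the (hypothetical) file is actually present.
    if Path(MODEL_PATH).is_file():
        print(run_prompt(MODEL_PATH, "Q: Name the planets in the solar system. A:"))
```

If gpu_available comes back False inside the container, the problem is in the Docker or ROCm setup rather than the model, so it is worth fixing that before downloading anything. Keeping nvtop open while run_prompt executes is still the quickest way to confirm the layers really landed on the GPU.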
