Working with LLMs on AMDGPUs
David Gasquez
November 2, 2023
This might only work for a few months (or even days), but after spending a few hours trying to get open source LLMs to work on AMDGPUs inside Docker, I thought I'd share my findings. My GPU is an AMD 7900 XTX, and I was only able to make it work with the llama-cpp Python bindings. This should work for any ROCm-supported AMDGPU.
The first thing is to build and set up our Docker image. This is what I ended up with:
You might need to change gfx1100 to your GPU's family/target.
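If you are not sure which target your card reports, one way to find out is to scan the output of `rocminfo` for `gfx` names. The helper below is a minimal sketch, assuming ROCm (and therefore `rocminfo`) is installed on the machine where you run it; the function names are my own and the parsing is a plain regex, so treat the result as a hint rather than an authoritative answer.

```python
import re
import shutil
import subprocess


def find_gfx_targets(rocminfo_output: str) -> list[str]:
    """Return the unique gfx target names found in the text, in first-seen order."""
    seen = []
    for match in re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_output):
        if match not in seen:
            seen.append(match)
    return seen


def detect_targets() -> list[str]:
    """Run rocminfo if it is on PATH; return an empty list otherwise."""
    if shutil.which("rocminfo") is None:
        return []
    result = subprocess.run(["rocminfo"], capture_output=True, text=True, check=False)
    return find_gfx_targets(result.stdout)


if __name__ == "__main__":
    targets = detect_targets()
    print(targets or "rocminfo not found or no gfx target reported")
```

Whatever name it prints for your card is the value to use in place of gfx1100.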
Next, we need to build the image:
Now we can run the image with this ~complex~ precise command:
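For reference, here is a sketch of what an invocation of this kind usually looks like, written as a small Python helper so the pieces can be inspected one by one. The device and group flags are the ones ROCm containers commonly need; the image tag `llm-rocm` is a placeholder I made up, and the exact flags for your setup may differ.

```python
import os
import shlex


def docker_run_args(image: str, host_dir: str) -> list[str]:
    """Build a `docker run` argv that exposes the AMD GPU to the container."""
    return [
        "docker", "run", "-it", "--rm",
        "--device=/dev/kfd",           # ROCm compute interface
        "--device=/dev/dri",           # GPU render nodes
        "--group-add", "video",        # group that owns the GPU device files on many distros
        "--security-opt", "seccomp=unconfined",
        "-v", f"{host_dir}:/models",   # mount host_dir at /models
        image,
        "bash",
    ]


if __name__ == "__main__":
    # "llm-rocm" is a placeholder tag (an assumption), not a name taken from the post.
    args = docker_run_args("llm-rocm", os.getcwd())
    print(shlex.join(args))
```

Running the script only prints the command; paste it into a shell, or pass the list to `subprocess.run`, to start the container.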
This will mount the current directory to /models inside the container and get you into a bash shell. Now it's time to check if the PyTorch installation is working and able to detect the GPU. These commands should work:
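A minimal sketch of such a check, assuming a ROCm build of PyTorch inside the container. ROCm builds reuse the `torch.cuda` API, so `torch.cuda.is_available()` is the call to look at, and `torch.version.hip` is set only on ROCm builds. The `rocm_report` helper name is mine.

```python
import importlib.util


def rocm_report() -> dict:
    """Collect basic facts about the PyTorch install without failing if it is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None on non-ROCm builds, a version string on ROCm builds.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
    }
    if report["gpu_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in rocm_report().items():
        print(f"{key}: {value}")
```

If `gpu_available` comes back `False` while `hip_version` is set, the build is fine but the container probably cannot see the device files.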
If everything is working, you should see something like this:
Now, let's do some LLMing and put those graphical processing units to work with one of the latest models, Mistral!
Download the model:
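One way to fetch a model file from Python, sketched with the standard library only. The repository and file name below are an assumption on my part (a 4-bit GGUF quantization of Mistral 7B Instruct published on Hugging Face); the post does not say which file was used, so swap in whichever one you prefer. The file is several gigabytes, so the script downloads only when asked to.

```python
import os
import sys
import urllib.request

# Assumed model file, not taken from the post; replace with your own choice.
REPO = "TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
FILENAME = "mistral-7b-instruct-v0.1.Q4_K_M.gguf"


def model_url(repo: str = REPO, filename: str = FILENAME) -> str:
    """Build the direct download URL for a file in a Hugging Face repository."""
    return f"https://huggingface.co/{repo}/resolve/main/{filename}"


def download(dest_dir: str = "/models", filename: str = FILENAME) -> str:
    """Download the file into dest_dir unless it is already there; return its path."""
    dest = os.path.join(dest_dir, filename)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(model_url(filename=filename), dest)
    return dest


if __name__ == "__main__":
    if "--download" in sys.argv:
        print(download())
    else:
        print(model_url())
```

Run it with `--download` inside the container to place the file under /models, which is the directory mounted from the host.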
And with that, we should be ready to run the model with llama-cpp-python:
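A minimal sketch of such a script, assuming llama-cpp-python was built with ROCm support and that a GGUF file sits under /models. The model path and the prompt are placeholders of mine; `n_gpu_layers=-1` asks llama-cpp-python to offload every layer to the GPU.

```python
import importlib.util
import os

# Hypothetical path; point it at whichever GGUF file you downloaded.
MODEL_PATH = "/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf"


def llama_kwargs(model_path: str, gpu_layers: int = -1) -> dict:
    """Constructor arguments for llama_cpp.Llama; -1 requests all layers on the GPU."""
    return {"model_path": model_path, "n_gpu_layers": gpu_layers, "n_ctx": 2048}


def complete(prompt: str, model_path: str = MODEL_PATH) -> str:
    """Load the model and return the text of a single completion."""
    from llama_cpp import Llama  # imported lazily so this file loads without it

    llm = Llama(**llama_kwargs(model_path))
    result = llm(prompt, max_tokens=128)
    return result["choices"][0]["text"]


if __name__ == "__main__":
    ready = (
        importlib.util.find_spec("llama_cpp") is not None
        and os.path.exists(MODEL_PATH)
    )
    if ready:
        print(complete("Q: Name the planets in the solar system. A:"))
    else:
        print("llama-cpp-python or the model file is missing; nothing to run")
```

While the model loads, llama.cpp logs how many layers it offloaded, which is a quick first sign that the GPU is in play.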
For me, it printed the following:
If you, like me, are wondering if the GPU was actually being used, you can install nvtop and execute it.
Finally, after a few hours and a bunch of tweaks, the GPU was being used and Mistral 7B worked on my machine!