{
  "$type": "site.standard.document",
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreibmuqwvrl42a7wf67ixbetqwl2o5lc6qurqhih2jjcxoknr3p4h4u"
    },
    "mimeType": "image/webp",
    "size": 57304
  },
  "description": "A new GitHub Action to cache your Docker volumes for faster and cheaper builds.",
  "path": "/posts/docker-volume-caching-gha/",
  "publishedAt": "2025-01-14T00:00:00.000Z",
  "site": "at://did:plc:kl3s4yablm3fgnxfkn47uy5r/site.standard.publication/self",
  "tags": [
    "ci"
  ],
  "textContent": "In 2019, I joined Sentry to work exclusively on its self-hosted product. Back then, Sentry used just a few services: Postgres, Memcached, Redis, and Sentry itself. But it was on the cusp of becoming a multi-service application with the introduction of Snuba and, along with it, Kafka, Relay, Symbolicator, and others. Because it was supposed to be simple, self-hosted (or onpremise, as it was called back then) did not have any tests or even any automation: just a bunch of instructions and commands to run in the README. With the rapid growth in the number of engineers working on Sentry and in the changes they were making, it was clear that we needed to automate the testing and setup of the self-hosted repository.\n\nTo summarize about a year’s worth of work: we created an install script written in Bash (as that was the lowest common denominator across all platforms) and a very cursory test suite that ran the install script, tried to ingest an event, and read it back. The entire test suite took about 5-6 minutes to run, and about half of that time was spent running Django migrations, from scratch, on a fresh database, over, and over, and over. The thing is, we didn’t even add migrations frequently, but we still had to run all of them to get the service up and running.\n\nThe solution was obviously caching, but caching Docker volumes did not really seem feasible back then. Remember, this was 2019-2020, and GitHub Actions was still in its infancy. I was also barely getting comfortable with all that Bash and Docker stuff. Then I got distracted by other things, changed jobs, and eventually came back to Sentry to find that this was still a problem. So I decided to tackle it head-on: I was going to cache the hell out of those Docker volumes for our databases. We had actions/cache by then, so how hard could it be? Famous last words.\n\nIt took me about 2 weeks to completely figure this out. 
About 50% of that was my ignorance of basic Linux tools such as tar, file/directory permissions, and Docker’s way of storing volumes. About 30% was me not properly trying things locally and instead pushing to CI and waiting for the results. The remaining 20% was the actually hard parts, which I figured out mostly thanks to StackOverflow (yeah, still not on that “ChatGPT for everything” bandwagon[1]). I’ll summarize some of the findings here so you don’t have to go through the same pain I did:\n\nDocker volumes are stored under /var/lib/docker/volumes (by default, and please don’t change it)\n\nYou cannot stat anything under a directory if you don’t have x permission on the directory itself (╯°□°)╯︵ ┻━┻\n\ntar does preserve permissions and ownership by default, but only if you are running it as root (or with sudo) (╯°□°)╯︵ ┻━┻ x 2\n\ntar restores ownership by user and group name rather than by numeric ID, so if your Docker container uses a user ID like 1000, GLHF[2] (╯°□°)╯︵ ┻━┻ x 3\n\nLinux (Unix?) fs permissions are not just rwx: there’s also an s (setuid) bit you can set on executables so they run with their owner’s privileges, which lets a root-owned tar set the ownership of other files[3] ＼（〇_ｏ）／\n\nNot only does GitHub Actions not run tar with sudo, and not only does it refuse to, it also doesn’t let you run tar with --same-owner or --numeric-owner (╯°□°)╯︵ ┻━┻ x 4\n\nBonus: there are these awesome tools called getfacl and setfacl that let you back up and restore ACLs BUT NOT OWNERSHIP INFORMATION (╯°□°)╯︵ ┻━┻ x 5\n\nBonus 2: mv will happily overwrite your target without so much as a warning, especially if you use sudo.\n\nSo, with all this information, what is needed to cache Docker volumes on GitHub Actions and restore them properly? 
Let’s see:\n\nSet +x permission on /var/lib/docker\n\nSet +rx permission on /var/lib/docker/volumes\n\nSet u+s permission on tar\n\nUse tar --numeric-owner to create the archive... oh wait, you can’t, because actions/cache doesn’t let you (╯°□°)╯︵ ┻━┻ (╯°□°)╯︵ ┻━┻ (╯°□°)╯︵ ┻━┻ (╯°□°)╯︵ ┻━┻\n\nSide quest: Hacking tar on GitHub Actions\n\nOnce I realized that I had to change the options passed to tar, I very reluctantly decided to “wrap” the actual tar executable:\n\nsudo cp /usr/bin/tar /usr/bin/tar.orig\nsudo echo 'exec tar.orig --numeric-owner -p --same-owner \"$@\"' > /usr/bin/tar\n\nOh, but wait: you cannot redirect output to a file with sudo like that, because sudo only elevates the command itself, and the redirection is done by your shell, which is not running as root. Let’s try that again:\n\nsudo cp /usr/bin/tar /usr/bin/tar.orig\necho 'exec /usr/bin/tar.orig --numeric-owner -p --same-owner \"$@\"' | sudo tee /usr/bin/tar > /dev/null\n\nOnce I added this monstrosity, my GitHub Actions runs… started to hang indefinitely. Can you see the issue? ಠಿ_ಠ Well, I couldn’t. I spent about 2 hours trying to figure out why this was happening. I suspected exec might be the culprit, and when I removed it, the runs at least started crashing with an error: cannot fork. What? Well, you see, I was doing this in both my restore and save actions. So when the restore action ran, it wrapped/replaced tar but never put the original back. Later, the save action ran and tried to do the same. Now remember our “Bonus 2” learning from above: when save backed up tar (which was actually my wrapper script) to /usr/bin/tar.orig, mv didn’t even flinch that tar.orig already existed. Now I had 2 copies of my wrapper script, and the one at /usr/bin/tar.orig just exec’d itself. With exec, it kept replacing itself forever (hence the hang); without exec, every call spawned yet another child process until the runner could not fork anymore. Nice fork bomb there, me[4].\n\nOnce the fork bomb was defused, I was able to run actions/cache and voilà! My volumes were cached and restored properly. 
Space time is saved, Marty!\n\nFinal boss\n\nAfter all this, I was still not very happy, as it doubled every actions/cache call in my workflow, with the same hack repeated in both parts. So I decided to create a GitHub Action that would contain the chaos, the madness, the fork bomb minefield, and all the other ugliness, keeping it out of sight for both me and others. Please enjoy BYK/docker-volume-cache-action and cache responsibly.\n\nFootnotes\n\n[1] That said, all images for this article were generated by DeepAI Image Generator.\n\n[2] Looking at you, confluentinc/cp-kafka.\n\n[3] Yes, yes, there are even more. Can you believe it? I couldn’t either. But I digress.\n\n[4] Me when I realized this: mother forking shirt balls!",
  "title": "Docker Volume Caching on GitHub Actions"
}