{
  "path": "/posts/2024/sandboxed-python-env",
  "site": "at://did:plc:mracrip6qu3vw46nbewg44sm/site.standard.publication/self",
  "tags": [
    "language_models",
    "python",
    "security",
    "nix",
    "docker"
  ],
  "$type": "site.standard.document",
  "title": "Sandboxed Python Environment",
  "updatedAt": "2024-01-21T03:53:12.000Z",
  "publishedAt": "2024-01-21T03:53:12.000Z",
  "textContent": "Disclaimer: I am not a security expert or a security professional.\n\nI've tried out many new AI/LLM libraries in the past year.\nMany of these are written in Python.\nWhile trying out new and exciting software is a lot of fun, it's also important to be mindful of what code you allow to run on your system.\nEven if code is open source, it's still _possible_ that the cool open source library you installed includes code like\n\nI strongly recommend vetting any libraries you use, using separate API keys per app, and setting a spend cap on OpenAI in case a key is compromised.\nHowever, your OPENAI_API_KEY isn't all you need to worry about.\nPython code (including its dependencies) has access to your entire os.environ.\nIt's fairly common to set environment variables system-wide so that every shell picks them up.\nZsh, for example, has a dedicated file (.zshenv) that gets sourced whenever the shell starts up.\nSo if you had GITHUB_API_TOKEN set in your environment, a malicious open source library could send that secret to its own server and use it to access your stuff.\n\nThankfully, open source libraries are _usually_ scrutinized over time to the point that this kind of credential theft becomes harder to execute or scale.\nHowever, with the advent of agent-like, language-model-based systems, some libraries now ask us to let them execute system commands on our behalf.\nWhile many of these require user approval, some have automatic approval capabilities, allowing a language model to roam freely around your system.\nIf you're reading this article, you probably already know this isn't an awesome idea.\n\nSome trial and error\n\nI tried several approaches to this problem before I found one that seemed to address most of my concerns.\nMy goal was a setup that would stay safe even if the code I was trying out used a language model to execute code, peek at my environment, or poke around the file system.\n\nTrying out env\n\nMy initial inclination was to 
try to clear out my environment variables to protect against a program trying to steal my secrets.\nThe env -i command runs a command with an empty environment.\nUnfortunately, this approach removes too much of what is needed to run Python, so it wasn't viable.\n\nTrying out nix\n\nNix seemed like another possible candidate, since it can manage an independent version of Python and my dependencies.\nAfter a bit of searching, I found a way to create a shell with a nix-specified environment using nix-shell.\nLoosely following instructions from this article, I created a shell.nix file with the following contents:\n\nI ran nix-shell from the same directory, which put me in a shell (within my shell) with a specific version of Python and my specified dependencies installed.\n\nNix worked as advertised, but I realized this approach didn't isolate the code from my environment variables or my system.\nAny Python code I ran from within the nix shell could still read my environment variables or mess with my host file system if it was malicious.\n\nNix can also run a pure shell (nix-shell --pure), which, per the docs, clears most of the environment variables.\nI tried this out, but it quickly became apparent that it was too stripped down for what I was looking for.\nIt also still had host file system access.\n\nUsing Docker\n\nGiven the two main constraints:\n\n- host environment variable protection\n- host file system protection\n\nI moved on to finding an approach using Docker, which I knew provides better file system isolation and an independent set of environment variables.\nThis approach has become the one I use when I want to try out a new library to get a sense of its capabilities while being mindful of my system's privacy.\n\nHere is how it works.\nFirst, I created a new project with the following files and contents:\n\n.env\n\nDockerfile\n\nrun.py\n\nrequirements.txt\n\nMakefile\n\nWith this setup, I added my environment variables to .env, my dependencies 
to requirements.txt, and my code to run.py. To run it all, I use make run, which builds and runs the container defined in the Dockerfile.\n\nIt's not the easiest or cleanest approach for ongoing project development, but it provides a reasonable way to sandbox and isolate new code you want to try out but don't necessarily trust.\nIt took several hours of research to find an approach I was satisfied with, but I suspect there are other good options out there.\nIf you have an approach you like, I would love to hear from you.\n\nYou can find the code from this post here.",
  "canonicalUrl": "https://www.danielcorin.com/posts/2024/sandboxed-python-env"
}