Python dependency management redux

Redowan Delowar June 27, 2023
One major drawback of Python's huge ecosystem is the significant variance in workflows among people trying to accomplish different things. This holds true for dependency management as well. Depending on what you're doing with Python - whether it's building reusable libraries, writing web apps, or diving into data science and machine learning - your workflow can look completely different from someone else's.

That being said, my usual approach to any development process is to pick a method and give it a shot to see if it works for my specific needs. Once a process works, I usually automate it and rarely revisit it unless something breaks. Also, I actively try to abstain from picking up tools that haven't stood the test of time. If the workflow laid out here doesn't work for you and something else does, that's fantastic! I just wanted to document a more modern approach to the dependency management workflow that has reliably worked for me over the years. Plus, I don't want to be the person who still uses [distutils] in their package management workflow and gets reprimanded by pip for doing so.

Defining the scope

Since the dependency management story in Python is a huge mess for whatever reason, to avoid getting yelled at by the most diligent gatekeepers of the internet, I'd like to clarify the scope of this piece. I mainly write web applications in Python and dabble in data science and machine learning every now and then. So yeah, I'm well aware of how great [conda] is when you need to deal with libraries with C dependencies. However, that's not typically my day-to-day focus. Here, I'll primarily delve into how I manage dependencies when developing large-scale web apps and reusable libraries.

In applications, I manage my dependencies with [pip] and [pip-tools], and for libraries, my preferred build backend is [hatch].

[PEP-621] attempts to standardize the process of storing project metadata in a pyproject.toml file, and I absolutely love the fact that I'll now mostly be able to define all my configurations and dependencies in a single file. This made me want to rethink how I manage dependencies without sailing against the current recommended standard, while also not getting swallowed by the vortex of conflicting opinions in this space.

In applications

Whether I'm working on a large Django monolith or exposing a microservice via FastAPI or Flask, while packaging an application, I want to be able to:

- Store all project metadata, linter configs, and top-level dependencies in a pyproject.toml file following the [PEP-621] conventions.
- Separate the top-level application and development dependencies.
- Generate requirements.txt and requirements-dev.txt files from the requirements specified in the TOML file, where the top-level dependencies and their transitive dependencies will be pinned to specific versions.
- Use vanilla pip to build the application hermetically from the locked dependencies specified in the requirements.txt files.

The goal is to simply be able to run the following command to install all the pinned dependencies in a reproducible manner:

pip-tools allows me to do exactly that. Suppose you have an app where you're defining the top-level dependencies in a canonical pyproject.toml file like this:

Here, following PEP-621 conventions, we've specified the app and dev dependencies in the project.dependencies and project.optional-dependencies.dev sections respectively. Now in a virtual environment, install pip-tools and run the following commands:

Running the commands will create two lock files, requirements.txt and requirements-dev.txt, where all the pinned top-level and transitive dependencies will be listed out.

The contents of the requirements.txt file look like this (truncated):

Similarly, the content of the requirements-dev.txt file is as follows (truncated):

Once the lock files are generated, you're free to build the application however you see fit, and the build process doesn't even need to be aware of the existence of pip-tools. In the simplest case, you can just run pip install to build the application. Check out this [fastapi-nano example] that uses the workflow explained in this section.

In libraries

While packaging libraries, I pretty much want the same things mentioned in the application section. However, the story of dependency management in reusable libraries is a bit more hairy. Currently, there's no standard around a lock file, and I'm not aware of a way to build artifacts from a plain requirements.txt file. For this purpose, my preferred build backend is [hatch], mostly because it follows the latest standards formalized by the associated PEPs. From the FAQ section of the hatch docs:

> Q: What is the risk of lock-in?
>
> A: Not much! Other than the plugin system, everything uses Python's established standards
> by default. Project metadata is based entirely on [PEP-621]/[PEP-631], the build system is
> compatible with [PEP-517]/[PEP-660], versioning uses the scheme specified by [PEP-440],
> dependencies are defined with [PEP-508] strings, and environments use virtualenv.

However, it doesn't support lock files yet:

> The only caveat is that currently there is no support for re-creating an environment given
> a set of dependencies in a reproducible manner. Although a standard lock file format may
> be far off since [PEP-665] was rejected, resolving capabilities are coming to pip. When
> that is stabilized, Hatch will add locking functionality and dedicated documentation for
> managing applications.

In my experience, I haven't faced many issues regarding the lack of support for lock files while building reusable libraries. Your mileage may vary.

Now let's say we're trying to package up a CLI that has the following source structure:

The content of cli.py looks like this:

The corresponding pyproject.toml file looks as follows:

Now install hatch in your virtualenv and run the following command to create the build artifacts:

This will create the build artifacts in the src directory:

You can now install the local wheel file to test the build:

Once you've installed the CLI locally, you can test it by running foo-cli from your console:

This returns:

You can also build and install the CLI with:

Hatch also provides a hatch publish command to upload the package to PyPI. For a complete reference, check out how I shipped [rubric] following this workflow.

Further reading

- [Using pyproject.toml in your Django project - Peter Baumgartner]
- [TIL: pip-tools Supports pyproject.toml - Hynek Schlawack]

[distutils]: https://docs.python.org/3.11/distutils/
[conda]: https://docs.conda.io/en/latest/
[pip]: https://pip.pypa.io/en/stable/
[pip-tools]: https://pip-tools.readthedocs.io/en/latest/
[hatch]: https://hatch.pypa.io/latest/
[pep-621]: https://peps.python.org/pep-0621/
[fastapi-nano example]: https://github.com/rednafi/fastapi-nano/blob/master/pyproject.toml
[pep-631]: https://peps.python.org/pep-0631/
[pep-517]: https://peps.python.org/pep-0517/
[pep-660]: https://peps.python.org/pep-0660/
[pep-440]: https://peps.python.org/pep-0440/
[pep-508]: https://peps.python.org/pep-0508/
[pep-665]: https://peps.python.org/pep-0665/
[rubric]: https://github.com/rednafi/rubric/blob/main/pyproject.toml
[using pyproject.toml in your django project - peter baumgartner]: https://lincolnloop.com/insights/using-pyprojecttoml-in-your-django-project/
[til: pip-tools supports pyproject.toml - hynek schlawack]: https://hynek.me/til/pip-tools-and-pyproject-toml/
