{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/python/pathlib/",
"description": "Replace os.path with Python's pathlib for elegant, object-oriented path manipulation with intuitive operators and unified file operations.",
"path": "/python/pathlib/",
"publishedAt": "2020-04-13T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Python"
],
"textContent": "When I first encountered Python's pathlib module for path manipulation, I brushed it aside,\nassuming it was just an OOP way of doing what os.path already does quite well. The\nofficial docs even describe it as \"Object-oriented filesystem paths\". However, back in 2019,\nwhen [an issue] confirmed that Django was replacing os.path with pathlib, I got curious.\n\nThe os.path module has always been the de facto standard for working with paths in Python.\nBut the API can feel massive, as the os module it lives in also performs a plethora of\nloosely coupled, system-related jobs. I have to look things up constantly, even for some of\nthe most basic tasks like joining multiple paths, listing all the files in a folder with a\nparticular extension, or opening multiple files in a directory. The pathlib module can do\nnearly everything that os.path offers and comes with some additional cherries on top.\n\nProblem with Python's path handling\n\nTraditionally, Python has represented file paths as regular text strings. So far, using\npaths as strings with the os.path module has been adequate, although a bit cumbersome.\nHowever, paths aren't actually strings, and this has necessitated the use of multiple modules\nto provide disparate functionality scattered all around the standard library, including\nmodules like os, glob, and shutil. 
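\n\nLet's make the strings-versus-objects point concrete with a minimal sketch that isn't part of the original examples. It sticks to the pure path classes, which do no I/O, so it should behave the same on any host; the path components src, stuff, and config.py are hypothetical:\n\n
```python
import os
from pathlib import Path, PurePosixPath, PureWindowsPath

# Treated as plain text, a path has no structure:
# string concatenation silently drops the separator.
naive = 'src' + 'stuff'

# os.path.join inserts the host separator, but the result is still a str,
# so getting the extension needs yet another os.path helper.
joined = os.path.join('src', 'stuff', 'config.py')
ext = os.path.splitext(joined)[1]

# Pure path objects carry their own structure and flavor. They do no I/O,
# so both the POSIX and the Windows flavor work on any host OS.
posix = PurePosixPath('src') / 'stuff' / 'config.py'
windows = PureWindowsPath('src') / 'stuff' / 'config.py'

print(naive)          # srcstuff
print(ext)            # .py
print(posix)          # src/stuff/config.py
print(windows.parts)  # ('src', 'stuff', 'config.py')
print(posix.name, posix.suffix, posix.parent)  # config.py .py src/stuff

# A concrete Path is not a str, but os.fspath() converts it
# for APIs that insist on a plain string.
p = Path('src') / 'stuff'
print(isinstance(p, str))  # False
print(os.fspath(p))
```
\n\nThe string route needs two separate os.path helpers to get this far, while the path object exposes name, suffix, and parent directly, and os.fspath() bridges back to a plain string when an older API demands one.\n\n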
The following code uses three modules\njust to copy multiple Python files from the current directory to another directory called src:\n\nThe above pattern can get complicated fairly quickly, and you have to know or look up\nspecific modules and methods in a large search space to perform your path manipulations.\nLet's look at a few more examples of performing the same tasks with the os.path and\npathlib modules.\n\nJoining & creating new paths\n\nSay you want to achieve the following goals:\n\n- There is a file named file.txt in your current directory, and you want to create the path\n for another file named file_another.txt in the same directory.\n- Then you want to save the absolute path of file_another.txt in a new variable.\n\nLet's see how you'd usually do this via the os module.\n\nThe variables file_path, base_dir, and file_another_path look like this on my machine:\n\nYou can use the usual string methods to transform the paths, but generally, that's not a good\nidea. So, instead of joining two paths with + like regular strings, you should use\nos.path.join() to join the components of a path. This is because different operating\nsystems don't define paths in the same way. Windows uses \"\\\" while macOS and other\nUnix-like OSes use \"/\" as a separator. Joining with os.path.join() ensures the correct path\nseparator on the host operating system. The pathlib module overloads the \"/\" operator to\nmake this a little less painful.\n\nThe resolve method returns the absolute path of the file. From there, you can use the\nparent attribute to get the base directory and join file_another.txt to it.\n\nMaking directories & renaming files\n\nHere's a piece of code that:\n\n- Tries to make a src/stuff/ directory when it already exists.\n- Renames a file called .config in the src directory to .stuffconfig.\n\nHere is the same thing done using the pathlib module:\n\nNotice the output where the renamed file path is printed. 
It's not a simple string, but rather a\nPosixPath object that indicates the type of host system (Linux in this case). You can\nalmost always use string paths and Path objects interchangeably.\n\nListing specific types of files in a directory\n\nLet's say you want to recursively visit nested directories and list the .py files in a\ndirectory called source. The directory looks like this:\n\nUsually, the glob module is used to handle this kind of situation:\n\nThe above approach works perfectly. However, if you don't want to import another module just\nfor a single job, pathlib has built-in glob and rglob methods. You can skip the glob module\nentirely and achieve the same result in the following way:\n\nThis will also print the same as before:\n\nBoth Path.glob and Path.rglob return a generator. Calling list on them gives you the\ndesired result. Notice how the rglob method discovers the desired files without you having\nto spell out the directory structure with wildcards. Pretty neat, huh?\n\nOpening multiple files & reading their contents\n\nNow let's open the .py files that you recursively discovered in the previous example and\nread their contents:\n\nThe pathlib implementation is almost identical to the one above:\n\nYou can also cook up a more robust implementation with a generator expression and a context\nmanager:\n\nAnatomy of the pathlib module\n\nPrimarily, pathlib has two high-level components: pure paths and concrete paths. Pure\npaths provide purely computational operations without any I/O, so they can be instantiated\nregardless of the host operating system. Concrete paths add methods that do I/O on the\nfilesystem, and to instantiate one of the platform-specific concrete classes, you need to be\non the type of host expected by that class. These two high-level components are made up of\nsix individual classes, related through inheritance. They are:\n\n1. PurePath (Base class of the pure paths; its flavored subclasses let you, for example,\n work with Windows paths on a Linux machine)\n2. PurePosixPath (Subclass of PurePath)\n3. PureWindowsPath (Subclass of PurePath)\n4. 
Path (Concrete path object; most of the time, you'll be dealing with this one)\n5. PosixPath (Concrete POSIX path, subclass of Path and PurePosixPath)\n6. WindowsPath (Concrete Windows path, subclass of Path and PureWindowsPath)\n\nThis UML diagram from the official docs does a better job of explaining the internal\nrelationships between the component classes.\n\n![UML diagram showing Python pathlib class hierarchy with PurePath and Path][image_1]\n\nUnless you are doing cross-platform path manipulation, most of the time you'll be working\nwith the concrete Path object. So I'll focus on the methods and properties of the Path class\nonly.\n\nOperators\n\nInstead of using os.path.join, you can use the / operator to create child paths.\n\nAttributes & methods\n\nThe following tree shows a non-exhaustive list of attributes and methods associated\nwith the Path object. I have cherry-picked the ones that I use most of the time while\ndoing path manipulation. Head over to the official docs for a more detailed list. We'll\nwalk through the tree linearly, with examples to illustrate their usage.\n\nLet's dive into them one by one. For all the examples, we'll be using the previously\nseen directory structure.\n\nPath.parts\n\nReturns a tuple containing the individual components of a path.\n\nPath.parents & Path.parent\n\nPath.parents returns an immutable sequence containing all the logical ancestors of the\npath, while Path.parent returns the immediate logical parent of the path.\n\nPath.name\n\nReturns the last component of a path as a string. It's usually used to extract the file\nname from a path.\n\nPath.suffixes & Path.suffix\n\nPath.suffixes returns a list of the extensions of the final component. Path.suffix only\nreturns the last extension.\n\nPath.stem\n\nReturns the final path component without its suffix.\n\nPath.is_absolute()\n\nChecks if a path is absolute or not. 
Returns a boolean value.\n\nPath.joinpath(*other)\n\nThis method combines multiple components into a complete path. It can be used as\nan alternative to the \"/\" operator for joining path components.\n\nPath.cwd()\n\nReturns the current working directory.\n\nPath.home()\n\nReturns the user's home directory.\n\nPath.exists()\n\nChecks if a path exists or not. Returns a boolean value.\n\nPath.expanduser()\n\nReturns a new path with the ~ symbol expanded to the user's home directory.\n\nPath.glob(pattern)\n\nGlobs and yields all paths matching a specific pattern. Let's discover all the files in the\nsrc/stuff/ directory that have the .py extension.\n\nPath.rglob(pattern)\n\nThis is like the Path.glob method but matches the pattern recursively.\n\nPath.is_dir()\n\nChecks if a path points to a directory or not. Returns a boolean value.\n\nPath.is_file()\n\nChecks if a path points to a regular file. Returns a boolean value.\n\nPath.iterdir()\n\nWhen the path points to a directory, this yields path objects for the directory's contents.\n\nPath.mkdir(mode=0o777, parents=False, exist_ok=False)\n\nCreates a new directory at the given path.\n\nParameters:\n\n- mode: (_int_) POSIX permission bits for the new directory, combined with the process's\n umask.\n\n- parents: (_boolean_) If parents is True, any missing parents of this path are created\n as needed (mimicking the POSIX mkdir -p command). Otherwise, if a parent is absent,\n FileNotFoundError is raised.\n\n- exist_ok: (_boolean_) If False, FileExistsError is raised if the target directory\n already exists. If True, FileExistsError is ignored as long as the existing path is a\n directory.\n\nPath.open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)\n\nThis works like the built-in open function.\n\nPath.rename(target)\n\nRenames this file or directory to the given target and returns a new Path instance pointing\nto target. This will raise FileNotFoundError if the file is not found.\n\nPath.replace(target)\n\nRenames this file or directory to the given target, unconditionally replacing an existing\nfile or empty directory at target. 
Returns the new Path instance.\n\nPath.resolve(strict=False)\n\nMakes the path absolute, resolving any symlinks, and returns a new path object. If strict\nis True and the path doesn't exist, FileNotFoundError is raised.\n\nPath.rmdir()\n\nRemoves the directory this path points to. The directory must be empty; otherwise,\nOSError is raised. To delete a file instead, use Path.unlink().\n\nSo, should you use it?\n\nPathlib was introduced in Python 3.4. However, if you are working with Python 3.5 or\nearlier, in some special cases, you might have to convert pathlib.Path objects to regular\nstrings. Since Python 3.6, which added the os.PathLike protocol (PEP 519), Path objects work\nalmost everywhere you would use a string path. Also, the Path object nicely abstracts away\nthe complexity that arises while working with paths across different operating systems.\n\nThe ability to manipulate paths in an OO way, without having to rummage through the massive\nos or shutil modules, can make path manipulation a lot less painful.\n\nFurther reading\n\n- [pathlib - Object-oriented filesystem paths]\n- [Python 3's pathlib Module: Taming the File System]\n- [Why you should be using pathlib]\n\n[an issue]:\n https://code.djangoproject.com/ticket/29983\n\n[pathlib - Object-oriented filesystem paths]:\n https://docs.python.org/3/library/pathlib.html\n\n[Python 3's pathlib Module: Taming the File System]:\n https://realpython.com/python-pathlib/\n\n[Why you should be using pathlib]:\n https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/#The_os_module_is_crowded\n\n[image_1]:\n https://blob.rednafi.com/static/images/pathlib/img_1.png",
"title": "No really, Python's pathlib is great"
}