{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreicsv2ovcklf5z3rolytbugcktxghhw6isb6do4jw6jtwojvhevbam",
    "uri": "at://did:plc:xrpvi727ccnv4bnwaedgs3gd/app.bsky.feed.post/3mfuq63lfmmf2"
  },
  "path": "/deprecate-confusing-apis-like-os-path-commonprefix?utm_campaign=rss",
  "publishedAt": "2026-02-27T00:00:00.000Z",
  "site": "https://sethmlarson.dev",
  "tags": [
    "`os.path.commonprefix()` function",
    "`os.path` module",
    "issue was opened",
    "explicit security warning",
    "deprecate the function",
    "Python Software Foundation",
    "Alpha-Omega",
    "CVE-2026-1703",
    "`is_within_directory()`",
    "added to pip in 2019",
    "40K uses of `os.path.commonprefix` on GitHub",
    "available on the internet",
    "Python Source Releases",
    "emailed the `python-dev` mailing list",
    "coverage.py project",
    "published a blog post in 2010",
    "patching a bug out of coverage.py",
    "SecureDrop",
    "vulnerable to path traversal",
    "during a security audit",
    "reported to bugs.python.org",
    "Donát Nagy",
    "Serhiy Storchaka",
    "just one of three total uses",
    "CVE-2007-4559",
    "launched a campaign",
    "this GitHub comment",
    "turns up 2.7K hits",
    "finally deprecate the function"
  ],
  "textContent": "The `os.path.commonprefix()` function has been an API in the Python standard library for at least 35 years (since February 1991) and in that time has been confusing users and creating security issues, even in programs explicitly trying to mitigate vulnerabilities. This was caused directly by the API's placement in the `os.path` module and further perpetuated by backwards compatibility.\n\nHere are my top-level takeaways from investigating this issue:\n\n  * Weigh surprise and potential for misuse higher than backwards compatibility. Deprecate, rename, and remove security-relevant functions that aren't designed to prevent accidental misuse.\n  * An API's “fitness for purpose” is implied to users through an API's “labeling” (such as: module, name, parameters). In the case of `commonprefix`, being included in `os.path` module implied to users that the function was meant to be used with paths.\n  * Documentation isn't enough. `commonprefix` surprising behavior was documented since 2002 and this wasn't enough to mitigate future insecure usage two decades later.\n  * Automatic static code analysis and linting tools are likely our best bets at cleaning up widespread footguns like this. An issue was opened with Ruff, a popular code formatter for Python.\n\n\n\nI've submitted pull requests that turn the documentation note into an explicit security warning and deprecate the function in Python 3.15. I am hoping that this case can be used as evidence in the future to more rapidly deprecate and replace functions that are confusing or easy to use insecurely on accident.\n\n> My work as the Security Developer-in-Residence at the Python Software Foundation is sponsored by Alpha-Omega. Thanks to Alpha-Omega for supporting security in the Python ecosystem.\n\n## Discovering the footgun\n\nEarlier this month I published the advisory for CVE-2026-1703, a vulnerability in pip that allows extremely limited path traversal when unpacking a wheel archive file. 
The root cause of this vulnerability was a function `is_within_directory()` that checked whether a `target` path was within the extraction `directory`. Previously the function was implemented like so:\n\n\n    import os\n\n    def is_within_directory(directory: str, target: str) -> bool:\n        \"\"\"\n        Return true if the absolute path of target is within the directory\n        \"\"\"\n        abs_directory = os.path.abspath(directory)\n        abs_target = os.path.abspath(target)\n        prefix = os.path.commonprefix([abs_directory, abs_target])\n        return prefix == abs_directory\n\nThis function was added to pip in 2019. The function's intention is to check whether `directory` is the prefix of `target`, implying that `target` is within `directory`.\n\nHowever, this is not how `os.path.commonprefix()` works in practice. `commonprefix()` compares _character-by-character_ (`/`..`a`..`/`..`b`..), not using path segments (`/a`..`/b`..). This subtle difference means that the function is not safe to use on paths without extra mitigations, like so:\n\n\n    def is_within_directory(directory: str, target: str) -> bool:\n        ...\n        # By adding a \"/\" terminator, the character-by-character\n        # algorithm is now safe to use for directories.\n        abs_directory = os.path.abspath(directory) + \"/\"\n        abs_target = os.path.abspath(target)\n        prefix = os.path.commonprefix([abs_directory, abs_target])\n        return prefix == abs_directory\n\n## Investigating other usages\n\nSeeing this insecure usage in a critical library like pip signaled to me that this confusion was not likely to be isolated. There are almost 40K uses of `os.path.commonprefix` on GitHub alone. The git blame on `os.path.commonprefix` in CPython stalls out at the “initial commit” moving CPython to git, so we're going to have to start looking at source releases to see the full history. 
I also investigated the discussions around the API to document how the function has been misused and misunderstood over the years, and in doing so built a case to deprecate and remove the function.\n\n### Python 0.9.1 (1991)\n\nThe earliest version of Python source code available on the internet that I know of is for version 0.9.1. Wikipedia lists this version as being published in February 1991, so ~35 years ago this month. Looking in `lib/path.py` we see the earliest implementation of `commonprefix()` (with tabs instead of spaces):\n\n\n    # Return the longest prefix of all list elements.\n    #\n    def commonprefix(m):\n    \tif not m: return ''\n    \tprefix = m[0]\n    \tfor item in m:\n    \t\tfor i in range(len(prefix)):\n    \t\t\tif prefix[:i+1] <> item[:i+1]:\n    \t\t\t\tprefix = prefix[:i]\n    \t\t\t\tif i = 0: return ''\n    \t\t\t\tbreak\n    \treturn prefix\n\nThis implementation is identical to the `commonprefix` function in current use, and still within a path-themed module. Note that `<>` is an alias for `!=`.\n\n### Python 2.0.1 (2000)\n\nThe earliest version of Python source code that's available on the contemporary “Python Source Releases” page is Python 2.0.1. In this source release, in `Lib/posixpath.py`, we still see `commonprefix()` unchanged:\n\n\n    # Return the longest prefix of all list elements.\n    def commonprefix(m):\n        \"Given a list of pathnames, returns the longest common leading component\"\n        if not m: return ''\n        prefix = m[0]\n        for item in m:\n            for i in range(len(prefix)):\n                if prefix[:i+1] <> item[:i+1]:\n                    prefix = prefix[:i]\n                    if i == 0: return ''\n                    break\n        return prefix\n\n### First report of confusing behavior (2002)\n\nArmin Rigo emailed the `python-dev` mailing list in 2002 reporting their surprise at the behavior of `commonprefix()`, specifically noting the module:\n\n> “I recently discovered that `os.path.commonprefix(list-of-strings)` returns the longest substring that is an initial segment of all the given strings, and that this has nothing to do with the fact that the strings might be paths. 
I wonder why this has been put in `os.path` and not in the string module in the first place. This location misled me for a long time into thinking that commonprefix() would return the longest common *path*”\n\nMichael Hudson replied that they recalled a discussion which “decided that there is no use for such a thing, but that changing [the function] might break code for people who found a use”. Armin’s reply notes that the location of the function is the source of the confusion: “Can't we deprecate the thing and move it elsewhere?”. At this point it was decided to document the unexpected behavior with this warning:\n\n> “Note that this may return invalid paths because it works a character at a time.”\n\nThis thread also noted that the original intent for the function may have been to provide the behavior you get on a terminal after partially typing a path and then hitting `TAB` to auto-complete, but there wasn't a definitive answer as to why the function existed in `os.path`.\n\n### “What’s the point of os.path.commonprefix()?” (2010)\n\nNed Batchelder, maintainer of the popular coverage.py project, published a blog post in 2010 that noted that “the function is worse than useless, it’s misleading” and required patching a bug out of coverage.py as a result. Ned recommended that the documentation warning should explain that the function isn't meant to be used on paths and specifically that “the function is in the wrong place”.\n\n### SecureDrop path traversal vulnerability (2013)\n\nSecureDrop, a system deployed by many media organizations for securely accepting whistleblower submissions, was vulnerable to path traversal due to using the `os.path.commonprefix()` API incorrectly. This issue was found during a security audit by Cure53. The confusing behavior and name mismatch weren't reported upstream to the CPython project. 
As far as I know, this is the first security issue resulting from the unexpected behavior of `commonprefix()`.\n\n### Python 3.5 (2017)\n\nIn 2017 Valentin Lorenz reported to bugs.python.org that `commonprefix` still doesn't actually process paths the way users expect, using characters instead of path segments. This report led to the addition of a new function, `os.path.commonpath()`, which finds the common prefix by path segment. The function `os.path.commonprefix()` was not deprecated.\n\n### HTTPPasswordMgr security issue (2020, 2022)\n\nDonát Nagy (2020) and later Serhiy Storchaka (2022) reported an issue in the `is_suburi()` method for the `HTTPPasswordMgr` class which used the `commonprefix()` function insecurely. Serhiy noted that at the time, this was just one of three total uses of the `os.path.commonprefix()` function within the standard library and that the use was insecure.\n\n### Trellix campaign to fix CVE-2007-4559 (2022)\n\nCVE-2007-4559 is a vulnerability in the Python tarfile module which allowed path traversal during extraction of a malicious tar archive. The security company Trellix launched a campaign to mitigate vulnerable code on GitHub that uses `TarFile.extractall()` by first checking all tar members for path traversal. Unfortunately, the filtering function uses `os.path.commonprefix()` insecurely, meaning the _filtering function itself_ is also vulnerable to path traversal:\n\n\n    def is_within_directory(directory, target):\n        abs_directory = os.path.abspath(directory)\n        abs_target = os.path.abspath(target)\n        prefix = os.path.commonprefix([abs_directory, abs_target])\n        return prefix == abs_directory\n\nRecognize this function? This function is almost identical to the one used in pip. As far as I can tell, the implementation was copied from pip's, which was merged in 2019 and had not yet been discovered to be vulnerable.\n\nAccording to this GitHub comment, over 61,000 pull requests were submitted with this insecure mitigation for CVE-2007-4559. 
Searching for the name `is_within_directory()` in Python files turns up 2.7K hits. From my own testing, none of the uses of `os.path.commonprefix()` within top projects on the Python Package Index (PyPI) are vulnerable, but projects that use the `is_within_directory()` function provided by Trellix on GitHub should switch to using `os.path.commonpath()`.\n\nThis long history of insecure usage was enough to finally deprecate the function in favor of `os.path.commonpath()`. I hope this story can be evidence that when users report accidentally using functions insecurely, it's a signal to fix the confusing labeling by renaming or removing the function in favor of an API designed and labeled appropriately.",
  "title": "Deprecate confusing APIs like “os.path.commonprefix()”",
  "updatedAt": "2026-02-27T00:00:00.000Z"
}