{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/python/use-urlsplit-over-urlparse/",
"description": "Use Python's urlsplit instead of urlparse for faster URL parsing by skipping the rarely-needed params component in URL decomposition.",
"path": "/python/use-urlsplit-over-urlparse/",
"publishedAt": "2022-09-10T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Python",
"Performance"
],
"textContent": "TIL from this [video by Anthony Sottile] that Python's [urlparse] is quite slow at parsing\nURLs. I've always used urlparse to destructure URLs and didn't know that there's a faster\nalternative in the standard library. The official documentation also recommends the\nalternative function.\n\nThe urlparse function splits a supplied URL into multiple separate components and returns\na ParseResult object. Consider this example:\n\nYou can see how the function disassembles the URL and builds a ParseResult object with the\nURL components. Along with this, urlparse can also parse an obscure URL component\nthat you'll most likely never need. If you look closely at the previous example,\nyou'll see that the ParseResult object has a params field. urlparse looks for this params\ncomponent whether you need it or not, and that adds some overhead. The params\nfield will be populated if you have a URL like this:\n\nNotice the parts of the URL that appear after https://httpbin.org/get. There's a\nsemicolon followed by a few more parameters: ;a=mars&b=42. The resulting\nParseResult now has the params field populated with the parsed value\na=mars&b=42. Unless you need params support, there's a better and faster alternative\nin the standard library. The [urlsplit] function does the same thing as urlparse\nminus the params parsing and is roughly twice as fast. Here's how you'd use urlsplit:\n\nThe urlsplit function returns a SplitResult object similar to the ParseResult object\nyou've seen before. Notice there's no params field in the output here. I measured the\nspeed difference like this:\n\nWow, that's almost a 2x speed improvement.\n\n
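Here's a minimal sketch that puts the two functions side by side, assuming Python 3.6 or newer. It uses https://httpbin.org/get;a=mars&b=42 from the params example as input, prints both result objects so you can see where the ;a=mars&b=42 segment ends up, and then times each function with the standard library's timeit module. The iteration count is an arbitrary choice, and the timings depend on your machine and Python version, so treat the ratio as a rough indicator rather than a rigorous benchmark.

```python
from timeit import timeit
from urllib.parse import urlparse, urlsplit

# Sample URL from the params example: ';a=mars&b=42' is the rarely used
# params segment that only urlparse splits out into its own field.
URL = 'https://httpbin.org/get;a=mars&b=42'

parsed = urlparse(URL)
split = urlsplit(URL)

# ParseResult: path='/get' and params='a=mars&b=42'
print(parsed)
# SplitResult: no params field, the segment stays inside path
print(split)

# Rough micro-benchmark (arbitrary iteration count); absolute numbers
# vary by machine and Python version, so only the ratio is interesting.
n = 100_000
t_parse = timeit(lambda: urlparse(URL), number=n)
t_split = timeit(lambda: urlsplit(URL), number=n)
print(f'urlparse: {t_parse:.3f}s, urlsplit: {t_split:.3f}s')
print(f'urlsplit was {t_parse / t_split:.2f}x faster in this run')
```

Running it should show params='a=mars&b=42' on the ParseResult, while the SplitResult keeps the whole /get;a=mars&b=42 string in its path field. In CPython, urlparse calls urlsplit internally and then does the params splitting on top of that, which is why urlsplit comes out ahead.
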
The difference shouldn't matter much in a typical codebase, but it can if you are parsing URLs in a\ncritical hot path.\n\n[video by Anthony Sottile]:\n https://www.youtube.com/watch?v=ABJvdsIANds\n\n[urlparse]:\n https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse\n\n[urlsplit]:\n https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit",
"title": "Prefer urlsplit over urlparse to destructure URLs"
}