Prefer urlsplit over urlparse to destructure URLs

Redowan Delowar September 10, 2022
TIL from this [video by Anthony Sottile] that Python's [urlparse] is quite slow at parsing URLs. I've always used urlparse to destructure URLs and didn't know that the standard library has a faster alternative. The official documentation also recommends the alternative function.

The urlparse function splits a supplied URL into its separate components and returns a ParseResult object built from them: scheme, netloc, path, params, query, and fragment.

Along with this, urlparse can also parse an obscure type of URL that you'll most likely never need. Look closely at the ParseResult object and you'll see that it has a params field. This field gets parsed whether you need it or not, and that adds some overhead. The params field is populated when a semicolon and a few more parameters follow the path, as in https://httpbin.org/get;a=mars&b=42. Here, ;a=mars&b=42 appears right after https://httpbin.org/get, so the resulting ParseResult has its params field set to a=mars&b=42.

Unless you need this params support, there's a better and faster alternative in the standard library. The [urlsplit] function does the same thing as urlparse minus the params parsing and is about twice as fast. It returns a SplitResult object similar to the ParseResult object described above, except that the output has no params field.

When I measured the speed difference, urlsplit showed an almost 2x speed improvement. This shouldn't be much of an issue in a real codebase, but it can matter if you are parsing URLs in a critical hot path.
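
To make the difference between the two functions concrete, here is a minimal sketch that runs both on the same URL. The `;a=mars&b=42` part comes from the post; the `?q=hello` query string is an assumption added only so the query field has something to show.

```python
from urllib.parse import urlparse, urlsplit

# The ";a=mars&b=42" params come from the post; "?q=hello" is an
# assumed query string added for illustration.
url = "https://httpbin.org/get;a=mars&b=42?q=hello"

parsed = urlparse(url)  # ParseResult: six fields, including params
split = urlsplit(url)   # SplitResult: five fields, no params

print(parsed)
print(split)

# urlparse peels the params off the path; urlsplit leaves them in the path.
print(parsed.path, parsed.params)
print(split.path)
```

Both results are named tuples, so fields such as `scheme`, `netloc`, `path`, and `query` can be read by attribute from either one. If the code never reads `params`, swapping `urlparse` for `urlsplit` is usually a drop-in change, with the caveat that any semicolon-delimited params stay inside `path`.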

[video by Anthony Sottile]: https://www.youtube.com/watch?v=ABJvdsIANds
[urlparse]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse
[urlsplit]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit
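
The speed comparison mentioned above can be approximated with `timeit`. This is a rough sketch, not the original benchmark: the generated URL list, the loop counts, and the `bench` helper are all assumptions for illustration. It uses many distinct URLs because `urllib.parse` may cache recently parsed URLs, which could otherwise skew a loop over a single string.

```python
import timeit
from urllib.parse import urlparse, urlsplit

# Hypothetical sample data: 1000 distinct URLs shaped like the one in the post.
urls = [f"https://httpbin.org/get;a=mars&b={i}?q=hello" for i in range(1000)]


def bench(func, number=20, repeat=5):
    """Return the best total time in seconds for parsing every URL `number` times."""

    def run():
        for u in urls:
            func(u)

    return min(timeit.repeat(run, number=number, repeat=repeat))


parse_time = bench(urlparse)
split_time = bench(urlsplit)

print(f"urlparse: {parse_time:.4f}s")
print(f"urlsplit: {split_time:.4f}s")
print(f"ratio:    {parse_time / split_time:.2f}x")
```

The exact ratio depends on the Python version, the machine, and the shape of the URLs, so treat the printed numbers as indicative only.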
