{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/python/module-getattr/",
"description": "Speed up Python module imports with __getattr__ from PEP 562 for lazy loading, deprecation warnings, and dynamic attribute access.",
"path": "/python/module-getattr/",
"publishedAt": "2024-11-03T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Python",
"TIL",
"Performance"
],
"textContent": "This morning, someone on Twitter pointed me to [PEP 562], which introduces __getattr__ and\n__dir__ at the module level. While __dir__ helps control which attributes are printed\nwhen calling dir(module), __getattr__ is the more interesting addition.\n\nThe __getattr__ method in a module works similarly to how it does in a Python class. For\nexample:\n\nIn this class, __getattr__ defines what happens when specific attributes are accessed,\nallowing you to manage how missing attributes behave. Since Python 3.7, you can also define\n__getattr__ at the module level to handle attribute access on the module itself.\n\nFor instance, if you have a module my_module.py:\n\nUsing this module:\n\nIf an attribute isn't found through the regular lookup (using object.__getattribute__),\nPython will look for __getattr__ in the module's __dict__. If found, it calls\n__getattr__ with the attribute name and returns the result. But if you're looking up a\nname directly as a module global, it bypasses __getattr__. This prevents performance\nissues that would arise from repeatedly invoking __getattr__ for built-in or common\nattributes.\n\nOne practical use for module-level __getattr__ is lazy-loading heavy dependencies to\nimprove startup performance. Imagine you have a module that relies on a large library but\ndon't need it immediately at import.\n\nWith this setup, importing heavy_module doesn't immediately import NumPy. Only when you\naccess heavy_module.np does it trigger the import:\n\nThe first access to heavy_module.np imports NumPy (adding ~150ns), but since we cache np\nwith globals()['np'] = np, subsequent accesses are fast, as the module now holds the\nreference to NumPy.\n\nThis approach is handy in scenarios like CLIs where you want to keep startup quick. For\nexample, if you need to initialize a database connection but only for specific commands, you\ncan defer the setup until needed.\n\nHere's an example with SQLite (though SQLite connections are quick, imagine a slower\nconnection here):\n\nIn this setup, nothing is instantiated when you import db_module. The connection is only\ninitialized on the first access of db_module.connection. Later calls use the cached\n_connection, making subsequent access fast.\n\nHere's how you might use it in a CLI:\n\nWhen you run python cli.py greet, the CLI starts quickly since it doesn't initialize the\ndatabase connection. But running python cli.py show_data accesses db_module.connection,\nwhich triggers the connection setup.\n\nThis could also be achieved by defining a function that initializes the database connection\nand caches it for subsequent calls. However, using module-level __getattr__ can be more\nconvenient if you have multiple global variables that require expensive calculations or\ninitializations. Instead of writing separate functions for each variable, you can handle\nthem all within the __getattr__ method.\n\nHere's one example of [using it for a non-trivial case in the wild].\n\n\n\n\n[pep 562]:\n https://peps.python.org/pep-0562/\n\n[using it for a non-trivial case in the wild]:\n https://github.com/PrefectHQ/prefect/blob/f196fb3da6ae747f7362be2f21e85b01f32e539c/src/prefect/__init__.py#L102",
"title": "Quicker startup with module-level __getattr__"
}