{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/python/memory-leakage-in-descriptors/",
"description": "Prevent memory leaks in Python descriptors by using weakref to avoid hard references that prevent garbage collection of validated objects.",
"path": "/python/memory-leakage-in-descriptors/",
"publishedAt": "2023-07-16T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Python",
"TIL",
"Performance"
],
"textContent": "Unless I'm hand-rolling my own ORM-like feature or validation logic, I rarely need to write\ncustom descriptors in Python. The built-in descriptor magics like @classmethod,\n@property, @staticmethod, and vanilla instance methods usually get the job done.\nHowever, every time I need to sink my teeth into descriptors, I reach for this fantastic\n[Descriptor how-to guide] by Raymond Hettinger. You should definitely set aside the time to\nread it if you haven't already. It has helped me immensely to deepen my understanding of how\nmany of the fundamental language constructs are wired together underneath.\n\nDescriptors are considered fairly advanced Python features and can easily turn into footguns\nif used carelessly. Recently, while working on an app with a descriptor-based data\nvalidator, I discovered a subtle but, in hindsight, obvious bug that was hemorrhaging memory all\nacross the app. The app was using a descriptor to validate attribute values while simultaneously\ntracking instances where validation occurred. This validator was being used all over the\ncodebase, so it slowly started blowing up memory usage in the background. The problem was\nthat it kept hard references to everything it validated, so none of those objects\ncould get garbage collected. The really sneaky part was how slowly and quietly the\nproblem developed: the leakage built up bit by bit over time, even when people used the\nvalidator in totally innocuous ways.\n\nHere's a simpler example of a validation descriptor that tracks the instances it's applied\nto:\n\nThe Within descriptor validates that the values assigned to instance attributes are within\na specified min and max range. It does this by implementing the __set__ and __get__\ndunder methods. When the descriptor is accessed via instance.attrname, the __get__\nmethod is called, which returns the value from the instance's dict. 
When a value is assigned\nvia instance.attrname = value, the __set__ method is called, which validates that the value is\nwithin the min/max bounds before setting it on the instance. A memory leak occurs because\nthe _seen dict keeps a reference to every instance the descriptor has been set on.\nThis prevents the instances from being garbage collected even if there are no other\nreferences to them. You can use the descriptor and observe the memory leakage like this:\n\nHere, we're defining an Exam class that uses the Within descriptor to apply constraints\non the values of the math, physics, and chemistry class variables. Then we initialize\nthe class instance and mutate the math attribute to demonstrate that the validator is\nworking as expected. The instance of the Exam class is saved to the _seen dictionary of\nthe descriptor when the __set__ method is called. Next, we delete the Exam instance and\nforce garbage collection. However, when you run the snippet, you'll see that it prints the\nfollowing:\n\nThis indicates that although we've deleted the Exam instance, it can't be fully garbage\ncollected since the Within descriptor's _seen dictionary holds a strong reference to it.\n\nDispel the malady\n\nOnce I spotted the bug, the solution was fairly simple. Don't keep strong references to the\nclass instances if you don't need to. Also, use a more robust tool like [Pydantic] to\nperform validation, but I digress! Using a weakref.WeakKeyDictionary instead of a\nregular dict for _seen would prevent the memory leakage by avoiding strong references to\nthe deleted instances. Since WeakKeyDictionary holds weak references to the keys, if all\nother strong references to an instance are deleted, the garbage collector can reclaim it.\nThe weak reference in WeakKeyDictionary won't keep the instance alive. Here's how you'd\nmodify Within to fix the issue:\n\nThe modified descriptor is a drop-in replacement for the previous one, minus the memory\nleakage issue. 
So in the last snippet, when exam is deleted and the gc is called, the\nWeakKeyDictionary allows the instance to be garbage collected correctly instead of remaining in\nmemory due to the strong reference in _seen. If you run the demonstration snippet again, this\ntime you'll see that once we force the gc to collect the garbage, the _seen container gets\nemptied out.\n\nThis will print an empty tuple:\n\nThis also means that now Within will only keep track of instances that are alive in\nmemory.\n\n\n\n\n[descriptor how-to guide]:\n https://docs.python.org/3/howto/descriptor.html\n\n[pydantic]:\n https://docs.pydantic.dev/latest/",
"title": "Memory leakage in Python descriptors"
}