Don't wrap instance methods with 'functools.lru_cache' decorator in Python

Redowan Delowar January 15, 2022
Source

Recently, fell into this trap as I wanted to speed up a slow instance method by caching it.

When you decorate an instance method with functools.lru_cache decorator, the instances of the class encapsulating that method never get garbage collected within the lifetime of the process holding them.

Let's consider this example:

Here, I've created a simple SlowAdder class that accepts a delay value; then it sleeps for delay seconds and calculates the sum of the inputs in the calculate method. To avoid this slow recalculation for the same arguments, the calculate method was wrapped in the lru_cache decorator. The del method notifies us when the garbage collection has successfully cleaned up instances of the class.

If you run this program, it'll print this:

You can see that the lru_cache decorator is doing its job. The second call to the calculate method with the same argument took noticeably less time compared to the first one. In the second case, the lru_cache decorator is just doing a simple dictionary lookup. This is all good but the instances of the ShowAdder class never get garbage collected in the lifetime of the program. Let's prove that in the next section.

Garbage collector can't clear up the affected instances

If you execute the above snippet with an -i flag, we can interactively prove that no garbage collection takes place. Let's do it:

Here on the REPL, you can see that I've reassigned slow_adder to None and then explicitly triggered the garbage collector. However, we don't see the message in the del method printed here and the output of gc.collect() is 0. This implies that something is holding a reference to the slow_adder instance and the garbage collector can't clear up the object. Let's inspect who has that reference:

The cache_info() is showing that the cache container keeps a reference to the instance until it gets cleared. When I manually cleared the cache and reassigned the variable slow_adder to None, only then did the garbage collector remove the instance. By default, the size of the lru_cache is 128 but if I had applied lru_cache(maxsize=None), that would've kept the cache forever and the garbage collector would wait for the reference count to drop to zero but that'd never happen within the lifetime of the process.

This can be dangerous if you create millions of instances and they don't get garbage collected naturally. It can overflow your working memory and cause the process to crash! I accidentally did it where the infected class was being instantiated millions of times via HTTP API requests.

The solution

To solve this, we'll have to make the cache containers local to the instances so that the reference from cache to the instance gets scraped off with the instance. Here's how you can do that:

The only difference here is - instead of decorating the method directly, I called the decorator function on the _calculate method just as a regular function and saved the result as an instance variable named calculate. The instances of this class get garbage collected as usual.

Notice that this time, clearing out the cache wasn't necessary. I had to call gc.collect() to invoke explicit garbage collection. That's because this shenanigan creates cyclical references and the GC needs to do some special magic to clear the memory. In real code, Python interpreter will clean this up for you in the background without you having to call the GC.

The self dilemma

Even after applying the solution above, a weird thing happens in the case of instance methods. Let's run the src_2.py script interactively to demonstrate that:

Here, I've created another instance of the SlowAdder class and called calculate with the same arguments. But whenever I called the calculate method on the slow_adder_2 instance with the same parameters as before, the first time, it recalculated it instead of returning the result from the cache. How come!

Underneath, the lru_cache decorator uses a dictionary to cache the calculated values. A hash function is applied to all the parameters of the target function to build the key of the dictionary, and the value is the return value of the function when those parameters are the inputs. This means, the first argument self also gets included while building the cache key. However, for different instances, this self object is going to be different and that makes the hashed key of the cache different for every instance even if the other parameters are the same.

But what about class methods & static methods

Class methods and static methods don't suffer from the above issues as they don't have any ties to their respective instances. In their case, the cache container is local to the class, not the instances. Here, you can stack the lru_cache decorator as usual. Let's demonstrate that for classmethod first:

You can inspect the garbage collection behavior here:

Static methods behave exactly the same. You can use the lru_cache decorator in similar fashion as below:

Further reading

Discussion in the ATmosphere

Loading comments...