Using tqdm with concurrent.futures in Python

Redowan Delowar January 6, 2023

At my workplace, I was writing a script to download multiple files from different S3 buckets. The script relied on Django ORM, so I couldn't use Python's async paradigm to speed up the process. Instead, I opted for boto3 to download the files and concurrent.futures.ThreadPoolExecutor to spin up multiple threads and make the requests concurrently.

However, since the script was expected to be long-running, I needed to display progress bars to show the state of execution. It's quite easy to do with tqdm when you're just looping over a list of file paths and downloading the contents synchronously:
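A minimal sketch of that synchronous pattern, assuming a hypothetical `download_file` helper that stands in for the real boto3 call (a short `time.sleep` simulates the download, and the file paths are placeholders):

```python
import time

from tqdm import tqdm


def download_file(path: str) -> str:
    """Hypothetical stand-in for a blocking download (e.g. a boto3 call)."""
    time.sleep(0.1)  # Simulate network latency.
    return f"contents of {path}"


def download_all(paths: list[str]) -> list[str]:
    # Wrapping the iterable in tqdm is enough here:
    # the bar advances once per loop iteration.
    return [download_file(path) for path in tqdm(paths)]


if __name__ == "__main__":
    file_paths = [f"bucket/file_{i}.txt" for i in range(5)]  # Placeholder paths.
    download_all(file_paths)
```

Because each download blocks until it finishes, one loop iteration maps to exactly one completed file, which is why wrapping the list is all tqdm needs.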

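But wrapping the loop in tqdm isn't enough when multiple threads or processes are doing the work: submitting a task returns immediately, so the bar would fill up before any download has finished. Here's what I've found that works quite well: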
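The following is a sketch of the approach rather than the exact script. It assumes a placeholder `make_request` that sleeps instead of sending a real HTTP request (so it runs without network access), an added `delay` parameter, placeholder URLs, and `concurrent.futures.as_completed` for iterating over the futures:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

from tqdm import tqdm


def make_request(url: str, delay: float = 1.0) -> dict:
    """Placeholder for a single HTTP request (no network I/O in this sketch)."""
    time.sleep(delay)  # Simulate a slow, long-running request.
    return {"url": url, "status": "ok"}


def make_requests(urls: list[str], delay: float = 1.0) -> list[dict]:
    results = []
    # `total` tells tqdm how many updates make up a full bar.
    with tqdm(total=len(urls)) as pbar:
        with ThreadPoolExecutor(max_workers=5) as executor:
            futures = [executor.submit(make_request, url, delay) for url in urls]
            # as_completed yields each future as soon as it finishes, so the
            # bar advances in completion order, not submission order.
            for future in as_completed(futures):
                results.append(future.result())
                pbar.update(1)
    return results


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(5)]  # Placeholder URLs.
    make_requests(urls)
```

Calling `pbar.update(1)` only after `future.result()` returns means the bar counts finished tasks, not submitted ones.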

Running this prints a single tqdm progress bar that advances by one each time a request completes.

This script makes 5 concurrent requests by leveraging ThreadPoolExecutor from the concurrent.futures module. The make_request function just sends one request to a URL and sleeps for a second to simulate a long-running task. Then the make_requests function spins up 5 threads and calls the make_request function in each one with a different URL.

Here, we're instantiating tqdm as a context manager and passing the number of URLs as total, which tells tqdm how many updates make up a complete bar. Then, in a nested context manager, we spin up the threads and hand make_request to executor.submit once per URL. We collect the future objects returned by those executor.submit calls in a list and call pbar.update(1) for each future as we iterate through them. And that's it, mission successful.

I usually use contextlib.ExitStack to avoid nested context managers like this:
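A sketch of the same idea with `ExitStack`, under the same assumptions as before (a placeholder `make_request` that sleeps instead of hitting the network, an added `delay` parameter, and `as_completed` for iteration):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from contextlib import ExitStack

from tqdm import tqdm


def make_request(url: str, delay: float = 1.0) -> dict:
    """Placeholder for a single HTTP request (no network I/O in this sketch)."""
    time.sleep(delay)  # Simulate a slow, long-running request.
    return {"url": url, "status": "ok"}


def make_requests(urls: list[str], delay: float = 1.0) -> list[dict]:
    results = []
    with ExitStack() as stack:
        # enter_context enters each manager and registers its exit callback;
        # when the block ends, they are closed in reverse order
        # (executor first, then the progress bar).
        pbar = stack.enter_context(tqdm(total=len(urls)))
        executor = stack.enter_context(ThreadPoolExecutor(max_workers=5))

        futures = [executor.submit(make_request, url, delay) for url in urls]
        for future in as_completed(futures):
            results.append(future.result())
            pbar.update(1)
    return results


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(5)]  # Placeholder URLs.
    make_requests(urls)
```

The body stays at a single indentation level no matter how many context managers are added to the stack.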

Running this script will yield the same result as before.

