{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/python/faster-bulk-update-in-django/",
"description": "Accelerate Django bulk_update operations by 4x using multiprocessing to parallelize database writes across chunked record batches.",
"path": "/python/faster-bulk-update-in-django/",
"publishedAt": "2022-11-30T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Python",
"Django",
"Database"
],
"textContent": "Django has a Model.objects.bulk_update method that allows you to update multiple objects\nin a single pass. While this method is a great way to speed up the update process,\noftentimes it's not fast enough. Recently, at my workplace, I found myself writing a script\nto update half a million user records and it was taking quite a bit of time to mutate them\neven after leveraging bulk update. So I wanted to see if I could use multiprocessing with\n.bulk_update to quicken the process even more. Turns out, yep, I can!\n\nHere's a script that creates 100k users in a PostgreSQL database and updates their usernames\nvia vanilla .bulk_update. Notice how we're timing the update duration:\n\nThis can be executed as a script like this:\n\nIt'll return:\n\nA little over 9 seconds isn't too bad for 100k users, but we can do better. Here's how I've\nupdated the above script to make it 4x faster:\n\nThis script divides the updated user list into a list of multiple user chunks and assigns\nthat to the user_chunks variable. The update_users function takes a single user chunk\nand runs .bulk_update on that. Then we fork a bunch of processes and run the\nupdate_users function over the user_chunks via multiprocessing.Pool.map. Each process\nconsumes 10 chunks of users in a single go, as determined by the chunksize parameter of\nthe pool.map function. Running the updated script will give you similar output as before\nbut with a much smaller runtime:\n\nThis will print the following:\n\nWhoa! This updated the records in under 2.5 seconds. Quite a bit of performance gain there.\n\n> This won't work if you're using SQLite as your database backend since SQLite doesn't\n> support concurrent writes from multiple processes. Trying to run the second script with\n> a SQLite backend will raise a database error.\n\nFurther reading\n\n- [Django bulk_update]\n- [Using a pool of forked workers]\n\n\n\n\n[django bulk_update]:\n https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-update\n\n[using a pool of forked workers]:\n https://docs.python.org/3/library/multiprocessing.html#using-a-pool-of-workers",
"title": "Faster bulk_update in Django"
}