{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/misc/hierarchical-rate-limiting/",
"description": "Build multi-level rate limiting with Redis sorted sets and Lua. Enforce global and category-specific limits with ZREMRANGEBYSCORE and ZCARD commands.",
"path": "/misc/hierarchical-rate-limiting/",
"publishedAt": "2025-01-12T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Database",
"Python",
"System",
"Redis"
],
"textContent": "Recently at work, we ran into this problem:\n\nWe needed to send Slack notifications for specific events but had to enforce rate limits to\navoid overwhelming the channel. Here's how the limits worked:\n\n- Global limit: Max 100 requests every 30 minutes.\n- Category limit: Each event type (e.g., errors, warnings) capped at 10 requests per 30\n minutes.\n\nNow, imagine this:\n\n1. There are 20 event types.\n2. Each type hits its 10-notification limit in 30 minutes.\n3. That's 200 requests total, but the global limit only allows 100. So, 100 requests must be\n dropped - even if some event types still have room under their individual caps.\n\nThis created a hierarchy of limits:\n\n1. Category limits keep any event type from exceeding 10 requests.\n2. The global limit ensures the combined total stays under 100.\n\nEvery 30 minutes, the system resets. Here are two issues that could arise:\n\n- If some event types are busier, the global limit could block quieter ones.\n- Even with room under the global limit, some event types might still hit their category\n caps.\n\nIn our case, the event types are limited, and the category limits are both uniform and\nsignificantly smaller than the global limit, so this isn't a concern.\n\nRedis sorted sets\n\nThe notification sender service runs on multiple instances, each processing events and\nsending notifications independently. Without a shared system to enforce rate limits, these\ninstances would maintain separate counters for global and category-specific limits. This\nwould create inconsistencies because no instance would have a complete view of the overall\nactivity, leading to conflicts and potential exceedance of limits.\n\nRedis provides a centralized state that all instances can access, ensuring they share the\nsame counters for rate limits. This removes inconsistencies and makes rate limiting\nreliable, even when the notification sender scales to multiple instances.\n\nSorted sets in Redis track notifications within a rolling time window by using timestamps as\nscores, which keeps entries ordered by time. The implementation:\n\n- Maintains a global sorted set to enforce the overall limit (e.g., 100 notifications per 30\n minutes).\n- Uses category-specific sorted sets to enforce category limits for each event type (e.g.,\n 10 notifications per 30 minutes for errors, warnings, etc.).\n\nThe limits are enforced with two Redis commands:\n\n- ZREMRANGEBYSCORE removes entries with timestamps outside the rolling time window,\n keeping only recent notifications.\n- ZCARD counts the remaining entries in a set to check whether the global or\n category-specific limits have been reached.\n\nLua script\n\nInstead of embedding the rate-limiting logic directly into the notification sender, we chose\nto implement it as a Lua script in Redis. While we could write the logic in the code and run\nit in a Redis pipeline, we opted not to, for the following reasons:\n\n- A dedicated script keeps the rate-limiting logic separate and independently auditable.\n- It saves a few TCP calls, as the entire logic runs within Redis itself.\n- And most importantly, I wanted to write some Lua.\n\nThe script is as follows:\n\nThe script performs the following operations in order:\n\n1. Remove expired entries:\n - It uses ZREMRANGEBYSCORE to remove notifications older than the time window\n (current_time - window). This ensures that only active notifications are considered\n for the limits.\n\n - This eliminates the need for additional bookkeeping to remove expired keys.\n ZREMRANGEBYSCORE is fast enough to handle the removal of a small number of keys\n during each invocation.\n\n2. Check the global limit:\n - ZCARD counts the number of active notifications in the global sorted set.\n - If this count equals or exceeds the global limit (e.g., 100), the request is rejected\n (return 0).\n\n3. Check the category-specific limit:\n - ZCARD is used again to count the active notifications for the specific category.\n - If this count equals or exceeds the category limit (e.g., 10), the request is rejected\n (return 0).\n\n4. Add the notification:\n - If both limits are within bounds, the script uses ZADD to insert the current\n notification into both the global and category-specific sorted sets, using a timestamp\n as the score for accurate tracking.\n\nUsing the script\n\nYou can load the Lua script from disk, register it with Redis, and call it before invoking\nthe notification service. If the script returns 0, drop the notification request. If it\nreturns 1, send the notification. Here's how to do it in Python:\n\nRegistering the Lua script loads it from disk once and reuses it, which is faster than\nrepeatedly loading and evaluating it for each invocation.\n\nTo test this, you'll need a running Redis instance. You can run one with Docker:\n\nNow, running the script will print:\n\nSince this sends a notification only once, the rate limiting isn't apparent yet, but it's\nworking under the hood and will kick in if any limit is exceeded. To see it in action, you\ncan attempt to send multiple notifications in a tight loop.\n\nTesting the rate limiter\n\nYou can call the send_notification function multiple times to test the rate limiter. Below\nis an example that simulates several notification requests in a short loop, giving you a\nsense of how many will be allowed versus blocked:\n\nThis code demonstrates how to test the rate limiter by simulating multiple notification\nrequests. The Lua script is loaded, registered with Redis, and executed in a loop to\nevaluate whether each request is allowed or blocked based on the defined rate limits.\n\nRunning this will produce output similar to:\n\nHere, for demonstration, we set the global rate limit to 10 and the category limit to 3 with\na 60-second rolling window. After three successful category notifications (and a total of\nthree global notifications), the rate limiter rejects additional requests in the same\nwindow, illustrating how both the global and category limits work together.",
"title": "Hierarchical rate limiting with Redis sorted sets"
}