
{
  "$type": "site.standard.document",
  "content": {
    "$type": "site.standard.content.markdown",
    "text": "As a data engineer, one of the most common tasks I perform is getting data from an API. For a long time, I've been using the `requests` library to make these requests.\n\nHowever, I recently discovered the `httpx` library, which has built-in support for asynchronous requests. At the same time, I've worked on a couple of projects that required a smarter approach than making requests one after another, so I built [abatcher](https://github.com/davidgasquez/abatcher) to abstract away some of the complexity.\n\nLet's go through several ways of making 100 requests, each with a different approach.\n\n## Sequential Requests\n\nMaking 100 sequential requests with the `httpx` library looks like this:\n\n```python\nimport httpx\n\ndata = []\n\n# Reuse a single client so connections are pooled across requests\nwith httpx.Client() as client:\n    for i in range(100):\n        response = client.get(\"https://httpbin.org/anything\", params={\"index\": i})\n        data.append(response.json())\n```\n\n## Async Requests\n\nThe same thing can be done asynchronously with `httpx`'s `AsyncClient`. The async examples wrap their code in an `async def main()` run by `asyncio.run()`, so they work as plain scripts; in a Jupyter notebook, where an event loop is already running, call `await main()` instead of `asyncio.run(main())`.\n\n```python\nimport asyncio\n\nimport httpx\n\n\nasync def main():\n    async with httpx.AsyncClient() as client:\n        # Start all 100 requests at once and wait until every response is in\n        tasks = [\n            client.get(\"https://httpbin.org/anything\", params={\"index\": i})\n            for i in range(100)\n        ]\n        responses = await asyncio.gather(*tasks)\n\n    return [response.json() for response in responses]\n\n\ndata = asyncio.run(main())\n```\n\nFiring every request at once might not be very friendly to the API provider, though. Many APIs limit the number of requests you can make per minute.\n\n## Async Requests with Batching\n\nThe easiest way to be gentler is to pair `httpx.AsyncClient` with an `asyncio.Semaphore`, which caps how many requests are in flight at any given time (10 in this example).\n\n```python\nimport asyncio\nfrom typing import Any\n\nimport httpx\n\nBASE_URL = \"https://httpbin.org/anything\"\nMAX_BATCH_SIZE = 10\nTOTAL_REQUESTS = 100\n\n\nasync def fetch(client: httpx.AsyncClient, semaphore: asyncio.Semaphore, index: int) -> Any:\n    # Wait for a free slot: at most MAX_BATCH_SIZE requests run at once\n    async with semaphore:\n        response = await client.get(BASE_URL, params={\"index\": index})\n        return response.json()\n\n\nasync def main() -> list[Any]:\n    # Create the semaphore inside the running event loop and share it across tasks\n    semaphore = asyncio.Semaphore(MAX_BATCH_SIZE)\n    limits = httpx.Limits(max_connections=100)\n    # http2=True needs the optional extra: pip install \"httpx[http2]\"\n    async with httpx.AsyncClient(http2=True, limits=limits) as client:\n        print(f\"Starting batch of {TOTAL_REQUESTS} requests\")\n        tasks = [fetch(client, semaphore, i) for i in range(TOTAL_REQUESTS)]\n        results = await asyncio.gather(*tasks)\n        print(\"All requests completed\")\n    return results\n\n\nresults = asyncio.run(main())\n```\n\nThis works pretty well and might cover most of your use cases. However, a semaphore limits concurrency, not throughput: if the API responds in 100 ms, 10 concurrent slots still allow roughly 100 requests per second, which is far more than some providers accept per minute.\n\n## Async Requests with Batching and Rate Limiting\n\nTo handle this, we can use the [`aiometer`](https://github.com/florimondmanca/aiometer) library, which limits both the number of concurrently running tasks and how many are started per second.\n\nHere are the same 100 requests, now rate limited (adapted from the [aiometer example in their README](https://github.com/florimondmanca/aiometer?tab=readme-ov-file#example)):\n\n```python\nimport asyncio\nimport functools\n\nimport aiometer\nimport httpx\n\n\nasync def fetch(client, request):\n    response = await client.send(request)\n    # httpbin echoes the JSON request body back under the \"json\" key\n    return response.json()[\"json\"]\n\n\nasync def main():\n    requests = [\n        httpx.Request(\"POST\", \"https://httpbin.org/anything\", json={\"index\": index})\n        for index in range(100)\n    ]\n    data = []\n\n    async with httpx.AsyncClient() as client:\n        # Send requests, and process responses as they're made available\n        # (in completion order, not necessarily request order):\n        async with aiometer.amap(\n            functools.partial(fetch, client),\n            requests,\n            max_at_once=10,  # Limit maximum number of concurrently running tasks.\n            max_per_second=5,  # Limit request rate to not overload the server.\n        ) as results:\n            async for r in results:\n                data.append(r)\n\n    return data\n\n\ndata = asyncio.run(main())\n```\n\nYou can tweak the `max_at_once` and `max_per_second` options to match the API's limits. For example, a quota of 60 requests per minute translates to `max_per_second=1`.\n\n## Conclusion\n\nThe `httpx` library combined with the `aiometer` library is a great addition to your toolbelt if you make a lot of API requests.\n\nI've also built (with help from Cursor) a small and probably buggy Python package, [abatcher](https://github.com/davidgasquez/abatcher), that wraps this functionality behind a simple interface.\n\n<div style=\"text-align: center; margin: 2em 0;\">\n  <a href=\"https://github.com/davidgasquez/abatcher\">Check out abatcher on GitHub!</a>\n</div>\n\nHere's how you can use it:\n\n```python\nfrom abatcher import AsyncHttpBatcher\n\n# Create a batcher with a base URL and optional configuration\napi = AsyncHttpBatcher(\n    base_url=\"https://httpbin.org\",\n    max_concurrent=10,\n    max_per_second=5,\n    max_connections=50,\n    timeout=30,\n    retry_attempts=5,\n)\n\n# Simple GET request\nresult = api.get(\"/get\")\n\nprint(f\"Single request result: {result}\")\n\n# Batch of mixed requests\nrequests = [\n    # Simple URL\n    \"/anything\",\n    # URL with params\n    (\"/anything\", {\"query\": \"test\"}),\n    # Full configuration\n    {\n        \"url\": \"/post\",\n        \"method\": \"POST\",\n        \"params\": {\"name\": \"Test\"},\n        \"headers\": {\"X-Custom\": \"value\"},\n    },\n]\n\nresults = api.process_batch(requests)\n\nprint(f\"Batch requests results: {results}\")\n```\n\nLet me know if you have any feedback!",
    "version": "1.0"
  },
  "description": "As a data engineer, one of the most common tasks I perform is getting data from an API. For a long time, I've been using the requests library to make these requests. However, I recently discovered the httpx library, which has built-in support for asynchronous requests. At th...",
  "path": "/async-batch-requests-python",
  "publishedAt": "2024-11-12T00:00:00.000Z",
  "site": "at://did:plc:4z5i7njrld66ew36htufcwry/site.standard.publication/3mo43d2tmt2ov",
  "textContent": "As a data engineer, one of the most common tasks I perform is getting data from an API. For a long time, I've been using the requests library to make these requests.\n\nHowever, I recently discovered the httpx library, which has built-in support for asynchronous requests. At the same time, I've worked on a couple of projects that required a smarter approach than making requests one after another, so I built abatcher to abstract away some of the complexity.\n\nLet's go through several ways of making 100 requests, each with a different approach.\n\nSequential Requests\n\nMaking 100 sequential requests with the httpx library looks like this:\n\nAsync Requests\n\nThe same thing can be done asynchronously with httpx's AsyncClient. The async examples wrap their code in an async def main() run by asyncio.run(), so they work as plain scripts; in a Jupyter notebook, where an event loop is already running, call await main() instead of asyncio.run(main()).\n\nFiring every request at once might not be very friendly to the API provider, though. Many APIs limit the number of requests you can make per minute.\n\nAsync Requests with Batching\n\nThe easiest way to be gentler is to pair httpx.AsyncClient with an asyncio.Semaphore, which caps how many requests are in flight at any given time (10 in this example).\n\nThis works pretty well and might cover most of your use cases. However, a semaphore limits concurrency, not throughput: if the API responds in 100 ms, 10 concurrent slots still allow roughly 100 requests per second, which is far more than some providers accept per minute.\n\nAsync Requests with Batching and Rate Limiting\n\nTo handle this, we can use the aiometer library, which limits both the number of concurrently running tasks and how many are started per second.\n\nHere are the same 100 requests, now rate limited (adapted from the aiometer example in their README):\n\nYou can tweak the max_at_once and max_per_second options to match the API's limits. For example, a quota of 60 requests per minute translates to max_per_second=1.\n\nConclusion\n\nThe httpx library combined with the aiometer library is a great addition to your toolbelt if you make a lot of API requests.\n\nI've also built (with help from Cursor) a small and probably buggy Python package, abatcher, that wraps this functionality behind a simple interface.\n\nCheck out abatcher on GitHub!\n\nHere's how you can use it:\n\nLet me know if you have any feedback!",
  "title": "Async Batch Requests in Python"
}