{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/python/disallow-large-file-download/",
"description": "Prevent excessive file downloads in Python by streaming with HTTPX and limiting file size with chunk-based validation and memory-safe processing.",
"path": "/python/disallow-large-file-download/",
"publishedAt": "2022-03-23T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Python",
"API",
"Security"
],
"textContent": "I was working on a DRF POST API endpoint where the consumer is expected to add a URL\ncontaining a PDF file and the system would then download the file and save it to an S3\nbucket. While this sounds quite straightforward, there's one big issue. Before I started\nworking on it, the core logic looked like this:\n\nIn the above snippet, there's no guardrail against how large the target file can be. You\ncould bring the entire server down to its knees by posting a link to a ginormous file. The\nserver would be busy downloading the file and keep consuming resources.\n\nI didn't want to use urllib at all for this purpose and went for [HTTPx]. It exposes a\nneat API to perform streaming file download. Also, I didn't want to peek into the\nContent-Length header to assess the file size since the file server can choose not to\ninclude that header key. I was looking for something more dependable than that. Here's how I\nsolved it:\n\nThe chunk_size parameter explicitly dictates the buffer size of the file being downloaded.\nThis means the entire file won't be loaded into memory while being downloaded. The\nmax_size parameter defines the maximum file size that'll be allowed. In this example,\nwe're keeping track of the size of the already downloaded bytes in the\ndownloaded_content_length variable and raising an error if the size exceeds 10MB. Sweet!\n\nFurther reading\n\n- [Streaming download with HTTPx]\n\n\n\n\n[httpx]:\n https://www.python-httpx.org/\n\n[streaming download with httpx]:\n https://www.python-httpx.org/advanced/#monitoring-download-progress",
"title": "Disallow large file download from URLs in Python"
}