Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibj3euhkw7xczkitrmva34g67uvo7ia66nf6e3r3rtejjkw3vkjoe",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mi6nyye5umz2"
  },
  "path": "/t/found-the-fix-for-memory-not-being-freed-when-switching-models-on-linux-its-not-python-or-pytorch/174750#post_1",
  "publishedAt": "2026-03-29T04:18:00.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "GitHub - brjen/pytorch-memory-fix: Two environment variables that fix PyTorch/glibc memory creep on Linux. Zero code changes. Zero performance cost. · GitHub"
  ],
  "textContent": "Hey all, sharing something that might save a lot of people a lot of headaches.\n\nIf you’ve ever switched between models in a long-running Python process and watched your RAM climb until it eventually OOMs, even after doing `del model`, `gc.collect()`, and `torch.cuda.empty_cache()`, the problem isn’t in your code, and it’s not a PyTorch bug. It’s the C memory allocator underneath.\n\nOn Linux, glibc’s `malloc` serves most allocations from heap arenas, and its `mmap` threshold is dynamic: it starts at 128KB and rises (up to 32MB on 64-bit systems) as large blocks are freed, so even sizeable buffers end up on the heap. When PyTorch loads model weights into host memory, glibc grows the heap via `brk()`/`sbrk()`. When you free those weights, Python does its part, but glibc holds onto the memory because the heap can only shrink from the top, and small residual allocations pin everything beneath them. From Python’s perspective the memory is freed. From the OS perspective, your process is still holding all of it.\n\nEvery model switch leaves behind a bit more pinned heap memory. Eventually you’re out of memory and the process dies.\n\n## The fix\n\n    export MALLOC_MMAP_THRESHOLD_=65536\n    export MALLOC_TRIM_THRESHOLD_=65536\n\nSet these before launching Python. This tells glibc to use `mmap()` for allocations over 64KB instead of the heap, and to trim the top of the heap once more than 64KB of it is free. Setting either variable also disables the dynamic threshold adjustment. `mmap` pages go straight back to the OS when freed: no fragmentation, no pinning.\n\n## How I found it\n\nI run a render engine (diffusers/FastAPI) that switches between 13 checkpoints (SDXL, Flux, PixArt, SD 1.5, Playground v2.5). It would OOM after about 17 hours and 107 model switches. I tried every Python-level fix I could find, and nothing worked. 
I took a step back and started looking at what was happening below Python, and glibc’s arena allocator turned out to be the culprit.\n\n## Results\n\n  * **Before:** RSS grew ~3GB per model switch, OOM after 17 hours\n  * **After:** RSS flat at 955MB across 107 consecutive model switches\n  * Tested on AMD RX 7800 XT (ROCm) and NVIDIA GTX 1080 Ti (CUDA)\n\nThis applies to transformers, diffusers, or any PyTorch workload on Linux where you load and unload models in a long-running process. No code changes needed, just the two environment variables.\n\nI’ve posted this on a few related threads but wanted to share it here as well, since a lot of people seem to be hitting this.\n\nFull write-up with methodology and proof data: brjen/pytorch-memory-fix on GitHub\n\nHope it helps!",
  "title": "Found the fix for memory not being freed when switching models on Linux (it's not Python or PyTorch)"
}