{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiagledt7s7op3hzcuwersx3bg46fim4uqekmvooc2fhhm76vcdmdu",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mi7c5ljhkuo2"
  },
  "path": "/t/found-the-fix-for-memory-not-being-freed-when-switching-models-on-linux-its-not-python-or-pytorch/174750#post_3",
  "publishedAt": "2026-03-29T13:36:03.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "Great question. The key thing is that PyTorch doesn’t make one big malloc per model. It makes hundreds of small ones, one per tensor. Most are a few hundred KB to a few MB, nowhere near 16 MiB.\n\nThe problem is the dynamic mmap threshold creeping upward. It starts at 128 KiB, and every time a bigger mmap’d chunk gets freed, glibc raises the threshold to that chunk’s size. After enough model switches, allocations that used to get mmap’d are now landing in the arenas and fragmenting them.\n\nSetting `MALLOC_MMAP_THRESHOLD_=65536` locks it at 64 KiB and kills the dynamic adjustment entirely. The docs even say so: setting the parameter disables auto-tuning. So everything over 64 KiB goes through mmap and gets cleanly returned to the OS when freed.\n\nOn mmap loading: safetensors supports it, but PyTorch still mallocs for dtype conversion, GPU staging buffers, optimizer state, etc. Those intermediate allocations are what fragment the arenas, not the weight file read.\n\nAlso, a small note: sizeof(long) is 8 on x86-64 Linux (LP64), so the ceiling for the dynamic threshold is 4*1024*1024*sizeof(long) = 32 MiB, not 16. That doesn’t change anything, since individual tensor allocs are way under either number.",
  "title": "Found the fix for memory not being freed when switching models on Linux (it's not Python or PyTorch)"
}