Found the fix for memory not being freed when switching models on Linux (it's not Python or PyTorch)
Hugging Face Forums [Unofficial]
March 29, 2026
Interesting finding, but from reading the doc, I don’t see why glibc should be using sbrk to satisfy these allocations.
The upper limit is DEFAULT_MMAP_THRESHOLD_MAX: [...] 4*1024*1024*sizeof(long) on 64-bit systems. Note: Nowadays, glibc uses a dynamic mmap threshold by default. The initial value of the threshold is 128*1024, but when blocks larger than the current threshold and less than or equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the threshold is adjusted upward to the size of the freed block. When dynamic mmap thresholding is in effect, the threshold for trimming the heap is also dynamically adjusted to be twice the dynamic mmap threshold. Dynamic adjustment of the mmap threshold is disabled if any of the M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD, or M_MMAP_MAX parameters is set.
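To make the quoted rule concrete, here is a toy model of the dynamic threshold in Python. It is only a sketch of the behaviour the passage describes, not glibc's actual code. The class and method names are made up for illustration. It also assumes an 8-byte `long`, and that the threshold only moves when the freed block was itself served by mmap.

```python
KIB = 1024
MIB = 1024 * KIB


class MmapThresholdModel:
    """Toy model of the dynamic mmap threshold from the quoted passage.

    Hypothetical helper for illustration only; it ignores arenas, free
    lists, and everything else a real allocator does.
    """

    def __init__(self, sizeof_long=8):
        # Cap from the quoted formula for 64-bit systems.
        self.threshold_max = 4 * 1024 * 1024 * sizeof_long
        # Initial dynamic threshold, per the quoted passage.
        self.mmap_threshold = 128 * KIB
        # glibc's default trim threshold before any adjustment.
        self.trim_threshold = 128 * KIB

    def source_for(self, size):
        """Return where a fresh allocation of `size` bytes would come from."""
        return "mmap" if size >= self.mmap_threshold else "heap"

    def free(self, size, source):
        """Apply the upward adjustment described in the quoted passage."""
        # Assumption: only freeing an mmapped block moves the threshold.
        if source == "mmap" and self.mmap_threshold < size <= self.threshold_max:
            self.mmap_threshold = size
            self.trim_threshold = 2 * size


model = MmapThresholdModel()
first = model.source_for(20 * MIB)      # above the initial 128 KiB threshold
model.free(20 * MIB, first)             # threshold rises to 20 MiB
second = model.source_for(16 * MIB)     # now below the raised threshold
huge = model.source_for(64 * MIB)       # above the cap, threshold cannot reach it
print(first, second, huge, model.mmap_threshold // MIB, model.trim_threshold // MIB)
```

In this model, a block below the cap that is freed once makes later allocations of that size or smaller come from the heap instead of mmap. Whether that is what actually happens with Torch is a separate question.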
On x86-64 Linux, sizeof(long) is 8, so any allocation larger than 32 MiB should be over the limit and thus be satisfied with mmap, right? Unless it happens to fit within some already-allocated arena’s free list, but in that case it wouldn’t increase the process’s size.
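The arithmetic for the cap is easy to check. The sketch below evaluates the quoted 64-bit formula; `mmap_threshold_max` is a made-up helper name, and `ctypes` is only used to report the size of a C `long` on whatever machine runs it. On LP64 Linux such as x86-64, `long` is 8 bytes, so the cap works out to 32 MiB.

```python
import ctypes

MIB = 1024 * 1024


def mmap_threshold_max(sizeof_long):
    """Evaluate the quoted 64-bit formula: 4*1024*1024*sizeof(long)."""
    return 4 * 1024 * 1024 * sizeof_long


# Size of a C long on the machine running this script (8 on LP64 Linux).
local_sizeof_long = ctypes.sizeof(ctypes.c_long)

cap_lp64 = mmap_threshold_max(8)
print(f"sizeof(long) here: {local_sizeof_long}")
print(f"cap with an 8-byte long: {cap_lp64 // MIB} MiB")
```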
Evidently something with Torch’s allocation pattern interacts pathologically, but I’m not sure what.
That said, why is Torch even mallocing memory to store the model, rather than simply mmapping the model file into the process’s address space? That would avoid the whole issue and also save a pointless memory copy.
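For reference, this is roughly what mapping a file looks like using only Python's standard `mmap` module. It is a minimal sketch, not how Torch loads anything: `map_file_readonly` is a made-up helper, and the temporary file stands in for a model file.

```python
import mmap
import os
import tempfile


def map_file_readonly(path):
    """Map a whole file read-only; pages are faulted in lazily on access."""
    with open(path, "rb") as f:
        # Length 0 means "map the entire file". The mapping keeps its own
        # handle on the file, so closing `f` afterwards is fine.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Stand-in for a model file: 4 KiB of a repeating byte pattern.
payload = b"\x00\x01\x02\x03" * 1024
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

view = map_file_readonly(path)
first_four = view[:4]   # slicing copies just these bytes out of the mapping
size = len(view)
view.close()            # unmapping hands the address range back to the kernel
os.remove(path)

print(first_four, size)
```

The relevant property is that closing the mapping releases it directly, without going through malloc's heap or its trim thresholds at all.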