
Found the fix for memory not being freed when switching models on Linux (it's not Python or PyTorch)

Hugging Face Forums [Unofficial] March 29, 2026

Interesting finding, but from reading the mallopt(3) man page, I don’t see why glibc would be using sbrk to satisfy these allocations.

              The upper limit
              is DEFAULT_MMAP_THRESHOLD_MAX: [...] 4*1024*1024*sizeof(long) on 64-bit systems.

              Note: Nowadays, glibc uses a dynamic mmap threshold by
              default.  The initial value of the threshold is 128*1024,
              but when blocks larger than the current threshold and less
              than or equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the
              threshold is adjusted upward to the size of the freed
              block.  When dynamic mmap thresholding is in effect, the
              threshold for trimming the heap is also dynamically
              adjusted to be twice the dynamic mmap threshold.  Dynamic
              adjustment of the mmap threshold is disabled if any of the
              M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD, or
              M_MMAP_MAX parameters is set.

On x86-64 Linux, sizeof(long) is 8, so DEFAULT_MMAP_THRESHOLD_MAX works out to 32 MiB, and any allocation larger than that should exceed the threshold and thus be satisfied with mmap, right? The only exception would be a request that fits in a chunk already on some arena’s free list, but in that case it wouldn’t increase the process’s size.
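A quick check of that arithmetic in Python, using ctypes to read the platform's C long size. The function names are made up for illustration; they only restate the 64-bit formula from the quoted man page, under the simplifying assumption that no free chunk can satisfy the request and mmap hasn't been disabled.

```python
import ctypes

# sizeof(long) on the running platform: 8 on x86-64 Linux (LP64).
LONG_SIZE = ctypes.sizeof(ctypes.c_long)


def default_mmap_threshold_max(long_size: int = LONG_SIZE) -> int:
    """64-bit DEFAULT_MMAP_THRESHOLD_MAX as quoted from mallopt(3)."""
    return 4 * 1024 * 1024 * long_size


def always_mmapped(size: int, long_size: int = LONG_SIZE) -> bool:
    """True if size exceeds the highest value the dynamic threshold can reach.

    Simplification (assumption): no existing free chunk fits the request and
    mmap has not been disabled, e.g. via M_MMAP_MAX.
    """
    return size > default_mmap_threshold_max(long_size)


if __name__ == "__main__":
    limit = default_mmap_threshold_max()
    print(f"sizeof(long) = {LONG_SIZE}, limit = {limit // (1024 * 1024)} MiB")
```

With sizeof(long) == 8 the limit is 32 MiB; the 16 MiB figure only follows if long were 4 bytes, as on 64-bit Windows, which doesn't use glibc.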

Evidently something about Torch’s allocation pattern interacts pathologically with these heuristics, but I’m not sure what.
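To make the quoted rule concrete, here is a toy Python model of the dynamic threshold. It is a sketch that follows only the man-page wording, not glibc's code: it ignores free lists, arenas, and page rounding, and the 24 MiB and 16 MiB buffer sizes in the demo are hypothetical.

```python
from dataclasses import dataclass

KiB = 1024
MiB = 1024 * KiB
# Assumes a 64-bit system with sizeof(long) == 8 (x86-64 Linux): 32 MiB.
DEFAULT_MMAP_THRESHOLD_MAX = 4 * 1024 * 1024 * 8


@dataclass
class DynamicThreshold:
    """Toy model of the dynamic mmap threshold described in mallopt(3)."""

    mmap_threshold: int = 128 * KiB
    trim_threshold: int = 128 * KiB
    # False models having set M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD,
    # or M_MMAP_MAX explicitly, which disables dynamic adjustment.
    dynamic: bool = True

    def place(self, size: int) -> str:
        # Requests at or above the threshold go to mmap; smaller ones
        # come from the heap (which may grow via sbrk).
        return "mmap" if size >= self.mmap_threshold else "heap"

    def free(self, size: int) -> None:
        # Freeing a block larger than the current threshold but not above
        # the max ratchets the threshold up to that block's size, and the
        # trim threshold to twice that.
        if self.dynamic and self.mmap_threshold < size <= DEFAULT_MMAP_THRESHOLD_MAX:
            self.mmap_threshold = size
            self.trim_threshold = 2 * size


if __name__ == "__main__":
    m = DynamicThreshold()
    print(m.place(16 * MiB))  # mmap: 16 MiB >= the initial 128 KiB threshold
    m.free(24 * MiB)          # freeing one 24 MiB block raises the threshold
    print(m.place(16 * MiB))  # heap: 16 MiB is now below the 24 MiB threshold
    print(m.mmap_threshold // MiB, m.trim_threshold // MiB)  # 24 48
```

In this model a single freed 24 MiB buffer is enough for every later 16 MiB request to be served from the heap. That is one kind of ratchet that could leave sub-32 MiB allocations on the sbrk heap; whether it is what actually happens with Torch is not established here.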

That said, why is Torch even mallocing memory to store the model, rather than simply mmapping the model file into the process’s address space? That would avoid the whole issue and also save a pointless memory copy.
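A minimal stdlib sketch of the mmap approach, assuming a hypothetical header-less file of raw native-endian float32 values (real checkpoint formats carry headers and many tensors). The file is mapped read-only and read in place through a memoryview, so the data never passes through malloc; `write_weights` and `sum_weights_mmapped` are illustrative names.

```python
import mmap
import os
import tempfile
from array import array


def write_weights(path: str, values: list[float]) -> None:
    """Write values as raw native-endian float32 (hypothetical format)."""
    with open(path, "wb") as f:
        array("f", values).tofile(f)


def sum_weights_mmapped(path: str) -> float:
    """Map the file read-only and read it in place, without a heap copy."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Views over the mapping must be released before the mmap is closed,
        # otherwise mmap.close() raises BufferError; the with-block does that.
        with memoryview(mm) as raw, raw.cast("f") as weights:
            return float(sum(weights))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_weights(path, [0.5, 1.5, 2.0])
        print(sum_weights_mmapped(path))  # 4.0
```

Because the pages are file-backed and clean, the kernel can load them on demand and drop them under memory pressure, and unmapping returns them regardless of what the malloc heap looks like.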
