Pooling in CNNs: Shrink the Map, Keep What Matters

DEV Community [Unofficial] June 20, 2026
After a few conv layers you're drowning in feature maps: too big and slow. Pooling shrinks them while keeping the signal, and it has zero parameters. Four numbers in, one out.

🪟 Max vs average pooling: https://dev48v.infy.uk/dl/day9-pooling.html

The operation

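// img: h×w array of rows (h and w assumed even); reduce: 4 values in, 1 out
const out = Array.from({ length: h / 2 }, () => new Array(w / 2));
for (let y = 0; y < h; y += 2)        // stride 2: hop, don't slide
  for (let x = 0; x < w; x += 2) {
    const win = [img[y][x], img[y][x+1], img[y+1][x], img[y+1][x+1]];
    out[y/2][x/2] = reduce(win);       // 2×2 → 1 value, halves both dims
  }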

Max pool: keep the strongest

const reduce = (w) => Math.max(...w);

A feature detector asks "is this feature here?" Max pooling answers "yes, somewhere in this region it fired", keeping the feature's presence while discarding its exact location.

Average pool: smooth it

const reduce = (w) => w.reduce((a,b)=>a+b) / w.length;
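To compare the two reducers on the same input, here is a minimal sketch that wraps the loop from "The operation" in a function. The name pool2x2, the reducer names, and the sample grid are illustrative assumptions, not part of the original post, and the helper assumes an even height and width.

```javascript
// Hypothetical helper: 2×2 pooling with stride 2.
// Assumes img is a non-empty array of equal-length rows with even height and width.
function pool2x2(img, reduce) {
  const h = img.length, w = img[0].length;
  const out = [];
  for (let y = 0; y < h; y += 2) {
    const row = [];
    for (let x = 0; x < w; x += 2) {
      // Gather the four values of the window, then collapse them to one.
      row.push(reduce([img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1]]));
    }
    out.push(row);
  }
  return out;
}

const maxReduce = (w) => Math.max(...w);
const avgReduce = (w) => w.reduce((a, b) => a + b) / w.length;

// Made-up 4×4 feature map for illustration.
const grid = [
  [1, 3, 2, 0],
  [5, 2, 1, 1],
  [0, 0, 4, 8],
  [2, 2, 6, 2],
];

console.log(pool2x2(grid, maxReduce)); // [[5, 2], [2, 8]]
console.log(pool2x2(grid, avgReduce)); // [[2.75, 1], [1, 5]]
```

Both calls turn the 4×4 grid into a 2×2 one. Max keeps the peak of each window, while average blends the peak with its neighbours.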

Average pooling is common at the end of a network, where global average pooling summarises each feature map as a single number before the classifier.
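As a rough sketch of what global average pooling does, the snippet below collapses each feature map to its mean. The function name globalAvgPool and the sample maps are assumptions made up for illustration.

```javascript
// Global average pooling: collapse each H×W feature map to its mean value.
// Assumes maps is an array of non-empty 2D arrays, one per channel.
function globalAvgPool(maps) {
  return maps.map((map) => {
    const values = map.flat();                       // H×W grid -> flat list
    return values.reduce((a, b) => a + b, 0) / values.length;
  });
}

// Three made-up 2×2 feature maps (channels).
const featureMaps = [
  [[1, 2], [3, 4]],
  [[0, 0], [0, 8]],
  [[5, 5], [5, 5]],
];

console.log(globalAvgPool(featureMaps)); // [2.5, 2, 5]
```

Three maps go in and three numbers come out, one per channel, whatever the spatial size of the maps.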

The real point: translation tolerance

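Because pooling reports "feature present in region", shifting the input by a pixel barely changes the output: a feature that stays inside its 2×2 window produces exactly the same value. That tolerance, built up over several pooling layers and combined with the weight sharing of the convolutions themselves, is part of why a CNN recognises a cat whether it's top-left or centre.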
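A small experiment makes the tolerance, and its limits, concrete. This is only a sketch: maxPool2x2, shiftRight and the single-activation input are hypothetical names and data chosen for illustration.

```javascript
// Hypothetical helper: 2×2 max pooling with stride 2 (assumes even height and width).
function maxPool2x2(img) {
  const out = [];
  for (let y = 0; y < img.length; y += 2) {
    const row = [];
    for (let x = 0; x < img[0].length; x += 2) {
      row.push(Math.max(img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1]));
    }
    out.push(row);
  }
  return out;
}

// Shift every row one pixel to the right, padding with 0 on the left.
const shiftRight = (img) => img.map((row) => [0, ...row.slice(0, -1)]);

// One strong activation in the top-left corner.
const activation = [
  [9, 0, 0, 0],
  [0, 0, 0, 0],
  [0, 0, 0, 0],
  [0, 0, 0, 0],
];

const once = shiftRight(activation);  // 9 moves to column 1: same 2×2 window
const twice = shiftRight(once);       // 9 moves to column 2: next window over

console.log(maxPool2x2(activation)); // [[9, 0], [0, 0]]
console.log(maxPool2x2(once));       // [[9, 0], [0, 0]]  unchanged
console.log(maxPool2x2(twice));      // [[0, 9], [0, 0]]  moved one output cell
```

A one-pixel shift inside a window leaves the pooled map identical. Once the activation crosses a window boundary the output does change, but a two-pixel shift in the input shows up as only a one-cell shift in the output.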

And it's free: no learnable weights. (Modern nets sometimes use strided convolutions instead. Those do carry learnable weights, but the intent, downsampling the map, is the same.)
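To illustrate how a strided convolution relates to pooling, here is a minimal single-channel sketch with a 2×2 kernel, stride 2, no padding and no bias. The function stridedConv2x2 and the kernel values are assumptions for illustration. In a real network the weights would be learned rather than fixed.

```javascript
// Sketch of a single-channel 2×2 convolution with stride 2 (no padding, no bias).
// Assumes even height and width; kernel is a 2×2 array of weights.
function stridedConv2x2(img, kernel) {
  const out = [];
  for (let y = 0; y < img.length; y += 2) {
    const row = [];
    for (let x = 0; x < img[0].length; x += 2) {
      // Weighted sum of the window instead of max or plain mean.
      row.push(
        img[y][x] * kernel[0][0] + img[y][x + 1] * kernel[0][1] +
        img[y + 1][x] * kernel[1][0] + img[y + 1][x + 1] * kernel[1][1]
      );
    }
    out.push(row);
  }
  return out;
}

// Made-up 4×4 input.
const input = [
  [1, 3, 2, 0],
  [5, 3, 1, 1],
  [0, 0, 4, 8],
  [2, 2, 6, 2],
];

// With every weight fixed at 0.25 the layer reproduces 2×2 average pooling.
const averagingKernel = [[0.25, 0.25], [0.25, 0.25]];

console.log(stridedConv2x2(input, averagingKernel)); // [[3, 1], [1, 5]]
```

The loop shape is the same hop-by-two as pooling. The difference is the four kernel weights, which a trained layer is free to adjust, so it pays for its flexibility with parameters that pooling does not have.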

The takeaway

Hop in 2×2s, keep the max → smaller maps, position-tolerant, zero params. Pool a grid.
