TR26-086 | A Note on Second-Order Expected Maximum-Load Bounds for Binary Linear Hashing | Nader Bshouty
Let $S\subseteq {\mathbb F}_2^u$ have size $n=2^\ell$, and let $h:{\mathbb F}_2^u\to {\mathbb F}_2^\ell$ be a uniformly random linear map. For $y\in{\mathbb F}_2^\ell$, write ${load}h(y):=|h^{-1}(y)\cap S|$, and let $M(S,h):=\max{y\in{\mathbb F}_2^\ell}\{load}_h(y)$ be the maximum load. Jaber, Kumar and Zuckerman (STOC 2025) proved that the expected maximum load of $h$ on $S$ is at most $16\log n/\log\log n$, matching the fully independent keys-into-bins scale up to constants. Their proof also gives the tail estimate \[ \Pr\left[ M(S,h)\ge R\frac{\log n}{\log\log n} \right] \le O\left(\frac{1}{R^{2}}\right). \] We record a base optimization in their exponential-potential method showing that binary linear hashing nearly matches fully independent hashing also at the level of the second-order maximum-load scale. For every $R>1$ satisfying $R\ell^{1-1/R}\ge D\ln\ell$, where $D$ is an absolute constant, we prove \[ \Pr\left[ M(S,h)\ge R\frac{\log n}{\log\log n} \right] \le O\left( \frac{(\log\log n)^2}{R^2(\log n)^{2-2/R}} \right). \] Integrating this tail yields \[ {\mathbb E}[M(S,h)] \le \left( 1+ (1+o(1)) \frac{\log\log\log n}{\log\log n} \right) \frac{\log n}{\log\log n}. \] Thus binary linear hashing matches fully independent hashing in the leading term and matches the dominant second-order correction up to a $1+o(1)$ factor. We also prove, by an independent self-contained argument, a sharp tail bound for one prescribed bucket: for fixed $y\in{\mathbb F}_2^\ell$, \[ \Pr[{load}h(y)>2^a-2]\le \gamma^{-1}2^{-a^2}, \] where $\gamma=\prod{j\ge1}(1-2^{-j})$. A subspace construction shows that this is asymptotically tight even in the leading constant as $a\to\infty$. However, this controls only a fixed bucket; a direct union bound over all buckets loses a factor $2^\ell$.
Discussion in the ATmosphere