
Native binary embeddings experiment: curious about your thoughts

Hugging Face Forums [Unofficial] June 23, 2026

I spent a few days testing a simple hypothesis: does training a binary embedding model natively (with a binary loss) produce better retrieval than just binarizing a float model post-hoc?

The setup is deliberately small: bert-mini (~11M params), CPU-only training on a Mac Mini M4 Pro, NLI 550k pairs, 3 epochs.

Key results on SciFact Recall@10:

* Float32 384-dim: 0.313
* Post-hoc binary 384-dim: 0.236 (−25%)
* Native binary 2048-dim: 0.276 (−12% vs float, but +17% vs post-hoc)
* Native binary 4096-dim: 0.296 (−5% vs float, +25% vs post-hoc)

And at 1M vectors with FAISS (AVX2+POPCNT on x86):

* Native binary 2048-dim: 12× faster than float32, index 6× smaller

The three things that made the binary model actually converge:

1. STE with {-1,+1} (not {0,1})
2. tanh contrastive loss (aligns with the Hamming metric at eval)
3. Differential learning rate: projection head at 50× the encoder LR

Models and code on GitHub / HuggingFace (korben99/bne-binary-2048).

Happy to hear if you’ve seen similar or contradictory results, especially at larger scales or with bigger backbones. Also curious whether the 2048-dim sweet spot holds with e.g. MiniLM.
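For anyone who wants to check the arithmetic behind the figures above, here is a minimal sketch that recomputes the relative changes from the reported Recall@10 values and the raw per-vector storage ratio. The dictionary keys and the `rel_change` helper are made up for illustration. The size calculation assumes raw vector bytes only (4 bytes per float32 dimension, 1 bit per binary dimension), which may not match how the post measured index size.

```python
# Recall@10 on SciFact, copied from the post.
recall = {
    "float32-384": 0.313,
    "posthoc-binary-384": 0.236,
    "native-binary-2048": 0.276,
    "native-binary-4096": 0.296,
}


def rel_change(value, baseline):
    """Relative change of value against baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


for name in ("posthoc-binary-384", "native-binary-2048", "native-binary-4096"):
    vs_float = rel_change(recall[name], recall["float32-384"])
    vs_posthoc = rel_change(recall[name], recall["posthoc-binary-384"])
    print(f"{name}: {vs_float:+.0f}% vs float, {vs_posthoc:+.0f}% vs post-hoc")

# Raw bytes per vector, ignoring any index overhead (an assumption).
float32_bytes = 384 * 4   # 384 dims * 4 bytes
binary_bytes = 2048 // 8  # 2048 bits packed 8 per byte
size_ratio = float32_bytes / binary_bytes
print(f"raw size ratio: {size_ratio:.0f}x")
```

Under that assumption the raw ratio comes out to 6, consistent with the "6× smaller" figure, so the 2048-bit codes are smaller than the 384-dim float vectors despite having more dimensions.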
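One possible reading of the {-1,+1} choice and the "aligns with the Hamming metric" remark is the standard identity for ±1 codes: dot(a, b) = d − 2·Hamming(a, b), so ranking by dot product and ranking by Hamming distance agree. The sketch below illustrates that identity in plain Python, with sign-threshold binarization and XOR plus popcount for the distance. It is not the code from the linked repo. The function names and the random test vectors are hypothetical, and a real index would use packed bytes and hardware POPCNT rather than Python integers.

```python
import random


def binarize(vec):
    """Post-hoc binarization: one bit per dimension, set when the value is > 0."""
    bits = 0
    for i, v in enumerate(vec):
        if v > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits: XOR, then population count."""
    return bin(a ^ b).count("1")


def signed_dot(a, b, dim):
    """Dot product of the same codes read as {-1, +1} vectors."""
    return sum(
        (1 if (a >> i) & 1 else -1) * (1 if (b >> i) & 1 else -1)
        for i in range(dim)
    )


random.seed(0)
dim = 2048
x = [random.gauss(0, 1) for _ in range(dim)]
y = [random.gauss(0, 1) for _ in range(dim)]
cx, cy = binarize(x), binarize(y)

h = hamming(cx, cy)
# Matching bits contribute +1 to the dot product, differing bits contribute -1.
assert signed_dot(cx, cy, dim) == dim - 2 * h
print(f"hamming={h}, signed dot={signed_dot(cx, cy, dim)}")
```

With {0,1} codes the plain dot product only counts positions where both bits are 1, so it no longer maps one-to-one onto Hamming distance. That may be part of why the {-1,+1} variant trained better, though the post does not say so.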
