{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreidxphv2son7jykppubcf2wr2gg3bxf6g4qk2gdexek2wdgq5wtfxq",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3moxt4e7vcyy2"
},
"path": "/t/native-binary-embeddings-experiment-curious-about-your-thoughts/177107#post_1",
"publishedAt": "2026-06-23T15:34:41.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "I spent a few days testing a simple hypothesis: does training a binary embedding model natively (with a binary loss) produce better retrieval than just binarizing a float model post-hoc?\n\nThe setup is deliberately small: bert-mini (~11M params), CPU-only training on a Mac Mini M4 Pro, 550k NLI pairs, 3 epochs.\n\nKey results on SciFact (Recall@10):\n\n * Float32 384-dim: 0.313\n * Post-hoc binary 384-dim: 0.236 (−25% vs float)\n * Native binary 2048-dim: 0.276 (−12% vs float, but +17% vs post-hoc)\n * Native binary 4096-dim: 0.296 (−5% vs float, +25% vs post-hoc)\n\nAnd at 1M vectors with FAISS (AVX2+POPCNT on x86):\n\n * Native binary 2048-dim: 12× faster than float32, index 6× smaller\n\nThe three things that made the binary model actually converge:\n\n 1. Straight-through estimator (STE) with {-1,+1} outputs (not {0,1})\n 2. tanh contrastive loss (aligns with the Hamming metric at eval)\n 3. Differential learning rate: projection head at 50× the encoder LR\n\nModels and code are on GitHub / HuggingFace (korben99/bne-binary-2048).\n\nHappy to hear if you’ve seen similar or contradictory results, especially at larger scales or with bigger backbones. I'm also curious whether the 2048-dim sweet spot holds with e.g. MiniLM.",
"title": "Native binary embeddings experiment: curious about your thoughts"
}