{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreibztttqlqz32rdcuebhfgzxpg6liqkp66jokile7w2nbcxt242ipi",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mozpktqknfz2"
},
"path": "/t/native-binary-embeddings-experiment-curious-about-your-thoughts/177107#post_2",
"publishedAt": "2026-06-24T09:18:14.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"ACL Anthology",
"Efficient Passage Retrieval with Hashing for Open-domain Question Answering",
"[2108.00644] Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance",
"[2110.05789] Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval",
"Matching-oriented Embedding Quantization For Ad-hoc Retrieval - ACL Anthology",
"[2204.00185] Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings",
"sbert.net",
"Embedding Quantization — Sentence Transformers documentation",
"huggingface.co",
"Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper..."
],
"textContent": "Hmm…? I found quite a few related things so far:\n\n* * *\n\nMy read is: this looks less like an isolated “binary embedding trick” and more like a small, reproducible experiment in **learned binary retrieval / compression-aware dense retrieval**.\n\nSo I would answer your question roughly like this:\n\n> Yes, I think this is a worthwhile direction, but I would be careful about what the current result proves. It is promising evidence that metric-aligned native binary training can recover quality compared with simple post-hoc binarization, but it is not yet a clean proof that native binary always beats post-hoc binary under the same bit, memory, or latency budget.\n\nThe interesting part, to me, is not only that Hamming search is fast. FAISS and vector-database practice already show that part is practical. The more interesting question is whether **training the representation for the compressed/discrete search space** gives a better quality/memory/latency tradeoff than current post-hoc binary, int8, or PQ-style baselines.\n\n## Short version\n\nI would frame this as:\n\n> a small SentenceTransformer-style experiment in learned binary retrieval.\n\nThe closest reference point I found is probably the **Binary Passage Retriever (BPR)**, because it uses learned binary codes for efficient candidate generation and continuous vectors for reranking:\n\nACL Anthology\n\n### Efficient Passage Retrieval with Hashing for Open-domain Question Answering\n\nIkuya Yamada, Akari Asai, Hannaneh Hajishirzi. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 
2021.\n\nThere is also a broader family of related work where the retriever and the compressed/discrete representation are trained together, rather than treating compression as a purely post-hoc step:\n\n* JPQ: Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance (arXiv:2108.00644)\n* RepCONC: Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval (arXiv:2110.05789)\n* MoPQ: Matching-oriented Embedding Quantization For Ad-hoc Retrieval (ACL Anthology)\n* Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings (arXiv:2204.00185)\n\nAnd for the practical baseline, I would compare against the current Sentence Transformers / HF-style quantization path, especially binary/int8 retrieval **with rescoring**, not only simple sign-threshold post-hoc binarization:\n\nsbert.net\n\n### Embedding Quantization — Sentence Transformers documentation\n\nhuggingface.co\n\n### Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper...\n\n## My main caveat\n\nThe current result is promising, but the comparison changes two things at once: the training method and the bit budget.\n\nSetting | Training | Bit budget\n---|---|---\npost-hoc binary, 384-dim | sign-threshold binarization of float embeddings | 384 bits\nnative binary, 2048-dim | native binary training | 2048 bits (~5.3× the baseline)\nnative binary, 4096-dim | native binary training | 4096 bits (~10.7× the baseline)\n\nSo the gain might come from the native binary objective, the larger bit budget, redundancy across bits, the projection head, the tanh/contrastive surrogate, the `{-1,+1}` representation, or some mixture of these.\n\nThat is not a criticism. 
It just means the next clean experiment is probably an **equal-bit / equal-memory ablation**.\n\nA minimal next comparison could be:\n\nBit budget | Post-hoc binary | Native binary\n---|---|---\n384 | yes | yes\n768 | yes | yes\n1024 | yes | yes\n2048 | yes | yes\n4096 | yes | yes\n\nIf native binary still wins at the same bit budget, that would make the claim much stronger.\n\n## Suggested next smallest experiment\n\nIf you want the smallest next step that would clarify the result the most, I would do this:\n\n1. Train native binary at 384, 768, 1024, and 2048 bits.\n2. Create post-hoc binary baselines with the same numbers of bits (above 384 bits, this needs a float model or projection head with that output dimension, since sign-thresholding yields one bit per dimension).\n3. Evaluate SciFact with both Recall@10 and nDCG@10.\n4. Add one or two more BEIR/MTEB retrieval tasks, for example NFCorpus or FiQA.\n5. Add one rescoring setup: binary top-100 or top-200 → float/int8 rescore → final top-10.\n6. Inspect the binary codes directly: bit entropy, bit balance, dead bits, collision rate, and positive/negative Hamming-distance histograms.\n\nThe last point is important. A 2048-bit model may not actually be using 2048 useful bits. Some bits may be dead, redundant, or heavily biased.\n\n## Final practical suggestion\n\nIf I had to suggest one compact next milestone, it would be:\n\n1. equal-bit native vs post-hoc comparison at 384/768/1024/2048 bits;\n2. one stronger baseline: Sentence Transformers binary/int8 quantization with rescoring;\n3. one extra retrieval task beyond SciFact;\n4. bit diagnostics: entropy, balance, dead bits, collision rate, and Hamming-distance histograms;\n5. pure binary retrieval and binary + rescore results reported separately, each with end-to-end latency.\n\nThat would make the result much easier for other people to interpret and reproduce.\n\nOverall, I think this is worth continuing. 
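Here is a minimal NumPy sketch of the rescoring setup and the bit diagnostics suggested above. It rests on a few assumptions. Codes are stored as {0,1} values, one bit per `uint8` entry, with no bit packing. The post-hoc baseline is a sign threshold at 0. The Hamming shortlist is rescored against float document vectors. All function names are hypothetical, and collision rate is defined here as 1 minus the share of distinct codes. The Sentence Transformers quantization guide linked above describes that library's own rescoring recipe, which may differ in details. The synthetic data only exercises the code and says nothing about real retrieval quality.

```python
import numpy as np


def posthoc_binarize(emb):
    # Post-hoc baseline: threshold each float dimension at 0 -> one {0,1} bit per dimension.
    return (np.asarray(emb) > 0).astype(np.uint8)


def hamming_matrix(q_bits, d_bits):
    # Pairwise Hamming distance = n_bits - (positions where both bits are 1 or both are 0).
    q = q_bits.astype(np.int32)
    d = d_bits.astype(np.int32)
    agree = q @ d.T + (1 - q) @ (1 - d).T
    return q.shape[1] - agree


def binary_then_rescore(q_float, d_float, d_bits, n_candidates=100, k=10):
    # Stage 1: shortlist n_candidates docs per query by Hamming distance on binary codes.
    # Stage 2: rescore only the shortlist with float dot products and keep the top k.
    dist = hamming_matrix(posthoc_binarize(q_float), d_bits)
    shortlist = np.argsort(dist, axis=1, kind='stable')[:, :n_candidates]
    top = []
    for qi, cand in enumerate(shortlist):
        scores = d_float[cand] @ q_float[qi]
        top.append(cand[np.argsort(-scores, kind='stable')[:k]])
    return np.stack(top)


def bit_diagnostics(bits, dead_tol=0.01):
    # bits: {0,1} array of shape (n_items, n_bits), one row per encoded document.
    p = bits.mean(axis=0)  # fraction of items with each bit set
    pc = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log2(0)
    entropy = -(pc * np.log2(pc) + (1 - pc) * np.log2(1 - pc))  # 1.0 = perfectly balanced bit
    dead = (p <= dead_tol) | (p >= 1 - dead_tol)  # bits that are (almost) constant
    n_unique = np.unique(bits, axis=0).shape[0]
    return {
        'mean_bit_entropy': float(entropy.mean()),
        'mean_abs_imbalance': float(np.abs(p - 0.5).mean()),
        'dead_bits': int(dead.sum()),
        'collision_rate': 1.0 - n_unique / bits.shape[0],  # 1 - (distinct codes / items)
    }


def pair_hamming_hist(a_bits, b_bits, bins):
    # Hamming distance for each aligned pair (a_i, b_i), e.g. query/positive or
    # query/negative pairs, summarized as counts per histogram bin.
    dist = (a_bits != b_bits).sum(axis=1)
    return np.histogram(dist, bins=bins)[0]


# Synthetic demo: each query is a noisy copy of one document.
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 384)).astype(np.float32)
queries = docs[:5] + 0.1 * rng.standard_normal((5, 384)).astype(np.float32)
doc_bits = posthoc_binarize(docs)
top10 = binary_then_rescore(queries, docs, doc_bits, n_candidates=100, k=10)
print(top10[:, 0])  # top hit per query; expected to be its source document index
print(bit_diagnostics(doc_bits))
```

Running `bit_diagnostics` on the native 2048-bit codes and on post-hoc codes of the same size would show directly whether the extra bits are used. Many dead bits, or a mean entropy well below 1.0, would point to wasted capacity. Feeding query/positive and query/negative pairs to `pair_hamming_hist` with the same bin edges shows how well the codes separate relevant from irrelevant documents.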
The strongest version of the claim would not be “binary is fast” or “native binary beats post-hoc once.” It would be something like:\n\n> metric-aligned native binary training can produce better first-stage retrieval codes than ordinary post-hoc binarization under a fixed memory/latency budget, and it remains useful when compared against current binary/int8 quantization + rescoring baselines.\n\nThat would be a genuinely interesting result.",
"title": "Native binary embeddings experiment: curious about your thoughts"
}