Native binary embeddings experiment: curious about your thoughts

Hugging Face Forums [Unofficial] June 24, 2026

Hmm… I found quite a few related things so far.


My read is: this looks less like an isolated “binary embedding trick” and more like a small, reproducible experiment in learned binary retrieval / compression-aware dense retrieval.

So I would answer your question roughly like this:

Yes, I think this is a worthwhile direction, but I would be careful about what the current result proves. It is promising evidence that metric-aligned native binary training can recover quality compared with simple post-hoc binarization, but it is not yet a clean proof that native binary always beats post-hoc binary under the same bit, memory, or latency budget.

The interesting part, to me, is not only that Hamming search is fast. FAISS/vector-DB practice already makes that part plausible. The more interesting question is whether training the representation for the compressed/discrete search space gives a better quality-memory-latency tradeoff than current post-hoc binary/int8/PQ-style baselines.
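To make the Hamming-search part concrete, here is a minimal sketch in plain Python. The helper names (`pack_bits`, `hamming`, `hamming_search`) and the toy 8-bit codes are made up for illustration, not taken from the experiment; a real setup would more likely pack bits with NumPy or use a FAISS binary index.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into a single Python int."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes (XOR, then popcount)."""
    return bin(a ^ b).count("1")


def hamming_search(query_code, doc_codes, k):
    """Return the indices of the k codes closest to the query in Hamming distance."""
    # Ties are broken by index so the ranking is deterministic.
    ranked = sorted(
        range(len(doc_codes)),
        key=lambda i: (hamming(query_code, doc_codes[i]), i),
    )
    return ranked[:k]


# Toy 8-bit codes (hypothetical data).
docs = [
    pack_bits([1, 0, 1, 1, 0, 0, 1, 0]),
    pack_bits([1, 0, 1, 1, 0, 0, 1, 1]),
    pack_bits([0, 1, 0, 0, 1, 1, 0, 1]),
]
query = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])
top2 = hamming_search(query, docs, k=2)
print(top2)
```

The search itself is only XOR plus popcount, which is why the speed claim is the easy part. What the code cannot tell you is whether the nearest codes are the relevant documents; that depends on how the codes were trained.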

Short version

I would frame this as:

a small SentenceTransformer-style experiment in learned binary retrieval.

The closest reference point I found is probably Binary Passage Retriever / BPR, because it uses learned binary codes for efficient candidate generation and continuous vectors for reranking:

  • BPR: Efficient Passage Retrieval with Hashing for Open-domain Question Answering. Ikuya Yamada, Akari Asai, Hannaneh Hajishirzi. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2021. ACL Anthology.

There is also a broader family of related work where the retriever and the compressed/discrete representation are trained together, rather than treating compression as a purely post-hoc step:

  • JPQ: [2108.00644] Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance
  • RepCONC: [2110.05789] Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval
  • MoPQ: Matching-oriented Embedding Quantization For Ad-hoc Retrieval - ACL Anthology
  • Distill-VQ: [2204.00185] Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings

And for the practical baseline, I would compare against the current Sentence Transformers / HF-style quantization path, especially binary/int8 retrieval with rescoring, not only simple sign-threshold post-hoc binary:

  • sbert.net: Embedding Quantization (Sentence Transformers documentation)
  • huggingface.co: Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper...


My main caveat

The current result is promising, but the comparison mixes two things:

Setting                  | What changes
-------------------------|---------------------------------------------------------
post-hoc binary 384-dim  | simple binary conversion, 384 bits
native binary 2048-dim   | native binary training, but also much larger bit budget
native binary 4096-dim   | native binary training, and even larger bit budget

So the gain might come from the native binary objective, the larger bit budget, redundancy, the projection head, the tanh/contrastive surrogate, the {-1,+1} representation, or some mixture of these.

That is not a criticism. It just means the next clean experiment is probably an equal-bit / equal-memory ablation.
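For a rough sense of how different those bit budgets are, here is the arithmetic as a small sketch. It assumes one bit per dimension with no index overhead, and (my assumption, not something stated in the experiment) a 384-dim float32 vector as the uncompressed reference.

```python
def bytes_per_code(bits):
    """Storage for one binary code, assuming 1 bit per dimension and no index overhead."""
    return bits // 8


# Assumption: the uncompressed reference is a 384-dim float32 vector (4 bytes per dim).
FLOAT32_BYTES = 384 * 4

sizes = {bits: bytes_per_code(bits) for bits in (384, 2048, 4096)}
for bits, size in sizes.items():
    ratio = FLOAT32_BYTES / size
    print(f"{bits:>4} bits -> {size:>3} bytes/vector, {ratio:.1f}x smaller than float32")
```

Under these assumptions the 2048-bit and 4096-bit codes take roughly 5.3x and 10.7x the memory of the 384-bit post-hoc code, so the current comparison is far from equal-memory.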

A minimal next comparison could be:

Bit budget | Post-hoc binary | Native binary
-----------|-----------------|--------------
384        | yes             | yes
768        | yes             | yes
1024       | yes             | yes
2048       | yes             | yes
4096       | yes             | yes

If native binary still wins at the same bit budget, that would make the claim much stronger.

Suggested next smallest experiment

If you want the smallest next step that would clarify the result a lot, I would do this:

  1. Train native binary at 384, 768, 1024, and 2048 bits.
  2. Create post-hoc binary baselines at the same dimensions.
  3. Evaluate SciFact with both Recall@10 and nDCG@10.
  4. Add one or two more BEIR/MTEB retrieval tasks, for example NFCorpus or FiQA.
  5. Add one rescoring setup: binary top-100 or top-200 → float/int8 rescore → final top-10.
  6. Inspect the binary codes directly: bit entropy, bit balance, dead bits, collision rate, and positive/negative Hamming-distance histograms.

The last point is important. A 2048-bit model may not actually be using 2048 useful bits. Some bits may be dead, redundant, or heavily biased.
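A minimal sketch of a few of those diagnostics, in plain Python over unpacked 0/1 codes. The function names and the toy codes are hypothetical, and "collision rate" is defined here as the fraction of codes that exactly duplicate another code, which is only one possible definition.

```python
import math
from collections import Counter


def bit_balance(codes):
    """Fraction of codes with each bit set to 1 (0.5 means perfectly balanced)."""
    n = len(codes)
    return [sum(code[j] for code in codes) / n for j in range(len(codes[0]))]


def bit_entropy(p):
    """Binary entropy in bits of a single bit that is 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def diagnostics(codes):
    balance = bit_balance(codes)
    entropies = [bit_entropy(p) for p in balance]
    # A "dead" bit takes the same value for every code in the corpus.
    dead = [j for j, p in enumerate(balance) if p in (0.0, 1.0)]
    # Assumed definition: count every code that repeats an already-seen code.
    counts = Counter(tuple(code) for code in codes)
    duplicates = sum(n - 1 for n in counts.values())
    return {
        "balance": balance,
        "mean_entropy": sum(entropies) / len(entropies),
        "dead_bits": dead,
        "collision_rate": duplicates / len(codes),
    }


# Toy 4-bit codes (hypothetical data): bits 0 and 3 never change.
codes = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]
report = diagnostics(codes)
print(report)
```

In this toy case the nominal 4-bit code has a mean per-bit entropy of 0.5, and bits 1 and 2 are exact complements of each other, so it carries even less information than the per-bit numbers suggest. Per-bit statistics do not catch that kind of redundancy; the Hamming-distance histograms for positive and negative pairs are a better check for it.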


Final practical suggestion

If I had to suggest one compact next milestone, it would be:

  1. equal-bit native vs post-hoc comparison at 384/768/1024/2048;
  2. one stronger baseline: Sentence Transformers binary/int8 quantization with rescoring;
  3. one extra retrieval task beyond SciFact;
  4. bit diagnostics: entropy, balance, dead bits, collision rate, and Hamming-distance histograms;
  5. report pure binary retrieval separately from binary + rescore, and report end-to-end latency alongside both.

That would make the result much easier for other people to interpret and reproduce.
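To illustrate why pure binary and binary + rescore should be reported separately, here is a small two-stage sketch in plain Python. Everything in it is an assumption for illustration: the function names, the toy vectors, the sign-threshold binarization, and the choice to rescore with full float document vectors (an int8 rescore would follow the same shape).

```python
def binarize(vec):
    """Post-hoc sign binarization: 1 where the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]


def hamming(a, b):
    """Number of positions where two unpacked bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, doc_vecs, n_candidates, top_k):
    """Return (pure binary top_k, binary + float-rescore top_k) as index lists."""
    q_bits = binarize(query_vec)
    doc_bits = [binarize(d) for d in doc_vecs]
    # Stage 1: rank every document by Hamming distance (ties broken by index).
    binary_order = sorted(
        range(len(doc_vecs)),
        key=lambda i: (hamming(q_bits, doc_bits[i]), i),
    )
    candidates = binary_order[:n_candidates]
    # Stage 2: rescore only the candidates with the float dot product.
    rescored = sorted(candidates, key=lambda i: (-dot(query_vec, doc_vecs[i]), i))
    return binary_order[:top_k], rescored[:top_k]


# Toy 4-dim vectors (hypothetical data).
query = [0.9, 0.1, -0.2, 0.4]
docs = [
    [0.1, 0.1, -0.1, 0.1],
    [0.8, 0.2, -0.3, 0.5],
    [-0.5, -0.4, 0.6, -0.2],
    [0.7, -0.1, -0.2, 0.3],
]
binary_top, rescored_top = retrieve(query, docs, n_candidates=3, top_k=2)
print(binary_top, rescored_top)
```

The two rankings differ even on this toy input, which is the point: a good binary + rescore number can hide a weak first stage, and the rescore step adds its own latency and memory for the float or int8 vectors.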

Overall, I think this is worth continuing. The strongest version of the claim would not be “binary is fast” or “native binary beats post-hoc once.” It would be something like:

metric-aligned native binary training can produce better first-stage retrieval codes than ordinary post-hoc binarization under a fixed memory/latency budget, and it remains useful when compared against current binary/int8 quantization + rescoring baselines.

That would be a genuinely interesting result.
