Native binary embeddings experiment: curious about your thoughts
Thanks a lot for your time and your responses, really helpful for someone still learning.
One point I want to clarify on the bit budget comparison, because I think it reframes things a bit:
- Float 384-dim = 384 × 32 = 12 288 bits
- Post-hoc binary 384-dim = 384 bits
- Native binary 2048-dim = 2 048 bits
So post-hoc binary isn’t a middle ground: it’s an extreme compression that shows clearly in the results (Recall@10 drops from 0.313 to 0.236, about −25%). Native binary 2048 sits between the two: 6× fewer bits than float, but about 5.3× more than post-hoc, and it recovers meaningful recall (0.276) while being 12× faster on CPU.
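For anyone who wants to check the arithmetic, here is a small sketch that recomputes the bit budgets and the relative recall changes from the numbers quoted above. The dictionary keys and the helper name `summarize` are just labels I made up for this post, not anything from the experiment code.

```python
# Bit budgets and Recall@10 values as quoted in this post.
# The keys and the function name are illustrative labels only.
BITS = {
    "float_384": 384 * 32,        # 32-bit floats
    "posthoc_binary_384": 384,    # 1 bit per dimension
    "native_binary_2048": 2048,   # 1 bit per dimension
}
RECALL_AT_10 = {
    "float_384": 0.313,
    "posthoc_binary_384": 0.236,
    "native_binary_2048": 0.276,
}


def summarize(name: str, reference: str = "float_384") -> dict:
    """Return bit budget, compression vs. the reference, and relative recall change."""
    return {
        "bits": BITS[name],
        "compression_vs_ref": BITS[reference] / BITS[name],
        "relative_recall_change": RECALL_AT_10[name] / RECALL_AT_10[reference] - 1,
    }


if __name__ == "__main__":
    for name in BITS:
        s = summarize(name)
        print(
            f"{name}: {s['bits']} bits, "
            f"{s['compression_vs_ref']:.2f}x fewer bits than float, "
            f"recall change {s['relative_recall_change']:+.1%}"
        )
    # Ratio of native binary to post-hoc binary bit budgets.
    print(f"native / post-hoc bits: {BITS['native_binary_2048'] / BITS['posthoc_binary_384']:.2f}x")
```

The last line makes the native-to-post-hoc ratio explicit (2048 / 384), which is a bit over 5×.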
The goal of the experiment isn’t really to prove native binary beats post-hoc at equal bit budget. It’s more basic than that: can you trade weight precision for more dimensions and still get useful retrieval on CPU-only hardware? The results suggest yes, and 2048-dim seems to be a reasonable sweet spot for that trade-off.
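To make the CPU side of that trade-off concrete, here is a minimal sketch of retrieval over packed binary codes using XOR plus a popcount lookup table. It is not my experiment code: the random vectors, the sign-threshold `binarize` step, and the function names are assumptions for illustration only, and a native binary model would emit its codes directly rather than thresholding floats.

```python
import numpy as np

# Lookup table: number of set bits for every possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def binarize(x: np.ndarray) -> np.ndarray:
    """Threshold at zero and pack 8 dimensions per byte (illustrative post-hoc step)."""
    return np.packbits(x > 0, axis=-1)


def hamming_topk(query_code: np.ndarray, doc_codes: np.ndarray, k: int):
    """Return indices and Hamming distances of the k closest codes."""
    xor = np.bitwise_xor(doc_codes, query_code)        # shape (n_docs, n_bytes)
    dist = POPCOUNT[xor].sum(axis=1, dtype=np.int64)   # differing bits per document
    order = np.argsort(dist, kind="stable")[:k]
    return order, dist[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = rng.standard_normal((1000, 2048))   # stand-in for real embeddings
    codes = binarize(docs)                     # 2048 bits -> 256 bytes per document
    idx, dist = hamming_topk(codes[42], codes, k=5)
    print(codes.shape, idx, dist)
```

Each 2048-bit code takes 256 bytes, and the distance computation is only integer XOR and table lookups, which is the kind of work a CPU handles well without any floating-point math.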
Your suggestions on the dimension sweep (512, 1024) and bit diagnostics are genuinely useful next steps. I’ll add them, and I’ll also add Q4 (4-bit) quantization of the float 384-dim model for comparison.
The equal-bit ablation is also interesting, but it’s a different research question from the one I was trying to answer here.
Appreciate the BPR/JPQ/RepCONC references, I wasn’t familiar with all of them.
I’ll keep you posted.