
Scaling Agentic Memory to 5 Billion Vectors via Binary Quantization and Dynamic Wavelet Matrices

Hugging Face Forums [Unofficial] April 4, 2026

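In a study, a new “dynamic wavelet matrix” was used as a vector database. A wavelet matrix represents a sequence of n symbols drawn from an alphabet of size σ using about log(σ) bits per symbol, and its access and rank queries take O(log σ) time regardless of n. I considered building a k-nearest-neighbor (KNN) model on top of such a structure with a very large memory, capable of holding, for example, 5 billion vectors.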
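To make the log(σ) behavior concrete, here is a minimal static wavelet matrix in Python. It stores a sequence of integer symbols as ⌈log2 σ⌉ levels of bits; `access` and `rank` walk one level per bit, so each query takes O(log σ) steps. This is a sketch under simplifying assumptions, not the structure from the cited study: it supports no insertions (the "dynamic" part), and it keeps plain prefix-sum lists instead of succinct rank directories, so it does not reach the compact bit-level space bound. The class and method names are illustrative.

```python
class WaveletMatrix:
    """Static wavelet matrix over integer symbols in [0, 2**bits).

    Sketch only: rank support uses plain prefix-sum lists instead of a
    succinct rank directory, and insertions are not supported.
    """

    def __init__(self, seq, bits):
        self.bits = bits      # number of levels = ceil(log2(sigma))
        self.ones = []        # per level: prefix counts of 1 bits
        self.zeros = []       # per level: total number of 0 bits
        cur = list(seq)
        for level in range(bits):
            shift = bits - 1 - level  # most significant bit first
            bv = [(x >> shift) & 1 for x in cur]
            prefix = [0]
            for b in bv:
                prefix.append(prefix[-1] + b)
            self.ones.append(prefix)
            self.zeros.append(len(cur) - prefix[-1])
            # Stable partition: symbols with bit 0 first, then bit 1.
            cur = ([x for x, b in zip(cur, bv) if b == 0]
                   + [x for x, b in zip(cur, bv) if b == 1])

    def _next(self, level, i, bit):
        """Map position i at this level to its position at the next level."""
        ones_before = self.ones[level][i]
        return self.zeros[level] + ones_before if bit else i - ones_before

    def access(self, i):
        """Return seq[i] by reading one bit per level."""
        x = 0
        for level in range(self.bits):
            bit = self.ones[level][i + 1] - self.ones[level][i]
            x = (x << 1) | bit
            i = self._next(level, i, bit)
        return x

    def rank(self, c, i):
        """Count occurrences of symbol c in seq[0:i]."""
        start, end = 0, i
        for level in range(self.bits):
            bit = (c >> (self.bits - 1 - level)) & 1
            start = self._next(level, start, bit)
            end = self._next(level, end, bit)
        return end - start


wm = WaveletMatrix([3, 1, 4, 1, 5, 2, 6, 5], bits=3)
print([wm.access(k) for k in range(8)])  # [3, 1, 4, 1, 5, 2, 6, 5]
print(wm.rank(1, 4), wm.rank(5, 8))      # 2 2
```

The stable partition at each level is what lets `rank` track a contiguous range through all levels; a production structure would replace the prefix lists with rank directories that add only o(n) bits per level.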

First, the words in the context window are encoded into an embedding with deberta-v3-small, a fast encoder whose disentangled attention represents each token's content and relative position separately. This encoder supplies the contextual representation for the model.

The embedding is then converted into a bit sequence by binary quantization: each dimension greater than 0 becomes 1, and every other dimension becomes 0.
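A minimal sketch of the quantization step and the storage arithmetic it implies. The helper names are hypothetical, and the 768-dimension figure assumes deberta-v3-small's hidden size with one vector per context; the byte counts cover raw codes only, before any wavelet-matrix compression and without payload or index overhead.

```python
def binary_quantize(embedding):
    """Binary quantization: 1 for each dimension > 0, else 0."""
    return [1 if x > 0 else 0 for x in embedding]


def pack_bits(bits):
    """Pack a 0/1 list into bytes, first dimension in the most significant bit."""
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def storage_bytes(n_vectors, dim, bytes_per_float=4):
    """Raw size of float32 embeddings vs. packed binary codes."""
    dense = n_vectors * dim * bytes_per_float
    binary = n_vectors * ((dim + 7) // 8)
    return dense, binary


bits = binary_quantize([0.12, -0.5, 0.0, 3.1, 0.7, -0.2, 0.0, 1.0])
print(bits)             # [1, 0, 0, 1, 1, 0, 0, 1]
print(pack_bits(bits))  # b'\x99'
print(storage_bytes(5_000_000_000, 768))  # (15360000000000, 480000000000)
```

At 5 billion vectors, that is roughly 15.36 TB of float32 embeddings versus 480 GB of packed 768-bit codes, a 32× reduction before any further compression. Note that an exact 0.0 maps to 0, matching the "otherwise 0" rule.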

The advantage is that bit sequences are compact and compressible, so they can be inserted into the dynamic wavelet matrix, whose per-symbol cost grows only with log(σ). Each stored element carries a response token as its payload.
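The element layout can be sketched with a plain Python list standing in for the dynamic wavelet matrix: each entry pairs a packed binary code with its response token, and lookup is a linear scan that uses the popcount of XOR as the Hamming distance. `BinaryMemory` is a hypothetical, illustrative name; this baseline is handy for checking results on small data, but it costs O(n) per query, which is exactly what an index structure is meant to avoid.

```python
class BinaryMemory:
    """Hypothetical append-only memory of (binary code, response token) pairs.

    A plain list stands in for the wavelet matrix; this shows the data layout
    only, not its space or query-time behavior.
    """

    def __init__(self, dim):
        self.dim = dim
        self.codes = []   # packed binary codes as Python ints
        self.tokens = []  # response token stored with each code

    def _pack(self, bits):
        if len(bits) != self.dim:
            raise ValueError("code length must equal dim")
        code = 0
        for b in bits:
            code = (code << 1) | b
        return code

    def add(self, bits, token):
        """Store one element: its binary code plus its response-token payload."""
        self.codes.append(self._pack(bits))
        self.tokens.append(token)

    def nearest(self, bits):
        """Linear scan: return (token, distance) of the closest stored code."""
        if not self.codes:
            raise ValueError("memory is empty")
        query = self._pack(bits)
        dists = [bin(c ^ query).count("1") for c in self.codes]
        best = min(range(len(dists)), key=dists.__getitem__)
        return self.tokens[best], dists[best]


mem = BinaryMemory(dim=4)
mem.add([1, 0, 0, 1], "cat")
mem.add([0, 1, 1, 0], "dog")
print(mem.nearest([1, 0, 1, 1]))  # ('cat', 1)
```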

During response generation, the context window is converted into an embedding, binarized the same way, and compared with the stored elements by Hamming distance, searching within a Hamming ball around the query code. The response token of the element with the smallest distance is appended to the context window, and the process repeats until the response is long enough.
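The retrieval-and-append loop can be sketched end to end with toy components. Everything here is an assumption for illustration: `encode` replaces deberta-v3-small with hashed ±1 token vectors summed over the last three tokens, codes are 64 bits rather than 768, and a Python dict replaces the wavelet matrix. `hamming_ball_lookup` probes every code at radius 0, then 1, up to `max_radius` around the query, so the first hit is a stored code at the smallest distance found (ties are broken by probe order).

```python
import hashlib
from itertools import combinations

DIM = 64  # hypothetical code length; a 768-dim encoder would give 768 bits


def token_vector(token):
    """Toy stand-in: deterministic pseudo-random +/-1 vector per token."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [1 if (digest[i // 8] >> (i % 8)) & 1 else -1 for i in range(DIM)]


def encode(context, window=3):
    """Toy encoder: sum token vectors over the last `window` tokens,
    then binary-quantize (1 if > 0 else 0) and pack into an int."""
    total = [0] * DIM
    for tok in context[-window:]:
        for i, v in enumerate(token_vector(tok)):
            total[i] += v
    code = 0
    for x in total:
        code = (code << 1) | (1 if x > 0 else 0)
    return code


def build_memory(tokens):
    """Store each context's code with the token that followed it."""
    memory = {}
    for i in range(1, len(tokens)):
        memory.setdefault(encode(tokens[:i]), tokens[i])
    return memory


def hamming_ball_lookup(memory, query, max_radius=2):
    """Probe all codes within Hamming distance max_radius of query,
    smallest radius first; return the first stored payload found."""
    for r in range(max_radius + 1):
        for positions in combinations(range(DIM), r):
            probe = query
            for p in positions:
                probe ^= 1 << p
            if probe in memory:
                return memory[probe]
    return None


def generate(memory, context, max_new_tokens=5):
    """Greedy loop: append retrieved tokens until the limit or a miss."""
    context = list(context)
    for _ in range(max_new_tokens):
        token = hamming_ball_lookup(memory, encode(context))
        if token is None:
            break
        context.append(token)
    return context


corpus = "the cat sat on the mat".split()
memory = build_memory(corpus)
print(" ".join(generate(memory, ["the", "cat"], max_new_tokens=4)))
```

The number of probes is the sum of C(d, r) over r ≤ `max_radius` (2,081 for d = 64 and radius 2), so brute-force ball enumeration only suits small radii and short codes; at 768 bits and billions of elements a proper index is needed.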

Referenced paper (arXiv.org): Hippocampus: An Efficient and Scalable Memory Module for Agentic AI

Agentic AI require persistent memory to store user-specific histories beyond the limited context window of LLMs. Existing memory systems use dense vector databases or knowledge-graph traversal (or hybrid), incurring high retrieval latency and poor...
