FAISS + LMDB RAG on a 50-year corpus works great — until you ask ‘what happened in 2020?’ (time-aware retrieval problem)

Hugging Face Forums [Unofficial] March 24, 2026

I’m working on a RAG system over a long-span archive (~50 years), and the current retrieval stack performs well for general semantic queries.

However, I’m struggling with time-constrained queries where users implicitly or explicitly expect results from a specific period.

Example query:

“What happened to XYZ political party during the 2020 election?”

The system retrieves semantically relevant content about the entity, but fails to prioritize results within the intended time window, even when increasing K.

System setup

Data

Corpus: ~1.6M documents → ~5M vectors after chunking
Language: Non-English

Embeddings

Model: LaBSE - 768-dim, L2-normalized (cosine / inner product)
Chunks: cleaned text segments (noise-reduced)

Index & storage

FAISS IVFPQ (primary ANN index)
Raw vectors stored in memmap (used for exact rerank on candidates)
Docstore: LMDB (ID → chunk + metadata)

Retrieval pipeline

Query → decomposition → main query

→ LaBSE embedding → normalized vector

→ FAISS IVFPQ → top-K candidate IDs

→ memmap → exact dot-product rerank → top-N

→ LMDB → fetch chunks + metadata

→ cross-encoder reranker → final scoring
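The exact rerank stage above can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the actual implementation: `exact_rerank`, the toy `raw_vectors` dict (standing in for the memmap), and the hard-coded `ann_candidates` list (standing in for FAISS IVFPQ output) are all hypothetical names. In the real pipeline this step would presumably be a single matrix product over memmap rows.

```python
from typing import Dict, List, Sequence, Tuple


def exact_rerank(
    query_vec: Sequence[float],
    candidate_ids: Sequence[int],
    raw_vectors: Dict[int, Sequence[float]],
    top_n: int,
) -> List[Tuple[int, float]]:
    """Re-score ANN candidates with exact dot products and keep the best top_n.

    With L2-normalized vectors the dot product equals cosine similarity.
    """
    scored = []
    for cid in candidate_ids:
        vec = raw_vectors[cid]
        scored.append((cid, sum(q * v for q, v in zip(query_vec, vec))))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data: three unit-length 2-d vectors instead of 768-dim LaBSE vectors.
raw_vectors = {10: [1.0, 0.0], 11: [0.6, 0.8], 12: [0.0, 1.0]}
query_vec = [0.8, 0.6]

# Assumed output order of the approximate (IVFPQ) stage.
ann_candidates = [12, 10, 11]

top = exact_rerank(query_vec, ann_candidates, raw_vectors, top_n=2)
print(top)
```

Note that nothing in this stage looks at dates: whatever the ANN stage returns is reordered purely by similarity.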

Performance

Recall@5 ≈ 80% (acceptable for general queries)

Query decomposition & temporal signals

Structured signals are extracted from queries, including timeline (explicit years, relative dates normalized to ranges) and main intent.

Each document chunk also contains date metadata, accessible at retrieval time.

However, retrieval currently uses a cleaned entity-focused query, where timeline cues are intentionally removed to improve semantic matching.

Even though both query-side time constraints and document-side timestamps are available, they are not incorporated during candidate generation, which remains purely semantic.
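For reference, the query-side split described here (a normalized time range plus a cleaned entity-focused query) might look something like the sketch below. It is only an illustration under stated assumptions: `decompose`, `YEAR_RE`, and the `RELATIVE` phrase table are hypothetical, and the patterns are English for readability even though the real corpus is non-English.

```python
import re
from datetime import date
from typing import Optional, Tuple

Window = Tuple[date, date]

# Explicit four-digit years from 1900 to 2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

# Hypothetical relative phrases mapped to a year offset from "today".
RELATIVE = {"last year": -1, "this year": 0}


def decompose(query: str, today: date) -> Tuple[str, Optional[Window]]:
    """Return (cleaned query, time window), with timeline cues stripped out."""
    years = [int(y) for y in YEAR_RE.findall(query)]
    cleaned = YEAR_RE.sub(" ", query)

    for phrase, offset in RELATIVE.items():
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        if pattern.search(cleaned):
            years.append(today.year + offset)
            cleaned = pattern.sub(" ", cleaned)

    # Collapse the whitespace left behind by removed cues.
    cleaned = " ".join(cleaned.split())

    if not years:
        return cleaned, None
    # Normalize to a closed range covering every mentioned year.
    return cleaned, (date(min(years), 1, 1), date(max(years), 12, 31))


cleaned, window = decompose(
    "What happened to XYZ political party during the 2020 election?",
    today=date(2026, 3, 24),
)
print(cleaned, window)
```

The `window` value is exactly the signal that is currently dropped before candidate generation; only `cleaned` reaches the embedding step.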

What I tried / considered

1. Increasing K and post-retrieval filtering based on chunk date metadata

Issues:
- Results are still diluted across decades
- Not reliable for narrow time windows
- Risk of losing truly relevant docs
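To make the dilution concrete, here is a minimal sketch of the post-retrieval filter. The `Hit` record, `post_filter`, and the toy candidates are hypothetical, and the "survival rate" is just a diagnostic added for illustration, not something the current system reports.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    chunk_id: int
    score: float      # similarity from the exact rerank stage
    doc_date: date    # date metadata fetched from the docstore


def post_filter(
    hits: List[Hit], window: Tuple[date, date], top_n: int
) -> Tuple[List[Hit], float]:
    """Keep hits inside the window; also report the fraction that survived."""
    start, end = window
    in_window = [h for h in hits if start <= h.doc_date <= end]
    survival = (len(in_window) / len(hits)) if hits else 0.0
    return in_window[:top_n], survival


# Toy top-K, already sorted by score, spread across several decades.
hits = [
    Hit(1, 0.91, date(1998, 5, 2)),
    Hit(2, 0.89, date(2020, 11, 4)),
    Hit(3, 0.88, date(2009, 3, 17)),
    Hit(4, 0.86, date(2015, 8, 30)),
    Hit(5, 0.85, date(2020, 2, 9)),
    Hit(6, 0.83, date(1987, 1, 12)),
]

kept, survival = post_filter(
    hits, (date(2020, 1, 1), date(2020, 12, 31)), top_n=5
)
print([h.chunk_id for h in kept], survival)
```

In this toy case only two of six candidates fall in the window, so a request for five results cannot be satisfied. Raising K helps only if in-window chunks happen to rank within the larger candidate set.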

2. Time-based sharding (design idea)

Split vector store into year-wise (or period-wise) shards
Route query to relevant shard(s)
Maintain one global store for generic queries

Issues:
- Requires deeper changes in retrieval + reranking flow
- Operational overhead (multiple indices)
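The routing part of this idea could be sketched as below. Everything here is an assumption for illustration: the corpus bounds (`FIRST_YEAR`, `LAST_YEAR`), the `year_YYYY` shard naming, and the `max_shards` cutoff above which the query falls back to the global store.

```python
from datetime import date
from typing import List, Optional, Tuple

Window = Tuple[date, date]

# Hypothetical corpus bounds; the post only says the archive spans ~50 years.
FIRST_YEAR = 1976
LAST_YEAR = 2025
GLOBAL_SHARD = "global"


def route_shards(window: Optional[Window], max_shards: int = 3) -> List[str]:
    """Pick year-wise shards for a time window, else fall back to the global store."""
    if window is None:
        # Generic query with no temporal constraint.
        return [GLOBAL_SHARD]

    start, end = window
    lo = max(start.year, FIRST_YEAR)
    hi = min(end.year, LAST_YEAR)
    n_years = hi - lo + 1

    # Window outside the corpus, or too wide to be worth fanning out.
    if n_years <= 0 or n_years > max_shards:
        return [GLOBAL_SHARD]

    return [f"year_{y}" for y in range(lo, hi + 1)]


print(route_shards((date(2020, 1, 1), date(2020, 12, 31))))
print(route_shards((date(2019, 6, 1), date(2020, 3, 1))))
print(route_shards(None))
```

One point that may soften the reranking concern: since candidates are re-scored with exact dot products from the memmap, results from different shards should be comparable at that stage even if each shard's IVFPQ index is trained separately.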

Question

How do production systems typically handle “entity + time window” queries at scale, without sacrificing recall or blowing up latency?

Is pre-filtering (via sharding or partitioned indices) generally preferred over post-filtering for time-constrained queries?
