FAISS + LMDB RAG on a 50-year corpus works great — until you ask ‘what happened in 2020?’ (time-aware retrieval problem)
I’m working on a RAG system over a long-span archive (~50 years), and the current retrieval stack performs well for general semantic queries.
However, I’m struggling with time-constrained queries where users implicitly or explicitly expect results from a specific period.
Example query:
“What happened to XYZ political party during the 2020 election?”
The system retrieves semantically relevant content about the entity, but fails to prioritize results within the intended time window, even when increasing K.
System setup
Data
Corpus: ~1.6M documents → ~5M vectors after chunking
Language: Non-English
Embeddings
Model: LaBSE (768-dim, L2-normalized, so inner product equals cosine similarity)
Chunks: cleaned text segments (noise-reduced)
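Because the vectors are L2-normalized, the inner product FAISS computes is the same number as cosine similarity. The check below uses toy 4-dim vectors as stand-ins for 768-dim LaBSE embeddings. It is only an illustration; the helper names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity computed on the raw (unnormalized) vectors."""
    return inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))

# Toy vectors standing in for LaBSE embeddings.
a = [0.3, -1.2, 0.5, 2.0]
b = [1.0, 0.4, -0.7, 0.9]
ua, ub = l2_normalize(a), l2_normalize(b)
print(inner_product(ua, ub), cosine(a, b))  # same value, up to float rounding
```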
Index & storage
FAISS IVFPQ (primary ANN index)
Raw vectors stored in memmap (used for exact rerank on candidates)
Docstore: LMDB (ID → chunk + metadata)
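A quick size estimate shows why IVFPQ is used for the first pass and the memmap only for reranking. It assumes the memmap stores float32 vectors. The PQ code size of 64 bytes per vector (64 sub-quantizers × 8 bits) is a hypothetical setting, not one stated in the post.

```python
NUM_VECTORS = 5_000_000   # ~5M chunk vectors (from the post)
DIM = 768                 # LaBSE embedding size
BYTES_PER_FLOAT32 = 4     # assumption: memmap dtype is float32

raw_bytes = NUM_VECTORS * DIM * BYTES_PER_FLOAT32
print(f"raw float32 memmap: {raw_bytes / 1e9:.2f} GB")   # 15.36 GB

PQ_CODE_BYTES = 64        # hypothetical: 64 sub-quantizers x 8 bits each
pq_bytes = NUM_VECTORS * PQ_CODE_BYTES
print(f"IVFPQ codes: {pq_bytes / 1e6:.0f} MB")           # 320 MB
```

The inverted lists also hold a per-vector ID, so the real index is somewhat larger than the code total.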
Retrieval pipeline
Query → decomposition → main query
→ LaBSE embedding → normalized vector
→ FAISS IVFPQ → top-K candidate IDs
→ memmap → exact dot-product rerank → top-N
→ LMDB → fetch chunks + metadata
→ cross-encoder reranker → final scoring
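The "memmap → exact dot-product rerank" step can be sketched as follows. In this sketch a plain dict (chunk ID → full-precision vector) stands in for the memmap, and `exact_rerank` is a hypothetical helper name. With NumPy the scoring line would typically be `vecs[ids] @ q` on the memmap.

```python
import heapq

def exact_rerank(query, candidate_ids, vectors, top_n):
    """Re-score approximate (PQ) candidates with the exact inner product; keep top_n IDs.

    `vectors` is any indexable id -> vector store (a memmap in the real pipeline).
    """
    scored = []
    for cid in candidate_ids:
        exact = sum(q * x for q, x in zip(query, vectors[cid]))
        scored.append((exact, cid))
    return [cid for _, cid in heapq.nlargest(top_n, scored)]

vectors = {0: [1.0, 0.0], 1: [0.6, 0.8], 2: [0.0, 1.0], 3: [0.8, 0.6]}
print(exact_rerank([1.0, 0.0], [1, 2, 3], vectors, top_n=2))  # [3, 1]
```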
Performance
- Recall@5 ≈ 80% (acceptable for general queries)
Query decomposition & temporal signals
Structured signals are extracted from queries, including timeline (explicit years, relative dates normalized to ranges) and main intent.
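A minimal sketch of the timeline normalization step turns explicit years and relative phrases into an inclusive date range. Several choices here are assumptions for illustration only: the English phrases (the real corpus is non-English), the "last N years" convention of starting at 1 January of year today.year − N, and the function name.

```python
import re
from datetime import date

YEAR_RE = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")

def extract_time_range(query, today):
    """Return an inclusive (start, end) date window for the query, or None if no time cue."""
    years = [int(y) for y in YEAR_RE.findall(query)]
    if years:  # explicit years: span from the earliest to the latest mentioned
        return date(min(years), 1, 1), date(max(years), 12, 31)
    q = query.lower()
    if "last year" in q:
        y = today.year - 1
        return date(y, 1, 1), date(y, 12, 31)
    m = re.search(r"last (\d+) years", q)
    if m:  # convention (assumed): from Jan 1 of today.year - N up to today
        return date(today.year - int(m.group(1)), 1, 1), today
    return None

print(extract_time_range("What happened to XYZ political party during the 2020 election?",
                         date(2024, 6, 1)))
```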
Each document chunk also carries date metadata, which is accessible at retrieval time.
However, retrieval currently uses a cleaned, entity-focused query from which timeline cues are intentionally removed to improve semantic matching.
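Because the embedded query has its time cues stripped, one option is to reapply the extracted window as a soft score term at rerank time instead of a hard filter. The sketch below is one such scheme. The weight, the 365-day decay scale, and the additive blend are hypothetical choices, and cross-encoder scores may need calibration before being mixed like this.

```python
import math
from datetime import date

def temporal_prior(doc_date, start, end, scale_days=365.0):
    """1.0 inside [start, end]; decays exponentially with the gap in days outside it."""
    if start <= doc_date <= end:
        return 1.0
    gap = (start - doc_date).days if doc_date < start else (doc_date - end).days
    return math.exp(-gap / scale_days)

def time_aware_score(semantic, doc_date, window, weight=0.3):
    """Blend a semantic/rerank score with the temporal prior; window=None means no time cue."""
    if window is None:
        return semantic
    return semantic + weight * temporal_prior(doc_date, *window)

w = (date(2020, 1, 1), date(2020, 12, 31))
old = time_aware_score(0.82, date(2008, 3, 1), w)    # strong match, wrong decade
new = time_aware_score(0.74, date(2020, 11, 4), w)   # weaker match, inside 2020
print(new > old)
```

A soft prior keeps out-of-window documents reachable when the window guess is off, which addresses the "risk of losing truly relevant docs" concern at the cost of a tunable weight.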
So even though both query-side time constraints and document-side timestamps are available, neither is used during candidate generation, which remains purely semantic.
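One way to bring the timestamps into candidate generation is a hard pre-filter. The sketch assumes a hypothetical re-numbering in which chunk IDs are assigned in date order, so a time window becomes a contiguous ID range that can be found with `bisect`. The search itself is a brute-force stand-in for the ANN index. Recent FAISS releases accept a per-query selector (for example `faiss.IDSelectorRange(lo, hi)` passed through `faiss.SearchParametersIVF(sel=...)`), which could apply such a range inside the IVFPQ search. Check that your installed version supports search parameters before relying on it.

```python
import bisect
from datetime import date

def id_range_for_window(sorted_dates, start, end):
    """Chunk dates sorted ascending, chunk ID == position (assumed re-numbering).
    Returns the half-open ID range [lo, hi) of chunks dated within [start, end]."""
    return bisect.bisect_left(sorted_dates, start), bisect.bisect_right(sorted_dates, end)

def prefiltered_search(query, vectors, id_range, k):
    """Exact inner-product search over IDs in id_range only (stand-in for a filtered ANN search)."""
    lo, hi = id_range
    scored = [(sum(q * x for q, x in zip(query, vectors[i])), i) for i in range(lo, hi)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

dates = [date(2008, 3, 1), date(2016, 5, 2), date(2020, 2, 10), date(2020, 11, 4), date(2022, 1, 9)]
vecs = [[1.0, 0.0], [0.9, 0.1], [0.6, 0.8], [0.8, 0.6], [1.0, 0.0]]
rng = id_range_for_window(dates, date(2020, 1, 1), date(2020, 12, 31))
print(rng, prefiltered_search([1.0, 0.0], vecs, rng, k=2))  # (2, 4) [3, 2]
```

Without the filter, the top two would be IDs 0 and 4 (2008 and 2022), which is the failure mode described above. One caveat for IVF indexes: filtering happens only within the probed lists, so a narrow window may return fewer than k hits unless `nprobe` is raised.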
What I tried / considered
1. Increasing K and post-retrieval filtering based on date metadata
Issues:
Results are still diluted across decades
Not reliable for narrow time windows
Risk of losing truly relevant docs
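The "increase K, then post-filter" approach can be written as a loop that doubles K until enough in-window hits survive. The sketch uses a synthetic ranking in which 1 in 50 results falls in the window, roughly one year out of a 50-year archive. Treat it as an illustration of the cost, not a measurement. `search(k)` and `in_window(id)` are hypothetical callables standing in for FAISS and the LMDB metadata lookup.

```python
def post_filter_adaptive(search, in_window, want, k0=100, k_max=6400):
    """Double K until `want` in-window hits survive the filter or K reaches k_max.

    search(k) -> ranked list of IDs; in_window(id) -> bool. Returns (hits, k_used).
    """
    k = k0
    while True:
        hits = [i for i in search(k) if in_window(i)]
        if len(hits) >= want or k >= k_max:
            return hits[:want], k
        k *= 2

ranked = list(range(10_000))               # synthetic ranking
hits, k_used = post_filter_adaptive(
    search=lambda k: ranked[:k],
    in_window=lambda i: i % 50 == 0,       # 1 in 50 results is in the window
    want=10,
)
print(k_used, hits)  # 800 [0, 50, 100, ..., 450]
```

Getting 10 in-window results here takes K = 800 and four searches. For a narrower window or a sparser entity, the required K grows proportionally, which is the dilution problem listed above.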
2. Time-based sharding (design idea)
Split vector store into year-wise (or period-wise) shards
Route query to relevant shard(s)
Maintain one global store for generic queries
Issues:
Requires deeper changes in retrieval + reranking flow
Operational overhead (multiple indices)
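The routing part of the sharding idea can be small. The sketch maps the extracted window to period-shard keys and falls back to the global store when the query has no time cue. The shard naming, the 10-year period length, and the 1975 to 2024 span are hypothetical placeholders for the ~50-year archive.

```python
from datetime import date

def shards_for_window(window, shard_years=10, first_year=1975, last_year=2024):
    """Map an inclusive (start, end) date window to period-shard keys.

    None (no time cue) routes to the global index; a window entirely outside
    the corpus span returns [] so the caller can decide how to fall back.
    """
    if window is None:
        return ["global"]
    start, end = window
    lo, hi = max(start.year, first_year), min(end.year, last_year)
    if lo > hi:
        return []
    first = (lo - first_year) // shard_years
    last = (hi - first_year) // shard_years
    return [f"period_{first_year + i * shard_years}" for i in range(first, last + 1)]

print(shards_for_window((date(2020, 1, 1), date(2020, 12, 31))))  # ['period_2015']
print(shards_for_window((date(2012, 1, 1), date(2021, 12, 31))))  # ['period_2005', 'period_2015']
```

A window that straddles a boundary queries every shard it touches. Because all shards share the same embedding model and inner-product metric, their hits can be merged by raw score before the existing rerank stages.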
Question
How do production systems typically handle:
“entity + time window” queries at scale
without sacrificing recall or blowing up latency?
Is pre-filtering (via sharding or partitioned indices) generally preferred over post-filtering for time-constrained queries?