MemEval & PropMem: Standardizing Agent Memory Benchmarks

Hugging Face Forums [Unofficial] March 11, 2026
Reproducing results across AI agent memory systems is hard: different LLMs, embeddings, token budgets, and scoring methods make comparisons almost meaningless.

We built MemEval, an open-source benchmark that evaluates memory systems under standardized conditions and tracks token efficiency. While benchmarking, we discovered recurring failure modes, which led to PropMem, a factual memory system designed to address them efficiently.

Both projects are open source and ready for evaluation, extension, or collaboration. Try them out:

* Blog: https://medium.com/prosus-ai-tech-blog/memeval-benchmarking-memory-for-ai-agents-932d3fd9f3b4
* Code: ProsusAI/MemEval on GitHub (benchmark suite for evaluating agent and LLM memory systems)

We would love to hear how the community benchmarks or improves agent memory systems!
