{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreif4qpfyitxyzxu4oz43ky4zzg7u7ghjkrhpckg2yuc5cqs7lppe44",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mgt5nsg53ii2"
},
"path": "/t/memeval-propmem-standardizing-agent-memory-benchmarks/174203#post_1",
"publishedAt": "2026-03-11T21:35:05.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"https://medium.com/prosus-ai-tech-blog/memeval-benchmarking-memory-for-ai-agents-932d3fd9f3b4",
"GitHub - ProsusAI/MemEval: Benchmark suite for evaluating agent and LLM memory systems · GitHub"
],
"textContent": "Reproducing results across AI agent memory systems is hard, different LLMs, embeddings, token budgets, and scoring methods make comparisons almost meaningless.\n\nWe built MemEval, an open-source benchmark that evaluates memory systems under standardized conditions and tracks token efficiency. While benchmarking, we discovered recurring failure modes, which led to PropMem, a factual memory system designed to address them efficiently.\n\nBoth projects are Open Source: ready for evaluation, extension, or collaboration.\n\nTry it out:\n\n * Blog: https://medium.com/prosus-ai-tech-blog/memeval-benchmarking-memory-for-ai-agents-932d3fd9f3b4\n\n * Code: GitHub - ProsusAI/MemEval: Benchmark suite for evaluating agent and LLM memory systems · GitHub\n\n\n\n\nWe would love to hear how the community benchmarks or improves agent memory systems!",
"title": "MemEval & PropMem: Standardizing Agent Memory Benchmarks"
}