Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreif4qpfyitxyzxu4oz43ky4zzg7u7ghjkrhpckg2yuc5cqs7lppe44",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mgswwswagnv2"
  },
  "path": "/t/memeval-propmem-standardizing-agent-memory-benchmarks/174203#post_1",
  "publishedAt": "2026-03-11T21:35:05.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "https://medium.com/prosus-ai-tech-blog/memeval-benchmarking-memory-for-ai-agents-932d3fd9f3b4",
    "GitHub - ProsusAI/MemEval: Benchmark suite for evaluating agent and LLM memory systems · GitHub"
  ],
  "textContent": "Reproducing results across AI agent memory systems is hard: differences in LLMs, embeddings, token budgets, and scoring methods make comparisons almost meaningless.\n\nWe built MemEval, an open-source benchmark that evaluates memory systems under standardized conditions and tracks token efficiency. While benchmarking, we discovered recurring failure modes, which led to PropMem, a factual memory system designed to address them efficiently.\n\nBoth projects are open source and ready for evaluation, extension, or collaboration.\n\nTry it out:\n\n  * Blog: https://medium.com/prosus-ai-tech-blog/memeval-benchmarking-memory-for-ai-agents-932d3fd9f3b4\n\n  * Code: GitHub - ProsusAI/MemEval: Benchmark suite for evaluating agent and LLM memory systems\n\nWe would love to hear how the community benchmarks or improves agent memory systems!",
  "title": "MemEval & PropMem: Standardizing Agent Memory Benchmarks"
}