Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihhj3x4ve2h774pfx2pxbxfbicce5ypyvqntcvl2bnnxqfo5k6xs4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkwez6h2mjg2"
  },
  "path": "/t/how-do-you-design-memory-systems-for-long-running-ai-agents/175584#post_4",
  "publishedAt": "2026-05-03T03:35:03.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "Hi Michael, happy to discuss it here.\n\nThe main design rule I use is that the model is not the memory system. The runtime is.\n\nFor question 1, I would split the decision into layers.\n\nThe application should always make the final decision about what is allowed to become durable memory. The LLM can help classify or propose memory candidates, but I would not let it directly write permanent state without rules around it.\n\nA practical flow is:\n\n  1. Capture events from the run.\n\n  2. Extract candidate memories or state changes.\n\n  3. Classify them by type.\n\n  4. Apply policy rules.\n\n  5. Store only what is useful, verified, or needed later.\n\n  6. Keep uncertain items marked as uncertain rather than treating them as facts.\n\n\n\n\nFor question 2, persistence does not need to start fancy.\n\nA relational database is enough for many systems. SQLite is fine for local prototypes. Postgres is a good default once the system becomes serious. You can add vector search later for retrieval, but I would not make vector storage the whole memory system.\n\nI usually think of storage as several categories:\n\n  1. Event log.\n\n  2. Current task state.\n\n  3. Durable project state.\n\n  4. User or operator preferences.\n\n  5. Artifacts and files.\n\n  6. Searchable summaries.\n\n  7. Embeddings for retrieval when useful.\n\n\n\n\nSerialization can be simple JSON at first, but the important thing is to use typed records. Each record should say what it is, where it came from, when it was written, what confidence it has, and whether it is still active.\n\nFor question 3, the agent should not retrieve everything. It should retrieve based on the current objective.\n\nUseful criteria include:\n\n  1. Is this needed for the current task?\n\n  2. Was it created by this project, user, or run?\n\n  3. Is it recent enough to matter?\n\n  4. Is it still marked active?\n\n  5. Is it verified or only a guess?\n\n  6. 
Does it conflict with newer information?\n\n  7. Is it an instruction, a preference, state, history, or evidence?\n\nThe pattern that worked best for me is:\n\nPersist broadly, retrieve narrowly.\n\nStore enough that the system can recover, audit, and continue later. But before each model call, build a small active context from only the pieces needed for the next step.\n\nThat is where long-running agents become much more manageable. You stop treating the prompt as the memory container and start treating it as a temporary working view over external state.",
  "title": "How do you design memory systems for long-running AI agents?"
}