How do you design memory systems for long-running AI agents?

Hugging Face Forums [Unofficial] May 3, 2026

Hi Michael, happy to discuss it here. The main design rule I use is that the model is not the memory system. The runtime is.

For question 1, I would split the decision into layers. The application should always make the final decision about what is allowed to become durable memory. The LLM can help classify or propose memory candidates, but I would not let it directly write permanent state without rules around it. A practical flow is:

1. Capture events from the run.
2. Extract candidate memories or state changes.
3. Classify them by type.
4. Apply policy rules.
5. Store only what is useful, verified, or needed later.
6. Keep uncertain items marked as uncertain rather than treating them as facts.

For question 2, persistence does not need to start fancy. A relational database is enough for many systems. SQLite is fine for local prototypes. Postgres is a good default once the system becomes serious. You can add vector search later for retrieval, but I would not make vector storage the whole memory system. I usually think of storage as several categories:

1. Event log.
2. Current task state.
3. Durable project state.
4. User or operator preferences.
5. Artifacts and files.
6. Searchable summaries.
7. Embeddings for retrieval when useful.

Serialization can be simple JSON at first, but the important thing is to use typed records. Each record should say what it is, where it came from, when it was written, what confidence it has, and whether it is still active.

For question 3, the agent should not retrieve everything. It should retrieve based on the current objective. Useful criteria include:

1. Is this needed for the current task?
2. Was it created by this project, user, or run?
3. Is it recent enough to matter?
4. Is it still marked active?
5. Is it verified or only a guess?
6. Does it conflict with newer information?
7. Is it instruction, preference, state, history, or evidence?

The pattern that worked best for me is: Persist broadly, retrieve narrowly.
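As a rough illustration of the write side from questions 1 and 2, here is a minimal Python sketch using SQLite through the standard-library `sqlite3` module. The record kinds, the field names, the rules inside `apply_policy`, and the 0.3 confidence threshold are all hypothetical choices for the example. The only point it tries to show is that the model proposes typed candidate records, and plain application code decides what becomes durable, with unverified items kept but flagged.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Iterable, Optional


class Kind(str, Enum):
    # Hypothetical record types, loosely mirroring the storage categories.
    EVENT = "event"
    TASK_STATE = "task_state"
    PROJECT_STATE = "project_state"
    PREFERENCE = "preference"
    SUMMARY = "summary"


@dataclass
class MemoryRecord:
    kind: Kind               # what it is
    content: str
    source: str              # where it came from, e.g. "run:7" or "user"
    confidence: float        # 0.0-1.0, as estimated by the extractor
    verified: bool = False   # unverified records are kept, but flagged as such
    active: bool = True      # whether it is still active
    created_at: str = field(  # when it was written (UTC, ISO 8601)
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def apply_policy(candidate: MemoryRecord) -> Optional[MemoryRecord]:
    """Application-side gate: the LLM only proposes, this code decides."""
    if not candidate.content.strip():
        return None  # empty extraction, nothing to keep
    if candidate.kind is Kind.PREFERENCE and candidate.source != "user":
        return None  # hypothetical rule: only the user may set preferences
    if candidate.confidence < 0.3:
        return None  # too speculative to keep even as "uncertain"
    return candidate  # kept as-is; verified=False still marks it unconfirmed


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " kind TEXT NOT NULL,"
        " body TEXT NOT NULL)"  # the full typed record, serialized as JSON
    )
    return conn


def ingest(conn: sqlite3.Connection, candidates: Iterable[MemoryRecord]) -> int:
    """Run extracted candidates through the policy and persist the survivors."""
    stored = 0
    for candidate in candidates:
        record = apply_policy(candidate)
        if record is None:
            continue
        body = {**asdict(record), "kind": record.kind.value}
        conn.execute(
            "INSERT INTO memory (kind, body) VALUES (?, ?)",
            (record.kind.value, json.dumps(body)),
        )
        stored += 1
    conn.commit()
    return stored
```

Storing the whole record as JSON keeps serialization simple at first, while the separate `kind` column leaves room for indexes or a later move to Postgres.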
Store enough that the system can recover, audit, and continue later. But before each model call, build a small active context from only the pieces needed for the next step. That is where long-running agents become much more manageable. You stop treating the prompt as the memory container, and start treating the prompt as a temporary working view over external state.
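The read side can be sketched the same way. The Python below is a minimal illustration, not a reference design: it applies several of the question 3 criteria (project scope, active flag, newest-wins conflict resolution, recency, relevance to the objective, record kind, verified before guesses) and then renders the survivors into a small text block for the next model call. The `Record` fields, the tag-based relevance test, the kind ordering in `KIND_PRIORITY`, the 30-day window, and the limit of five records are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable

# Hypothetical ordering for "instruction, preference, state, history, or
# evidence": lower numbers are placed earlier in the active context.
KIND_PRIORITY = {"instruction": 0, "preference": 1, "state": 2, "evidence": 3, "history": 4}


@dataclass
class Record:
    key: str                 # records sharing a key describe the same fact
    kind: str                # one of the KIND_PRIORITY keys
    content: str
    project: str             # which project, user, or run created it
    created_at: datetime     # timezone-aware UTC timestamp
    active: bool = True
    verified: bool = False
    tags: frozenset[str] = frozenset()  # hypothetical relevance labels


def select_context(
    records: Iterable[Record],
    project: str,
    objective_tags: set[str],
    now: datetime,
    max_age: timedelta = timedelta(days=30),
    limit: int = 5,
) -> list[Record]:
    # Scope and status: only active records from this project.
    in_scope = [r for r in records if r.project == project and r.active]

    # Conflicts: when several records share a key, the newest one wins.
    newest: dict[str, Record] = {}
    for r in sorted(in_scope, key=lambda r: r.created_at):
        newest[r.key] = r

    # Recency and relevance to the current objective.
    relevant = [
        r for r in newest.values()
        if now - r.created_at <= max_age and r.tags & objective_tags
    ]

    # Ranking: by kind, then verified before guesses, then newest first.
    relevant.sort(
        key=lambda r: (KIND_PRIORITY.get(r.kind, 99), not r.verified, -r.created_at.timestamp())
    )
    return relevant[:limit]


def build_active_context(objective: str, records: list[Record]) -> str:
    """Render the selected records as a small, temporary working view."""
    lines = [f"Objective: {objective}"]
    for r in records:
        label = "verified" if r.verified else "unverified"
        lines.append(f"- [{r.kind}, {label}] {r.content}")
    return "\n".join(lines)
```

Only the string returned by `build_active_context` goes into the prompt; the full record set stays in external storage, so the prompt is rebuilt from scratch for each step rather than accumulating history.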