Simplifying Thomas’ Memory: Working vs Archival

Brady Hawkins October 14, 2025

I’ve been hard at work getting this project up and running. In the end, I decided to start with a much simpler memory architecture: working memory and archival memory.

The Context Window Problem

Conversations (which I’ll now call sessions) can get long. Right now, there’s a loop that feeds data into the model and outputs a response. Working memory stores all prior replies and session details. If a session runs long, the prompts can balloon in size.
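Here’s a minimal sketch of that loop, to show why the prompt grows. The names (`WorkingMemory`, `build_prompt`, `run_turn`) and the `generate` callable are made up for illustration; they stand in for whatever actually calls the model and are not Thomas’ real code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class WorkingMemory:
    """Holds everything said so far in one session (hypothetical structure)."""
    session_details: str = ""
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (speaker, text)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))

    def build_prompt(self) -> str:
        # Every prior turn is replayed on each call, so the prompt
        # grows with the length of the session.
        lines = [self.session_details] if self.session_details else []
        lines += [f"{speaker}: {text}" for speaker, text in self.turns]
        return "\n".join(lines)


def run_turn(memory: WorkingMemory, user_input: str,
             generate: Callable[[str], str]) -> str:
    """One pass of the loop: record input, prompt the model, record the reply."""
    memory.add("user", user_input)
    reply = generate(memory.build_prompt())
    memory.add("assistant", reply)
    return reply
```

Passing the model in as a plain callable keeps the sketch independent of any particular runtime; a stub like `lambda prompt: "ok"` is enough to exercise it.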

I’m currently testing with Gemma2:2B, which has an 8192-token context window. At roughly four characters per token, that’s around 30,000 characters. Long term, I’ll need either a model with a larger context window or an algorithm to sort and prioritize what’s truly relevant.
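As a rough illustration of that budget, the sketch below estimates tokens from character counts and drops the oldest turns once the estimate goes over. The four-characters-per-token ratio is an assumption (8192 × 4 = 32,768 characters, in line with the figure above); a real tokenizer will give different counts. Dropping the oldest turns first is just the simplest possible policy, not the prioritization algorithm I have in mind for later.

```python
from typing import List

CONTEXT_TOKENS = 8192   # Gemma2:2B context window, from the text above
CHARS_PER_TOKEN = 4     # assumption: rough average for English text


def estimate_tokens(text: str) -> int:
    """Crude character-based estimate; not a real tokenizer."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def trim_to_budget(turns: List[str], budget_tokens: int = CONTEXT_TOKENS) -> List[str]:
    """Keep the most recent turns whose combined estimate fits the budget."""
    kept: List[str] = []
    used = 0
    for turn in reversed(turns):          # walk from newest to oldest
        cost = estimate_tokens(turn)
        if used + cost > budget_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()                        # restore chronological order
    return kept
```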

That’s a later problem. First, I just want to get Thomas running reliably on the network. Memory optimizations can come after.

Archival Memory in Postgres

The next feature is archival memory. Each session will be saved in my Postgres database.

Postgres makes this pretty simple:

That’s ludicrous. Did someone say 1 billion?

So storage isn’t the problem. I’ll save each session in a single row. The context window will be the real limiter, not Postgres.
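One way the single-row idea could look, as a sketch only. The `sessions` table, its columns, and the `session_to_params` helper are all hypothetical, and storing the turns as one JSONB value is an assumption rather than a settled schema.

```python
import json
from datetime import datetime
from typing import List, Tuple

# Hypothetical schema: one row per session, all turns in a single JSONB column.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS sessions (
    id         BIGSERIAL PRIMARY KEY,
    started_at TIMESTAMPTZ NOT NULL,
    turns      JSONB NOT NULL
)
"""

# %s placeholders follow the psycopg parameter style.
INSERT_SQL = "INSERT INTO sessions (started_at, turns) VALUES (%s, %s::jsonb)"


def session_to_params(started_at: datetime,
                      turns: List[Tuple[str, str]]) -> Tuple[datetime, str]:
    """Serialize a whole session into the parameters for a single INSERT."""
    payload = [{"speaker": speaker, "text": text} for speaker, text in turns]
    return started_at, json.dumps(payload)
```

With psycopg 3 installed, something like `conn.execute(INSERT_SQL, session_to_params(started_at, turns))` on an open connection should write the row. Nothing in the sketch itself touches the database, so it can be tried without a running Postgres.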

What to Retrieve?

The real challenge is retrieval. What’s relevant to the current session?

For now, I think the simplest path is to pull in personas and prior sessions when needed. That should be enough for an MVP. Down the road, I’ll work on a smarter retrieval algorithm, one that balances relevance with efficiency.
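A minimal sketch of what that MVP retrieval might look like: start with the persona, add as many prior sessions as fit, then the current session. The `build_context` function and its parameters are hypothetical, and "most recent first" is a stand-in heuristic I’m assuming here, not a real relevance measure.

```python
from typing import List

SEPARATOR = "\n\n"


def build_context(persona: str, prior_sessions: List[str], current: str,
                  max_chars: int = 30_000) -> str:
    """Join persona, recent prior sessions that fit, and the current session.

    prior_sessions is assumed to be ordered oldest to newest.
    """
    # Reserve room for the persona, the current session, and one separator.
    remaining = max_chars - len(persona) - len(current) - len(SEPARATOR)
    picked: List[str] = []
    for session in reversed(prior_sessions):     # newest first
        cost = len(session) + len(SEPARATOR)
        if cost > remaining:
            break
        picked.append(session)
        remaining -= cost
    picked.reverse()                              # back to chronological order
    return SEPARATOR.join([persona, *picked, current])
```

The character budget is counted against the persona and current session first, so prior sessions only fill whatever space is left over.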

That’s where I’ll focus later this week.
