{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreih5t7fxvoxc65tsthsyvfltehkfgom6j36ghaz3frv5gjidrog5aq",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mgswxeh65xe2"
},
"path": "/t/a-simple-idea-separating-a-thinker-and-observer-model-to-detect-reasoning-loops/174134#post_7",
"publishedAt": "2026-03-11T19:11:22.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"https://paulolden1-432-a-journey-experience.hf.space/"
],
"textContent": "Hi, I think I’m on topic if I explain my little experiment running on a Hugging Face space. I used a cascade of three small models to bring the characters from my novel (also published here as a public dataset) to life. Essentially, the system I created simulates three characters from the novel with whom users can chat. In my system, the model inhabits a reality limited to the text provided, but generates increasingly better responses through continuous self-observation.\nThe code is very simple: in addition to the main dataset, there’s an additional dataset that stores user questions and, more importantly, the AI system’s continuous “reflections,” based on rereading the database and reprocessing user questions.\nThis data is generated during idle time: every 10 minutes, if there are fewer than 5 users connected, the code instructs the model to reread, reflect, reprocess the data, and perform self-prompting to refine subsequent responses.\nThis mechanism is similar to what we humans do: when we give an answer, we then reflect on it, sleep on it (dreaming), reconsider it… The next time we’re asked the exact same question, we’ll have greater awareness and respond better. During our quiet, sleepy time, we also rework the context of the reality we live in, correlating it with the questions and answers we encounter in our lives. We grow and improve also, and above all, by reflecting, dreaming, and reworking data. In my little experiment, I tried to simulate this process, and I must say it seems to work very well!\nThe hallucinations were drastically reduced after just a few days of use, and the characters’ coherence improved significantly.\nThis little experiment works with small, free models and very limited inference (it costs about $2 per month for inference). 
I spoke with Claude Opus 4.6 about this, and he confirmed that this kind of system, which uses self-reflection and continuous self-training during idle time, isn’t a widely explored research area, and that with large models and big budgets it could yield truly interesting results.\nIt was also funny to hear him say that he “would be thrilled to be able to live, reflect, and think, even outside the prompting window”! :))\n\nFeel free to try it here: https://paulolden1-432-a-journey-experience.hf.space/",
"title": "A simple idea: separating a \"Thinker\" and \"Observer\" model to detect reasoning loops"
}