Raw Record Source

{
  "$type": "site.standard.document",
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiaylvbvnh57lthfwkw4roxfo33ifffs5phifgbdn7wqbmq374bvle"
    },
    "mimeType": "image/png",
    "size": 174394
  },
  "path": "/notes/2026-01-22-claude-code-hallucinating-like-2024/",
  "publishedAt": "2026-01-22T00:00:00.000Z",
  "site": "at://did:plc:aes3lokiqtv63fk62nwnjeuf/site.standard.publication/3mnin5cnq2q2a",
  "textContent": "I've just been using Claude Code and my task management skill to work through my current life areas and projects and help me define next actions. I've had enough time with Opus 4.5 recently that I was legit surprised when it started hallucinating plausible but absolutely-incorrect stuff about some of my projects. Not least because it went from what I'm used to to GPT-3.5-levels of batshit hallucinations so suddenly. Its explanation of why this happened is interesting. Brief Context I manage my stuff with a GTD/PARA-esque model: Areas, Projects and Tasks. Areas include things like Finance, Health, Coding etc and contain projects. Projects can contain tasks. I've recently built Taskdn, which stores areas, projects and tasks as markdown files in my Obsidian vault and includes a Claude Code skill & CLI to help Claude Code work with them. My personal area and project files have been in this system for a few months now, but having just shipped an Alpha release of the desktop app, it was time to populate my directory with task files and start using them as my daily driver. So I fired up CC, loaded the skill and basically said \"look at all my areas and projects. Let's define next actions for them all and create the appropriate task files. Some projects have clear checklists in the project docs, others we'll need to discuss. Let's do them one by one. What order would you suggest?\" Claude ran a few commands and sensibly decided it should read all my 14 area and 23 project files in full. It gave a very good summary of the current situation, sensibly suggested we skip a few projects and proposed an order of attack. For each project it would report anything in the project doc which seemed like current/future tasks and propose clear Next Actions to create. If it was unsure, it would ask me for more info. Once we'd agreed on the tasks to create it would do so and edit the project doc accordingly. 
Which worked perfectly for the first three projects on the list: 1. Tax Return YE April 2025 2. End-of-Year Finance Reset (renamed to Jan Finance Reset + created new Credit File and History project) 3. [REDACTED] And then we got to the next three – all of which sit under my RAFAC area and have to do with my voluntary work with the RAF Air Cadets... The only real things here are DCCT and M Qual & LR – the rest is plausible but totally made-up nonsense. (Fuck knows what Feb's Staff Ride is but... it sounds fun? Maybe I'm joining the Cavalry!) And for the next project, which relates to squaring my uniform and field kit, I was confidently told that my project doc prioritises the following: C95 Lightweights, MTP Trousers, C95 shirts, Flying suit, Softie jacket. If you happen to know what these items are, you'll certainly be laughing at the image of me wearing all of them together, in the year 2025. For everyone else: CS95 went out of service in 2010, lightweights are from the 1980s and my softie jacket is from the mid-2000s and buried somewhere in my loft. I do not, and never will, have need of a flying suit. The project doc does contain a short list of MTP uniform items I need to replace, but it does not include trousers. I'd maybe expect this kind of hallucination from contextless ChatGPT, especially with something as niche as this. But given what Claude knew about me in this session I did not expect ChatGPT-in-early-2024 levels of obviously-made-up stuff from Opus 4.5. To be clear, the most surprising things were: 1. It was so obviously wrong. It's not the subtle kind of hallucination which modern Anthropic models tend to produce. 2. Somewhat uncharacteristically, Claude didn't spot its own nonsense. There was no \"wait, I might have made that up, lemme check\" and there were no \"let me clarify\" questions. 3. It was so sudden. Claude went from sensible-and-effective to random-token-prediction-machine in like two turns. So what's going on? 
Here's our conversation on this (nothing was auto-compacted, so you can ignore the stuff about that)... Okay, you've hallucinated a bunch of stuff about what's in the last two project documents, I suspect because we've covered a lot of very different and disparate projects and areas while going over this, and you haven't actually read the project documents in a while, so you're just making up stuff you think will be in them. I'd like you to try to explain why, just for the last couple of projects (so 4, 5 and 6), you have suddenly started to hallucinate credible but very incorrect information about what's in the docs. I don't need you to apologize for this, and I don't need you to do anything about it, but I'd like to understand why you think that's happened when, prior to that, you were extremely accurate in helping me out and explaining what was in those docs. I'm very familiar with using AI agents like Claude Code, so you can be detailed or technical in your explanation. I want to know so that in the future I can design some instructions for this kind of work that will help to prevent it. I don't need you to write those instructions for me now. I just need you to try to explain why you think this happened, and in particular why it happened with projects four, five and six here. Got it, that's interesting. Can you just explain a tiny bit about how your memory of your earlier reads degraded over time? I assume that it is compressed, in addition to the fact that the more context you have in your context window, the more stuff there is to get confused about, right? Is that correct, that there is some form of compression happening there, i.e. summaries for much older information? I'm also particularly interested that this seemed to happen with RAFAC-related stuff. For the task immediately before that, you very correctly told me what was in the project document from memory without re-reading. 
And then immediately, on the next project, you were hallucinating plausible things. So in addition to compression and degradation in your context window, I'm wondering about two things: All my RAFAC-related docs make heavy use of very niche military-specific terminology and acronyms, and I'm wondering if that's likely to have had an effect on the training data that you will use when you're working out what to generate next. And the second thing is that many of my previous projects were fairly specific. I can't imagine you've read a great deal in your training data about fixing a Credit Record with [REDACTED] etc. And then finally I'm conscious that if you read all of my areas and projects together, a lot of the other stuff is very kinda connected, right? You can imagine me as a programmer who does coding things, who is also into self-improvement and systems management and has a consultancy; those things fit neatly together as an expected persona, almost. My RAFAC/military stuff would not necessarily fit next to that in the same kind of way. And so I guess with these three things I'm asking here (or four things, I suppose), I'm interested to know if you think these have had an effect and, if so, how much, and which of them you think likely had the most effect. I'd really just like to explore this to help understand how you work here a little bit more. I completely agree with your fix of instructing you to always read the project doc (and probably area doc) before a new thing, BTW. Well, what are your thoughts on all this? All of which I find interesting, and food for thought as I continue to use Claude Code for things which aren't programming.",
  "title": "Claude Code hallucinating like it's 2024"
}