{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreielhkvhzujacmgbp2xyycynje55apmpwpxyhdknvw2kqw4ndwk7ke",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mope6j3rc7o2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreidy7qb3aoisfrqnfmcudr5maexqwrnxfkgnkdmodrmopcc2mvhoxm"
    },
    "mimeType": "image/webp",
    "size": 68248
  },
  "path": "/dev48v/the-context-window-an-llms-short-term-memory-explained-c8c",
  "publishedAt": "2026-06-20T07:05:43.000Z",
  "site": "https://dev.to",
  "tags": [
    "ai",
    "llm",
    "beginners",
    "machinelearning",
    "https://dev48v.infy.uk/ai/days/day8-context-window.html",
    "Fill the box."
  ],
  "textContent": "A chatbot feels like it remembers you. It doesn't — it's stateless. Everything it \"knows\" is just text resent each call, up to a fixed limit: the context window. When the box fills, the oldest messages fall off the edge and are genuinely gone.\n\n🪟 **Watch tokens fall off:** https://dev48v.infy.uk/ai/days/day8-context-window.html\n\n##  The model is stateless\n\n\n    reply = model(allMessagesSoFar);  // the app resends the whole history every turn\n\n\n\"Memory\" is just text you keep pasting back in.\n\n##  The window is a hard token limit\n\nPrompt + conversation + pasted docs + the reply must all fit inside a fixed number of tokens. When the chat grows past it, the oldest messages get dropped — in the demo, faded messages have scrolled OUT and the model literally can't see them. Ask about something dropped and it truly has no idea.\n\n##  It's also the cost meter\n\nYou're billed per token in the window, every call. Pasting a whole book each turn is slow and expensive — so you don't just CAN'T fit unlimited text, you don't WANT to.\n\n##  \"Lost in the middle\"\n\nEven within the limit, models attend best to the START and END; facts buried in the middle of a huge context can be overlooked. Bigger isn't automatically better.\n\n##  Managing it is the skill\n\nSummarise old turns + keep recent ones verbatim + use RAG to fetch only the relevant chunks instead of pasting everything. Understanding the window explains chatbot \"amnesia\" and most prompt-engineering tactics.\n\nFill the box.",
  "title": "The Context Window: an LLM's Short-Term Memory, Explained"
}