Context Window

Sahil Kapoor's Playbook May 12, 2026
The context window is the maximum number of tokens an LLM can process in a single request. It covers the system prompt, the user message, any retrieved context, prior conversation history, and the model's own response. Depending on the API, a request that exceeds the window is either rejected or has the excess content truncated or dropped.
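Because every part of the request shares one window, it can help to add the parts up before sending. The following is a minimal sketch, not any provider's API: `RequestBudget`, `fits`, and all the token counts are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestBudget:
    """Token counts for each part of one request (hypothetical values)."""
    system_prompt: int
    history: int
    retrieved_context: int
    user_message: int
    max_response: int  # tokens reserved for the model's reply

    def total(self) -> int:
        # Input and output both count against the same window.
        return (self.system_prompt + self.history + self.retrieved_context
                + self.user_message + self.max_response)


def fits(budget: RequestBudget, context_window: int) -> bool:
    """Return True when input plus reserved output fits inside the window."""
    return budget.total() <= context_window


budget = RequestBudget(
    system_prompt=500,
    history=3000,
    retrieved_context=4000,
    user_message=200,
    max_response=1000,
)
print(budget.total())        # 8700
print(fits(budget, 8192))    # False: trim history or retrieved context
print(fits(budget, 16384))   # True
```

Reserving space for the response up front is the main design choice here: a prompt that fills the window exactly leaves the model no room to answer.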

Typical sizes

  • 4k to 16k tokens. Older models like GPT-3.5 and early Llama 2.
  • 32k to 128k tokens. GPT-4 Turbo, Claude 2, Mistral models.
  • 200k tokens. Claude 3 family and Claude 3.7 Sonnet.
  • 1M+ tokens. Gemini 1.5 Pro, GPT-4.1 family.

Practical considerations

  • Tokenization. Token counts depend on the tokenizer; a rule of thumb is about 4 characters per token for English text.
  • Lost in the middle. Retrieval and reasoning quality often degrade for content placed in the middle of very long contexts.
  • Cost and latency. Most APIs charge per input and output token, and longer contexts increase request latency.
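The tokenization rule of thumb and per-token pricing can be combined into a rough estimate. This is a sketch under stated assumptions: the 4-characters-per-token figure is only an approximation for English text, the function names are hypothetical, and the prices in the example are placeholders rather than any provider's actual rates. For exact counts, use the tokenizer that matches the target model.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb for English text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count, rounding up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate request cost when input and output tokens are priced separately."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


prompt = "Summarize the following report in three bullet points."
print(estimate_tokens(prompt))

# Placeholder prices (per million tokens), not real rates.
print(estimate_cost(10_000, 1_000, 3.0, 15.0))
```

An estimate like this is useful for budgeting and for deciding when to trim context; it should not be relied on near the window limit, where the approximation error matters most.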

Related Terms RAG, Chunking

Further Reading Token Economics: What Every Developer Needs to Understand Now