External Publication

Context Window

Sahil Kapoor's Playbook May 12, 2026

The context window is the maximum amount of text an LLM can process in a single request, measured in tokens. It covers both input and output: the system prompt, the user message, any retrieved context, prior conversation history, and the model's own response. Depending on the provider, a request that exceeds the window is either rejected with an error or has content truncated to fit.
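Because every component shares the same budget, a request fits only if the sum of its parts, plus the room reserved for the response, stays within the window. A minimal sketch of that check follows. The function name, the component labels, and the token counts are illustrative assumptions; real counts would come from the tokenizer of the target model.

```python
def fits_in_window(component_tokens: dict[str, int],
                   max_output_tokens: int,
                   context_window: int) -> tuple[bool, int]:
    """Return (fits, remaining_tokens) for one request.

    component_tokens maps each input component to its token count.
    The output reservation is added because the response shares the window.
    """
    used = sum(component_tokens.values()) + max_output_tokens
    remaining = context_window - used
    return remaining >= 0, remaining


# Hypothetical token counts for one request against a 128k-token window.
request = {
    "system_prompt": 800,
    "conversation_history": 52_000,
    "retrieved_context": 60_000,
    "user_message": 1_200,
}

fits, remaining = fits_in_window(request,
                                 max_output_tokens=4_000,
                                 context_window=128_000)
print(fits, remaining)  # True 10000
```

A negative remainder shows how many tokens would need to be trimmed, for example by dropping older conversation turns or retrieving fewer chunks.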

Typical sizes

4k to 16k tokens. Older models such as GPT-3.5 and Llama 2.
32k to 128k tokens. GPT-4 Turbo, Claude 2, Mistral models.
200k tokens. Claude 3 family, including Claude 3.7 Sonnet.
1M+ tokens. Gemini 1.5 Pro, GPT-4.1 family.

Practical considerations

Tokenization. Token counts depend on the tokenizer; a rule of thumb is about 4 characters per token for English text.
Lost in the middle. Retrieval and reasoning quality often degrade for content placed in the middle of very long contexts.
Cost and latency. Most APIs charge per input and output token, and longer contexts increase request latency.
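The tokenization rule of thumb and per-token pricing can be combined into a rough pre-flight estimate. The sketch below assumes about 4 characters per token for English text, as stated above. The function names and the prices are hypothetical placeholders, not any provider's actual rates. For exact counts, use the tokenizer that matches the target model.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb for English text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic, not exact)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(input_tokens: int,
                  output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimated request cost, given prices per one million tokens."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prompt and placeholder prices (assumptions for illustration).
prompt = "Summarize the following report in three bullet points. " * 100
input_tokens = estimate_tokens(prompt)
cost = estimate_cost(input_tokens,
                     output_tokens=500,
                     input_price_per_million=3.0,
                     output_price_per_million=15.0)
print(f"~{input_tokens} input tokens, estimated cost {cost:.6f}")
```

Input and output prices are kept as separate parameters because APIs often price the two differently.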

Related Terms: RAG, Chunking

Further Reading: "Token Economics: What Every Developer Needs to Understand Now"
