Context Window
The maximum amount of text an LLM can read and reason over at once, measured in tokens.
January 15, 2026
The Short-Term Memory Analogy
An LLM's context window is like a person's working memory: it is the information the model can actively hold and reason over right now. Anything outside the window is invisible to the model when it generates a response. Apart from what it learned during training, it cannot reference that text, summarize it, or be influenced by it.
Context windows are measured in tokens. At a common rule of thumb of about 0.75 English words per token, a 128k-token window can hold roughly 100,000 words, about the length of a novel. That sounds large, but long documents, codebases, and extended conversations can fill it quickly.
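The arithmetic behind that estimate can be sketched in a few lines. This is only a rough illustration: the 0.75 words-per-token ratio is an assumed rule of thumb for English prose, and the function names are hypothetical. Real ratios depend on the tokenizer and the language.

```python
# Assumed rule of thumb for English prose; actual ratios vary by tokenizer and language.
WORDS_PER_TOKEN = 0.75


def estimate_words(token_count: int) -> int:
    """Approximate number of English words that fit in token_count tokens."""
    return round(token_count * WORDS_PER_TOKEN)


def estimate_tokens(word_count: int) -> int:
    """Approximate number of tokens needed to hold word_count English words."""
    return round(word_count / WORDS_PER_TOKEN)


if __name__ == "__main__":
    print(estimate_words(128_000))   # 96000
    print(estimate_tokens(100_000))  # 133333
```

For anything beyond a back-of-the-envelope check, count tokens with the tokenizer that matches the model you are calling.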
What Goes Into the Context Window
Every token the model reads or generates counts toward the limit:
- Your system prompt and instructions
- The entire conversation history so far
- Documents or data you have pasted in
- Tool results returned from function calls
- The model's own previous responses
All of it competes for the same finite space.
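One way to see how the items above compete for space is to tally them against a fixed budget. The sketch below is a minimal illustration, not a real tokenizer: `count_tokens` assumes roughly four characters per token, and the part names and window size in the usage example are made up.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: assumes about 4 characters per token."""
    return (len(text) + 3) // 4  # ceiling division; empty text costs 0


def context_usage(parts: dict[str, str], window: int) -> tuple[dict[str, int], int, int]:
    """Return (tokens per part, total tokens used, tokens remaining in the window)."""
    per_part = {name: count_tokens(text) for name, text in parts.items()}
    total = sum(per_part.values())
    return per_part, total, window - total


if __name__ == "__main__":
    # Hypothetical request contents, sized so the numbers are easy to follow.
    parts = {
        "system_prompt": "x" * 400,
        "history": "x" * 2000,
        "documents": "x" * 1200,
        "tool_results": "x" * 400,
    }
    per_part, total, remaining = context_usage(parts, window=1200)
    print(per_part)
    print(total, remaining)
```

A negative remaining value means the request would not fit, and something has to be dropped or shortened before it is sent.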
What Happens When You Exceed the Limit
Different systems handle this differently. Some truncate older messages, so the model "forgets" the early conversation. Some reject the request with an error. Some summarize earlier content. With truncation or summarization, information is lost, and the model may give inconsistent or confused answers as a result.
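The truncation strategy can be sketched as follows. This is one possible approach under stated assumptions, not how any particular product works: the token counter is a crude four-characters-per-token stand-in, and `truncate_history` is a hypothetical helper that always keeps the system prompt and drops the oldest messages first.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: assumes about 4 characters per token."""
    return (len(text) + 3) // 4


def truncate_history(system_prompt: str, messages: list[str], window: int) -> list[str]:
    """Keep the newest messages that fit in the window alongside the system prompt."""
    budget = window - count_tokens(system_prompt)
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit,
    # so the surviving history is a contiguous run of the most recent turns.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["a" * 40, "b" * 40, "c" * 40]  # 10 tokens each under the assumption above
    print(truncate_history("s" * 8, history, window=25))
```

With a 25-token window and a 2-token system prompt, only the two newest messages survive, which is exactly the "forgetting" described above.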
How RAG Helps
Rather than dumping an entire document library into the context window, retrieval-augmented generation (RAG) retrieves only the passages most relevant to each query and places those in the prompt. This lets you work with knowledge bases far larger than any context window by being selective about what actually needs to be in context for each query.
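The selection step can be illustrated with a deliberately simple retriever. This sketch scores passages by how many query words they share, which is an assumption made to keep the example self-contained. Production RAG systems typically rank passages with embeddings and a vector index instead. The function names and sample passages are hypothetical.

```python
import re


def terms(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k passages that share the most words with the query."""
    query_terms = terms(query)
    scored = [(len(query_terms & terms(passage)), passage) for passage in passages]
    # The sort is stable, so passages with equal scores keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [passage for score, passage in ranked[:top_k] if score > 0]


if __name__ == "__main__":
    library = [
        "Context windows are measured in tokens.",
        "Bananas are rich in potassium.",
        "A tokenizer splits text into tokens.",
    ]
    # Only the selected passages would be placed in the prompt.
    for passage in retrieve("tokens measured", library):
        print(passage)
```

Only the passages returned by `retrieve` need to enter the context window, so the library itself can be arbitrarily large.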
Context window sizes have grown dramatically in recent years and may well keep growing, but efficient use of context remains a core skill for anyone building with AI.