Glossary

Context Window

The maximum amount of text an LLM can read and reason over at once, measured in tokens.

January 15, 2026

The Short-Term Memory Analogy

An LLM's context window is like working memory for a person — it is the information the model can actively hold and reason over right now. Anything outside the window simply does not exist to the model. It cannot reference it, summarize it, or be influenced by it.

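Context windows are measured in tokens, which are word fragments rather than whole words. A 128k-token window can hold roughly 100,000 words of English text, about the length of a novel, though the exact ratio depends on the tokenizer and the language. That sounds large, but long documents, codebases, and extended conversations can fill it quickly.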
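The conversion between tokens and words can be sketched with simple arithmetic. The example below assumes a rule of thumb of about 0.75 English words per token; that ratio is an approximation, not a property of any particular tokenizer, and the function names are hypothetical.

```python
# Assumed rule of thumb: about 0.75 English words per token.
# Real tokenizers vary by model and language.
WORDS_PER_TOKEN = 0.75


def estimate_words(token_count: int) -> int:
    """Approximate how many English words fit in a given number of tokens."""
    return int(token_count * WORDS_PER_TOKEN)


def estimate_tokens(word_count: int) -> int:
    """Approximate how many tokens a given number of English words will use."""
    return int(word_count / WORDS_PER_TOKEN)


print(estimate_words(128_000))  # 96000
```

For an exact count, use the tokenizer that belongs to the model you are calling, since an estimate like this can be off by a wide margin for code or non-English text.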

What Goes Into the Context Window

Every token the model reads counts toward the limit, and in most systems so does the response it is currently generating:

Your system prompt and instructions
The entire conversation history so far
Documents or data you have pasted in
Tool results returned from function calls
The model's own previous responses

All of it competes for the same finite space.
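One way to picture this competition is as a single budget that every component draws from. The sketch below uses made-up token counts and a hypothetical 128,000-token limit purely for illustration.

```python
# Hypothetical window size; check the limit of the model you actually use.
CONTEXT_LIMIT = 128_000


def remaining_budget(usage: dict[str, int], limit: int = CONTEXT_LIMIT) -> int:
    """Return the tokens left after summing every component's token count."""
    return limit - sum(usage.values())


# Illustrative numbers only, not measurements.
usage = {
    "system_prompt": 1_500,
    "conversation_history": 40_000,
    "pasted_documents": 60_000,
    "tool_results": 12_000,
    "previous_responses": 9_000,
}

print(remaining_budget(usage))  # 5500
```

A negative result would mean the request no longer fits and something has to be trimmed, summarized, or left out.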

What Happens When You Exceed the Limit

Different systems handle this differently. Some truncate older messages (the model "forgets" early conversation). Some reject the request with an error. Some summarize earlier content. Unless the request is rejected outright, exceeding the context window means information is dropped or compressed, and the model may give inconsistent or confused answers as a result.
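The truncation strategy can be sketched as follows. This is a minimal illustration, not how any particular product implements it: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words, and `truncate_oldest` is a name chosen for this example.

```python
def count_tokens(text: str) -> int:
    """Hypothetical stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def truncate_oldest(messages: list[str], limit: int) -> list[str]:
    """Keep the most recent messages whose combined token count fits within limit."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so recent turns are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > limit:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["my name is Ada", "I like chess", "what is my name"]
print(truncate_oldest(history, limit=7))
# ['I like chess', 'what is my name']
```

In this toy run the message containing the name is the one that gets dropped, so a model seeing only the kept messages could not answer the final question.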

How RAG Helps

Rather than dumping an entire document library into the context window, retrieval-augmented generation (RAG) fetches only the passages most relevant to the current query and places those in the prompt. This lets you work with knowledge bases far larger than any context window by being selective about what actually needs to be in context for each query.
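The retrieval step can be sketched in a few lines. Production systems typically rank passages with embeddings and a vector index; the example below swaps in plain word overlap as a stand-in so it runs without dependencies. The function names and the sample knowledge base are hypothetical.

```python
def overlap_score(query: str, passage: str) -> int:
    """Count distinct query words that also appear in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k passages ranked by word overlap with the query."""
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:top_k]


# Illustrative passages only.
knowledge_base = [
    "refunds are processed within five business days",
    "our office is closed on public holidays",
    "to request a refund contact support with your order number",
]

print(retrieve("how do I request a refund", knowledge_base, top_k=1))
# ['to request a refund contact support with your order number']
```

Only the returned passages are added to the prompt, so the context cost per query stays roughly constant no matter how large the knowledge base grows.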

Context window sizes have grown dramatically in recent years and will continue to grow — but efficient use of context remains a core skill for anyone building with AI.