
Context Window

The maximum amount of text a language model can process at once, measured in tokens.

What Is a Context Window

The context window is the maximum amount of text, measured in tokens, that a large language model can process at one time as input and output combined. It includes the system prompt, the conversation history, any retrieved documents or tool outputs, and the response the model generates. Once the total exceeds this limit, older content has to be truncated, summarized, or otherwise removed for the model to continue operating.
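As a rough illustration of how these pieces share one budget, the sketch below adds up the input parts and reserves room for the response. The whitespace-based `estimate_tokens` is only a stand-in assumed for this example; real tokenizers split text differently, and every function and parameter name here is hypothetical.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # Real tokenizers usually produce different counts.
    return len(text.split())


def remaining_budget(
    window: int,
    system_prompt: str,
    history: List[str],
    documents: List[str],
    reserved_output: int,
) -> int:
    """Return how many tokens are still free after counting all inputs
    and setting aside space for the generated response."""
    used = estimate_tokens(system_prompt)
    used += sum(estimate_tokens(message) for message in history)
    used += sum(estimate_tokens(doc) for doc in documents)
    return window - used - reserved_output
```

A negative result means something has to be truncated, summarized, or removed before the request can be sent.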

How It Works

Internally, a transformer-based model computes attention between pairs of tokens in its input, so the context window is tied directly to the model's architecture and to the computational cost of a request: larger windows generally require more memory and processing time, and in standard self-attention the number of token pairs grows with the square of the input length. Context window sizes vary widely by model and provider, typically ranging from a few thousand to over a million tokens. A model does not remember anything outside its context window between requests unless the surrounding application resends the relevant history or retrieves it from storage.
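To make the cost relationship concrete, here is a minimal sketch that counts the entries in a full attention score matrix for one head in one layer. It assumes plain, unoptimized self-attention; production models often use techniques that reduce this cost, so treat the numbers as an upper-bound intuition rather than a measurement. The function name is hypothetical.

```python
def attention_score_entries(num_tokens: int) -> int:
    """Count query-key pairs in a full self-attention score matrix
    (one head, one layer), assuming every token attends to every token."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens * num_tokens
```

Under this assumption, doubling the input from 4,000 to 8,000 tokens raises the pair count from 16,000,000 to 64,000,000, a fourfold increase.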

Why It Matters

Context window size directly limits what a model can reason about in a single call. A short window forces an application to trim conversation history or documents, risking the loss of relevant detail. A long window allows more source material and history to be included, but does not guarantee that the model will use all of it equally well: attention to information in the middle of a very long input can be weaker than attention to content near the beginning or end. This is sometimes called the "lost in the middle" effect.

Managing the Context Window

  • Summarization: compressing older conversation turns into shorter summaries to free up space.
  • Retrieval: fetching only the most relevant documents or facts for the current request instead of including everything.
  • Truncation: dropping the oldest or least relevant messages when a limit is reached.
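The truncation strategy above can be sketched as follows: walk the history from newest to oldest and keep messages until the budget is used up. This is one simple policy among many, not a prescribed method; `truncate_history` and its parameters are hypothetical, and the token counter is passed in because the right one depends on the model.

```python
from typing import Callable, List


def truncate_history(
    messages: List[str],
    budget: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Keep the most recent messages that fit within `budget` tokens,
    dropping the oldest ones first. Original order is preserved."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()
    return kept
```

With a whitespace word counter and a budget of 3, the history `["one two three", "four five", "six"]` is reduced to `["four five", "six"]`. Summarization or retrieval could replace the dropped messages instead of discarding them outright.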

Long-running agents, such as those managed by an agent orchestration platform, need explicit strategies for context window management, since a task that spans many tool calls and outputs can otherwise exceed the limit well before it is complete.
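One possible shape for such a strategy is a compaction step that an agent loop runs before each model call: if the accumulated entries exceed a limit, the older ones are replaced by a single summary and the most recent ones are kept verbatim. The sketch below is an assumption-laden illustration, not any platform's actual mechanism; `compact`, `count_tokens`, and `summarize` are hypothetical names, and in practice `summarize` would typically be another model call.

```python
from typing import Callable, List


def compact(
    entries: List[str],
    limit: int,
    keep_recent: int,
    count_tokens: Callable[[str], int],
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace older entries with one summary when the total token
    count exceeds `limit`, keeping the last `keep_recent` entries."""
    total = sum(count_tokens(entry) for entry in entries)
    if total <= limit or len(entries) <= keep_recent:
        return entries  # nothing to compact
    split = len(entries) - keep_recent
    older, recent = entries[:split], entries[split:]
    return [summarize(older)] + recent
```

Running this check after every tool call keeps the working context bounded, at the price of losing detail from the summarized entries.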
