Token

The basic unit of text, such as a word piece, that a language model reads and generates.

What Is a Token

A token is the basic unit of text that a large language model reads and generates. Depending on the tokenizer used, a token can be a whole word, part of a word, a punctuation mark, or a single character; in English text, a common rule of thumb is that one token corresponds to roughly four characters or about three quarters of a word.
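
As a rough illustration of that rule of thumb, the sketch below estimates a token count from character and word counts. The function names are hypothetical, and the ratios are only the English-text heuristics quoted above; an actual tokenizer will usually give a different number.

```python
import math

# Heuristics for English text only (assumptions, not tokenizer output):
# roughly 4 characters per token, or about 3/4 of a word per token.
CHARS_PER_TOKEN = 4


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate tokens as character count divided by four, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Estimate tokens as word count times 4/3, rounded up."""
    word_count = len(text.split())
    return math.ceil(word_count * 4 / 3)
```

These estimates are only useful for quick sizing; for anything that affects cost or limits, counting with the target model's own tokenizer is the safer choice.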

How Tokenization Works

Before any text reaches a model, it passes through a tokenizer, which converts the raw string into a sequence of integer IDs drawn from a fixed vocabulary. Most modern tokenizers use subword algorithms such as byte pair encoding, which break uncommon words into smaller, more frequent pieces so the model can represent virtually any input, including code, numbers, and text in multiple languages, using a manageable vocabulary size. The model then generates its output one token ID at a time, and the same tokenizer decodes those IDs back into text.
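
The idea behind byte pair encoding can be sketched with a toy, character-level version. This is a simplified illustration, not any production tokenizer: all function names are hypothetical, the corpus is a handful of words, and it starts from characters rather than bytes, so it can only encode characters it saw during training.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(words, num_merges):
    """Learn up to `num_merges` merge rules by repeatedly merging the most frequent pair."""
    corpus = Counter(tuple(w) for w in words)  # each word starts as single characters
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[apply_merge(symbols, best)] += freq
        corpus = merged
    return merges


def build_vocab(words, merges):
    """Map every base character and merged symbol to an integer ID."""
    vocab = {}
    for symbol in sorted({ch for w in words for ch in w}) + [a + b for a, b in merges]:
        vocab.setdefault(symbol, len(vocab))
    return vocab


def encode(text, merges, vocab):
    """Split text into characters, apply merges in learned order, map to IDs."""
    symbols = tuple(text)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return [vocab[s] for s in symbols]  # raises KeyError for unseen characters


def decode(ids, vocab):
    """Map IDs back to symbols and join them into the original string."""
    id_to_symbol = {i: s for s, i in vocab.items()}
    return "".join(id_to_symbol[i] for i in ids)


# Toy training corpus (an assumption for illustration only).
words = ["low", "low", "lower", "newest", "newest", "widest"]
merges = learn_merges(words, num_merges=10)
vocab = build_vocab(words, merges)
ids = encode("lowest", merges, vocab)
```

Here "lowest" never appears in the toy corpus, yet it can still be encoded from pieces learned from other words, and `decode(ids, vocab)` recovers the original string. Real tokenizers apply the same principle over bytes and far larger corpora.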

Why Tokens Matter

Tokens are the unit by which a model's context window is measured, and the unit most API providers use for pricing. The number of tokens in a request depends on the tokenizer and the language of the text; for example, languages with less representation in the tokenizer's training data, or dense technical formats such as code, can use more tokens per character than plain English. Understanding token counts is therefore important for estimating cost, staying within a context window, and predicting how much text a model can process or produce in a single call.
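
A context-window check can be sketched as simple arithmetic on token counts. The function names below are hypothetical, and the sketch assumes the window covers input and output tokens together, which is common but varies by provider.

```python
def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Return True if the prompt plus the reserved output fits in the window.

    Assumes input and output tokens share one window; check the provider's
    documentation, since some count them separately.
    """
    return input_tokens + max_output_tokens <= context_window


def remaining_output_tokens(input_tokens: int, context_window: int) -> int:
    """Tokens left for the response after the prompt, never below zero."""
    return max(0, context_window - input_tokens)
```

For example, with a hypothetical 8,000-token window, a 7,500-token prompt leaves at most 500 tokens for the response under this assumption.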

Tokens and Cost

Because pricing is usually quoted per thousand or per million tokens, and often differs between input and output tokens, the same request can cost noticeably more or less depending on how verbose the prompt and the response are. Applications that call a model repeatedly, such as an agent that makes many tool calls within a single task, accumulate token usage across every one of those calls, not just the final visible response. Estimating token counts ahead of time, using the same tokenizer the target model uses, helps predict both cost and whether a given input will fit inside the model's context window before the request is ever sent.
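
The cost arithmetic can be sketched as follows. The `Pricing` class, the function names, and the prices are all hypothetical placeholders rather than any provider's actual rates; the point is only that input and output tokens are priced separately and that every call in a task contributes to the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices, in any one currency."""
    input_per_million: float
    output_per_million: float


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request, pricing input and output tokens separately."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def task_cost(pricing: Pricing, calls) -> float:
    """Total cost of a task made of many calls, e.g. an agent's tool-call loop.

    `calls` is an iterable of (input_tokens, output_tokens) pairs.
    """
    return sum(call_cost(pricing, i, o) for i, o in calls)


# Placeholder prices and token counts, for illustration only.
pricing = Pricing(input_per_million=3.0, output_per_million=15.0)
agent_calls = [(1200, 300), (2500, 150), (4000, 800)]
total = task_cost(pricing, agent_calls)
```

Summing over all calls rather than only the last one reflects how an agent's intermediate tool calls add to the bill even though the user sees only the final response.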

Related Concepts

  • Context window: the maximum number of tokens a model can handle in one request.
  • Token budget: a limit an application sets on how many tokens a task or agent may consume.
  • Vocabulary: the fixed set of possible tokens a given tokenizer can produce.