Temperature
A parameter that controls how random or predictable a language model's output is.
What Is Temperature
Temperature is a parameter that controls the randomness of a large language model's output by adjusting how the model samples the next token from its predicted probability distribution. It is set as a number, often between 0 and 2, although the accepted range depends on the provider. Lower values make output more predictable and higher values make it more varied.
How It Works
At each generation step, a model computes a score (a logit) for every possible next token and converts those scores into probabilities. Temperature divides the scores before that conversion. At a temperature of 0, the model effectively always picks the single most probable token (greedy decoding), producing deterministic or near-deterministic output. A temperature of 1 leaves the model's distribution unchanged. Values below 1 sharpen the distribution toward the most likely tokens, and values above 1 flatten it, giving lower-probability tokens a greater chance of being selected. Flattening increases variety but also increases the chance of less coherent or less accurate output. Very high temperatures can lead to output that ignores instructions or drifts into unrelated text.
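The scaling step can be sketched in a few lines of Python. This is a minimal illustration, not any provider's implementation: the function names and the three example scores are assumptions, and temperature 0 is handled as a special case (pick the highest score) because dividing by zero is undefined.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Divide raw scores by the temperature, then convert them to probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Return a token index: argmax at temperature 0, otherwise a weighted draw."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Running the loop shows the top token's share shrinking as the temperature rises, while the ranking of the tokens stays the same.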
Choosing a Temperature
The right setting depends on the task. Low temperature values suit tasks that need consistency and precision, such as generating structured data, writing code, or answering factual questions. Higher temperature values suit tasks that benefit from variety, such as brainstorming, creative writing, or generating multiple distinct options for the same prompt. Some providers also expose related sampling parameters. One is top-p, also called nucleus sampling, which restricts sampling to the smallest set of tokens whose combined probability exceeds a threshold. Top-p can be used together with temperature or instead of it.
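A rough sketch of the top-p restriction described above, in Python. The function name and the example probabilities are hypothetical, and the sketch assumes the cutoff is inclusive (it stops once the running total reaches the threshold); real implementations may differ in that detail.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches p, then renormalize the survivors."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Hypothetical next-token probabilities for four candidates.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.9))  # the least likely token is dropped
```

Because the cutoff follows cumulative probability, the number of surviving tokens shrinks when the model is confident and grows when it is not.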
Temperature vs Other Sampling Controls
- Temperature: rescales the whole probability distribution before sampling, without removing any token.
- Top-p: limits sampling to the smallest set of most likely tokens whose combined probability exceeds a threshold p.
- Top-k: limits sampling to the k most likely tokens, where k is a fixed number.
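For comparison with the other two controls, top-k can be sketched as a fixed-size cut. The function name and example values are illustrative assumptions, not a specific library's API.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in order)
    return {i: probs[i] / total for i in order}

# Hypothetical next-token probabilities for four candidates.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))  # only the two most likely tokens remain
```

Unlike top-p, the number of candidates kept here does not depend on the shape of the distribution.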
These parameters are usually configurable per request through a model provider's API. An application or agent can therefore set different values for different kinds of tasks, for instance a low temperature for a coding agent and a higher one for a writing assistant.
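One way an application might organize per-task settings is sketched below. Everything here is hypothetical: the task names, the specific values, and the request field names are placeholders, and a real provider's API will define its own parameter names and ranges.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SamplingParams:
    temperature: float
    top_p: float = 1.0  # 1.0 means no top-p restriction

# Illustrative presets only; the values are assumptions, not recommendations.
TASK_PRESETS = {
    "coding": SamplingParams(temperature=0.2),
    "writing": SamplingParams(temperature=0.9),
}
DEFAULT_PARAMS = SamplingParams(temperature=0.2)

def params_for(task):
    """Look up sampling settings for a task, falling back to a low-temperature default."""
    return TASK_PRESETS.get(task, DEFAULT_PARAMS)

def build_request(task, prompt):
    """Assemble a request payload; the field names are placeholders."""
    params = params_for(task)
    return {"prompt": prompt, "temperature": params.temperature, "top_p": params.top_p}

print(build_request("coding", "Write a function that reverses a list."))
print(build_request("writing", "Draft an opening line for a short story."))
```

Keeping the presets in one table makes it easy to review and adjust the values without touching the code that sends requests.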
Practical Considerations
Temperature is easy to misuse if treated as a general quality dial rather than a randomness control. Setting it very low does not make a model more accurate about facts it does not know; it only makes the model more consistent in how it answers, including consistently wrong answers if the underlying knowledge is missing. Likewise, a high temperature can make a model appear more creative while also making its output less reliable for tasks such as generating structured data or following a strict output format, since a less probable token chosen mid-generation can derail the rest of the response. For that reason, many applications default to a low, near-zero temperature for anything that feeds into automated processing, and reserve higher values for tasks explicitly meant to produce varied, exploratory output.
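The point about consistency versus accuracy can be illustrated with a small simulation. The scores below are invented for the example: they describe a hypothetical model that prefers a wrong answer ("Sydney") over the right one ("Canberra") when asked for the capital of Australia. The function name and the fixed seed are also assumptions made for reproducibility.

```python
import math
import random
from collections import Counter

def answer_counts(scores, temperature, trials=1000, seed=0):
    """Sample an answer `trials` times at the given temperature and count the results."""
    rng = random.Random(seed)
    answers = list(scores)
    scaled = [scores[a] / temperature for a in answers]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return Counter(rng.choices(answers, weights=weights, k=trials))

# Hypothetical scores: the model's top choice is the wrong answer.
scores = {"Sydney": 3.0, "Canberra": 1.0}
low = answer_counts(scores, 0.1)
high = answer_counts(scores, 2.0)
print("temperature 0.1:", dict(low))
print("temperature 2.0:", dict(high))
```

At the low setting the wrong answer is returned almost every time. Raising the temperature surfaces the right answer more often, but only by chance, so neither setting repairs the missing knowledge.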