Will your prompt fit? Check context windows across 20+ frontier LLMs.

Paste text or drop a file. We'll show you which models can accept it and what each one costs per call.

Frequently asked

What is a context window?
A context window is the maximum number of tokens (word-pieces, typically a few characters each) a language model can read in a single request. Inputs that exceed it get truncated or rejected.
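A minimal sketch of that check, with an illustrative 128k window rather than any specific model's limit:

```python
def check_fit(prompt_tokens: int, window: int) -> str:
    """Classify a prompt against a model's context window.

    Providers differ on overflow: some silently truncate the oldest
    content, others reject the request with an error.
    """
    return "fits" if prompt_tokens <= window else "truncated or rejected"

print(check_fit(150_000, 128_000))  # "truncated or rejected"
```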
Tokens vs words — what's the difference?
A token is the unit each model actually reads. English averages around 0.75 words per token, so 1,000 words is typically 1,200–1,400 tokens, though the exact ratio varies by tokenizer. Code and non-English text use more tokens per character.
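A back-of-envelope estimate using that ratio (the 1.33 constant is an approximation, not any model's exact tokenizer):

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.33) -> int:
    """Rough token estimate from word count (~0.75 words per token in English).

    Real counts vary by tokenizer; code and non-English text run higher,
    so treat this as a sanity check, not an exact figure.
    """
    return round(len(text.split()) * tokens_per_word)

print(estimate_tokens("the quick brown fox " * 250))  # 1,000 words -> ~1,330
```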
Why does Claude count tokens differently from GPT?
Each provider trains its own tokenizer with its own vocabulary. The same string can produce different token counts on different models. We use OpenAI's exact tokenizer for OpenAI models and calibrated character-ratio estimates for the rest.
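That split looks roughly like the sketch below: OpenAI's open-source tiktoken package for exact counts, and a characters-per-token calibration for everyone else (the 3.6 ratio is an assumed value for illustration):

```python
import tiktoken

def count_tokens(text: str, model: str) -> int:
    """Exact count for OpenAI models; character-ratio estimate otherwise."""
    try:
        enc = tiktoken.encoding_for_model(model)  # raises KeyError if unknown
        return len(enc.encode(text))
    except KeyError:
        CHARS_PER_TOKEN = 3.6  # assumed calibration for non-OpenAI models
        return round(len(text) / CHARS_PER_TOKEN)

sample = "Context windows differ across providers."
print(count_tokens(sample, "gpt-4o"))       # exact, via tiktoken
print(count_tokens(sample, "claude-opus"))  # estimated, via ratio
```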
Should I leave room for the response?
Yes. The model writes its reply into the same context window. Reserve at least as many tokens as the longest reply you expect (4k is a reasonable default for chat; 16k+ for long structured output).
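In code, that just means subtracting the reserved output from the window before checking your prompt (generic names here, not a specific SDK):

```python
def max_prompt_budget(window: int, max_output_tokens: int) -> int:
    """Tokens left for the prompt after reserving the response budget."""
    return window - max_output_tokens

# e.g. a 200,000-token window with 4k reserved for a chat reply
print(max_prompt_budget(200_000, 4_096))  # 195,904
```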

All frontier model context windows

Static reference table. The interactive checker above uses the same data.

| Model | Provider | Window (tokens) | Max output (tokens) |
|---|---|---|---|
| Gemini 3.1 Pro | Google | 2,000,000 | 65,536 |
| GPT-5.5 | OpenAI | 1,050,000 | 128,000 |
| Claude Opus 4.7 | Anthropic | 1,000,000 | 64,000 |
| Claude Sonnet 4.6 | Anthropic | 1,000,000 | 64,000 |
| Gemini 3 Flash | Google | 1,000,000 | 65,536 |
| Gemini 3.1 Flash-Lite | Google | 1,000,000 | 8,192 |
| DeepSeek V4 Flash | DeepSeek | 1,000,000 | 384,000 |
| GPT-5.4 | OpenAI | 400,000 | 32,000 |
| GPT-5.4 Pro | OpenAI | 400,000 | 64,000 |
| GPT-5.4 Mini | OpenAI | 400,000 | 16,384 |
| Qwen3-Max | Alibaba | 262,144 | 32,768 |
| Grok 4.3 | xAI | 256,000 | 32,000 |
| Codestral 25.08 | Mistral | 256,000 | 8,192 |
| Claude Haiku 4.5 | Anthropic | 200,000 | 8,192 |
| GLM-5.1 | Other | 200,000 | 131,072 |
| GPT Realtime | OpenAI | 128,000 | 4,096 |
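One way a checker might hold this data, sketched with a hand-copied subset of the rows above (the structure and names are illustrative, not the site's actual source):

```python
# (window_tokens, max_output_tokens), copied from a few rows above
MODELS = {
    "Gemini 3.1 Pro":   (2_000_000, 65_536),
    "GPT-5.5":          (1_050_000, 128_000),
    "Claude Opus 4.7":  (1_000_000, 64_000),
    "Claude Haiku 4.5": (200_000, 8_192),
    "GPT Realtime":     (128_000, 4_096),
}

def models_that_fit(prompt_tokens: int, reserve_output: bool = True):
    """Yield models whose window holds the prompt plus (optionally) a full reply."""
    for name, (window, max_out) in MODELS.items():
        budget = window - (max_out if reserve_output else 0)
        if prompt_tokens <= budget:
            yield name

print(list(models_that_fit(500_000)))
# ['Gemini 3.1 Pro', 'GPT-5.5', 'Claude Opus 4.7']
```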