GLM-5.1
GLM-5.1 is a GLM-family model from Z.ai, released in April 2026. It costs $1 (input) / $3.20 (output) per 1M tokens, has a 200k-token context window, and is a strong fit for coding, cheap frontier-class workloads, and open-weights deployment. Last verified 2026-04-19.
Spec sheet
Pricing
- Input: $1 / 1M tokens
- Output: $3.20 / 1M tokens
- Free tier: Yes (bigmodel.cn)
Context & speed
- Context window: 200k tokens
- Max output: 131k tokens
- Throughput: ~95 tok/s
- Time to first token: ~700 ms
- Speed tier: balanced
Capabilities
- Tool use: Yes
- Structured output: Yes
- Prompt caching: No
- Extended thinking: Yes
- Vision input: No
- Audio in / out: No
- Fine-tuning: Yes
Deployment
- Open weights: Yes
- On-prem: Yes
- HIPAA eligible: No
- Zero retention: No
- Regions: APAC, US
Estimated monthly cost
Assumes a typical token shape of 2k input and 600 output tokens per call. Prompt caching is excluded from these figures (GLM-5.1 does not support it).
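The estimate above can be reproduced directly from the spec-sheet prices. A minimal sketch, using the $1 / $3.20 per-1M-token rates and the 2k-input / 600-output token shape stated on this page (the call volume is a placeholder you would replace with your own):

```python
# Estimate GLM-5.1 API spend from this page's list prices.
INPUT_PRICE_PER_M = 1.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 3.20   # USD per 1M output tokens

def cost_per_call(input_tokens: int = 2_000, output_tokens: int = 600) -> float:
    """USD cost of one call: $0.002 input + $0.00192 output at the defaults."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

def monthly_cost(calls_per_day: int, days: int = 30) -> float:
    """USD cost of a steady workload over a month, no caching discount."""
    return cost_per_call() * calls_per_day * days

print(f"per call: ${cost_per_call():.5f}")
print(f"1k calls/day: ${monthly_cost(1_000):.2f}/month")
```

At the default shape each call costs about $0.00392, so 1,000 calls per day lands around $117.60 per month.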
When to use GLM-5.1
Sweet spot
- coding
- cheap frontier
- open weights
- multilingual
Known trade-offs
- data routing via China for the hosted API
- newer SDK ecosystem
FAQ — GLM-5.1
How much does GLM-5.1 cost?
GLM-5.1 costs $1 (input) / $3.20 (output) per 1M tokens on the Z.ai API. The model does not currently support prompt caching, so the list price is the full cost.
What is the context window of GLM-5.1?
GLM-5.1 has a 200k-token context window with up to 131k tokens of output. That's enough for long reports, extended chat histories, or structured document analysis.
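A quick budget check follows from those two numbers. This sketch assumes input and output share the 200k window, which is typical for chat models but not stated explicitly on this page:

```python
# Token-budget check for GLM-5.1, per this page's spec sheet.
CONTEXT_WINDOW = 200_000  # total tokens (assumed shared by input + output)
MAX_OUTPUT = 131_000      # hard cap on response length

def max_input_tokens(reserved_output: int) -> int:
    """Input budget left after reserving room for the response."""
    reserved = min(reserved_output, MAX_OUTPUT)  # can't reserve past the cap
    return CONTEXT_WINDOW - reserved

print(max_input_tokens(4_000))   # reserve a short answer
```

Reserving a 4k-token answer leaves about 196k tokens for the prompt, which comfortably fits a long report or an extended chat history.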
Does GLM-5.1 have a free tier?
Yes. GLM-5.1 has a free tier with a monthly token allowance; start at https://bigmodel.cn.
Is GLM-5.1 HIPAA / EU / on-prem friendly?
GLM-5.1 is not HIPAA-eligible, has no EU region, and does not offer zero data retention on the hosted API. It does, however, ship open weights, so it can be self-hosted on-prem where those constraints apply.
What is GLM-5.1 best for?
GLM-5.1 is best for coding, cheap frontier-class workloads, open-weights deployment, and multilingual tasks. Trade-offs to be aware of: data routing via China for the hosted API, and a newer SDK ecosystem.
Which tools and SDKs work with GLM-5.1?
GLM-5.1 integrates with Z.ai SDK, OpenAI-compatible API, OpenRouter, Ollama, vLLM, LangChain. Most major AI frameworks support it either natively or through OpenAI-compatible endpoints.
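Because the API is OpenAI-compatible, a request is just a standard chat-completions payload pointed at a different base URL. A minimal sketch; the base URL and model id below are illustrative assumptions, so check Z.ai's documentation for the real values:

```python
# Sketch of a chat-completions request for an OpenAI-compatible endpoint.
import json

BASE_URL = "https://api.z.ai/v1"   # assumed endpoint; verify in Z.ai docs
payload = {
    "model": "glm-5.1",            # assumed model id; verify in Z.ai docs
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a binary search in Go."},
    ],
    "max_tokens": 1024,
}
print(json.dumps(payload, indent=2))
```

With the `openai` Python SDK you would create a client with `base_url=BASE_URL` and your API key, then pass this payload to `client.chat.completions.create(**payload)`; OpenRouter, vLLM, and LangChain accept the same shape.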