o4-mini
o4-mini is an OpenAI o-series model released in April 2025. It costs $1.10 (input) / $4.40 (output) per 1M tokens, has a 200k-token context window, and is best for cheap reasoning, math, and research. Last verified 2026-04-19.
Spec sheet
Pricing
- Input: $1.10 / 1M
- Output: $4.40 / 1M
- Cached input: $0.275 / 1M
- Batch discount: 50%
- Free tier: No
Context & speed
- Context window: 200k tokens
- Max output: 100k tokens
- Throughput: ~60 tok/s
- Time to first token: ~1,500 ms
- Speed tier: slow
Capabilities
- Tool use: Yes
- Structured output: Yes
- Prompt caching: Yes
- Extended thinking: Yes
- Vision input: Yes
- Audio in / out: No
- Fine-tuning: No
Deployment
- Open weights: No
- On-prem: No
- HIPAA eligible: Yes
- Zero retention: Yes
- Regions: us, eu
Estimated monthly cost
Assumes a typical token shape of 2k input and 600 output tokens per call; prompt caching is excluded from these figures.
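The figures above follow from simple arithmetic on the listed rates. A minimal sketch, assuming the token shape stated above; the 10,000 calls/month volume is an illustrative assumption, not a figure from this page:

```python
# Per-call and monthly cost for o4-mini at the listed rates.
# Prompt caching is excluded, matching the estimate above.
INPUT_PER_M = 1.10    # $ per 1M input tokens
OUTPUT_PER_M = 4.40   # $ per 1M output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single call at the uncached rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

per_call = call_cost(2_000, 600)   # typical shape: 2k in, 600 out
monthly = per_call * 10_000        # assumed 10k calls/month
print(f"${per_call:.5f} per call, ${monthly:.2f} per month")
```

At that volume the bill works out to under $50/month, which is why o4-mini is pitched at cheap-reasoning workloads.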
When to use o4-mini
Sweet spot
- cheap reasoning
- math
- research
- hardest coding
Known trade-offs
- slow time to first token
- poor fit for streaming-first use cases
Works with
Compare o4-mini to other models
FAQ — o4-mini
How much does o4-mini cost?
o4-mini costs $1.10 per 1M input tokens and $4.40 per 1M output tokens on the OpenAI API. Cached input reads cost $0.275 per 1M, cutting the input bill by roughly 75% on repeated system prompts. The batch API offers a 50% discount for async workloads.
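The ~75% figure is just the ratio of the cached rate to the fresh rate. A short sketch; the cache hit fraction passed to the helper is an assumed parameter, not something this page specifies:

```python
# Cached input reads at $0.275/1M vs $1.10/1M for fresh input.
FRESH = 1.10     # $ per 1M uncached input tokens
CACHED = 0.275   # $ per 1M cached input tokens

discount = 1 - CACHED / FRESH
print(f"cached-read discount: {discount:.0%}")

def blended_input_rate(cache_hit: float) -> float:
    """Effective $/1M input tokens for a given cache-hit fraction."""
    return cache_hit * CACHED + (1 - cache_hit) * FRESH
```

For a workload where 90% of input tokens hit the cache, the effective input rate drops to about $0.36 per 1M tokens.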
What is the context window of o4-mini?
o4-mini has a 200k-token context window with up to 100k tokens of output. That's enough for long reports, extended chat histories, or structured document analysis.
Does o4-mini have a free tier?
No. o4-mini is paid-only.
Is o4-mini HIPAA / EU / on-prem friendly?
o4-mini is HIPAA-eligible and available in EU regions, but it is API-only, with no on-prem or open-weights option. Zero data retention is available for enterprise customers.
What is o4-mini best for?
o4-mini is best for cheap reasoning, math, research, and the hardest coding tasks. Trade-offs to be aware of: slow time to first token and a poor fit for streaming-first use cases.
Which tools and SDKs work with o4-mini?
o4-mini integrates with OpenAI SDK, Azure OpenAI, Vercel AI SDK, LangChain, OpenRouter. Most major AI frameworks support it either natively or through OpenAI-compatible endpoints.
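Because those frameworks speak the OpenAI chat-completions format, targeting o4-mini usually comes down to setting the model name in the request body. A minimal sketch of such a body; the prompt text and token limit are illustrative assumptions:

```python
# Build a chat-completions request body for an OpenAI-compatible endpoint.
import json

payload = {
    "model": "o4-mini",
    "messages": [
        {"role": "user", "content": "Prove that the square root of 2 is irrational."},
    ],
    # o-series reasoning models take max_completion_tokens rather than max_tokens.
    "max_completion_tokens": 4_096,
}

body = json.dumps(payload)  # what an SDK would POST to the completions endpoint
```

Routers such as OpenRouter and gateways such as Azure OpenAI accept the same shape, so switching providers typically means changing the base URL and credentials, not the payload.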