GPT Realtime
GPT Realtime is a realtime audio model from OpenAI, released in August 2025. It costs $4 per 1M input tokens and $16 per 1M output tokens, has a 128k-token context window, and is best suited to voice applications, phone agents, and interactive voice experiences. Last verified 2026-04-19.
Spec sheet
Pricing
- Input: $4 / 1M tokens
- Output: $16 / 1M tokens
- Free tier: No
Context & speed
- Context window: 128k tokens
- Max output: 4k tokens
- Throughput: ~130 tok/s
- Time to first token: ~180 ms
- Speed tier: ultra
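The throughput and time-to-first-token figures combine into a rough latency estimate: time to first token plus output tokens divided by throughput. The sketch below treats both spec-sheet numbers as fixed constants, which is a simplifying assumption; real latency depends on load, network conditions, and whether the output is audio or text. The function name `estimate_response_seconds` is hypothetical.

```python
# Hypothetical helper: rough wall-clock time to stream one full response.
# Assumption: the spec-sheet speed figures are treated as fixed constants;
# real latency varies with load, network, and audio vs. text output.
TTFT_SECONDS = 0.180        # ~180 ms time to first token (spec sheet)
THROUGHPUT_TOK_PER_S = 130  # ~130 tok/s throughput (spec sheet)


def estimate_response_seconds(output_tokens: int) -> float:
    """Time to first token plus time to stream the output tokens."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    for n in (50, 600, 4_000):
        print(f"{n:>5} output tokens -> ~{estimate_response_seconds(n):.2f} s")
```

By this estimate, a 600-token reply takes roughly 0.18 + 600 / 130 ≈ 4.8 s to finish streaming, even though the first token arrives after about 180 ms.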
Capabilities
- Tool use: Yes
- Structured output: No
- Prompt caching: No
- Extended thinking: No
- Vision input: No
- Audio in / out: Yes
- Fine-tuning: No
Deployment
- Open weights: No
- On-prem: No
- HIPAA eligible: No
- Zero retention: No
- Regions: US, EU
Estimated monthly cost
Estimates assume a typical token shape of 2k input and 600 output tokens per call, billed at list price. Prompt caching is not applied.
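At list price, the assumed token shape works out to 2,000 × $4 / 1M + 600 × $16 / 1M = $0.008 + $0.0096 = $0.0176 per call. The sketch below reproduces that arithmetic and scales it to monthly volumes; the call volumes and function names are hypothetical examples, not figures from this page, and caching is ignored as stated above.

```python
# Per-call and monthly list-price cost for the page's assumed token shape
# (2k input, 600 output per call). No prompt caching is applied.
# The monthly call volumes below are hypothetical examples.
INPUT_USD_PER_M = 4.00
OUTPUT_USD_PER_M = 16.00


def cost_per_call(input_tokens: int = 2_000, output_tokens: int = 600) -> float:
    """List-price cost in USD for a single call."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def monthly_cost(calls_per_month: int, **token_shape: int) -> float:
    """Scale the per-call cost to a monthly call volume."""
    return calls_per_month * cost_per_call(**token_shape)


if __name__ == "__main__":
    print(f"per call: ${cost_per_call():.4f}")
    for calls in (1_000, 10_000, 100_000):
        print(f"{calls:>7} calls/month: ${monthly_cost(calls):,.2f}")
```

At 10,000 calls per month, for example, this gives about $176.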
When to use GPT Realtime
Sweet spot
- voice
- phone agents
- interactive voice
Known trade-offs
- suited only to audio use cases
- costly for long calls
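One plausible reason long calls get expensive: in a conversational session, each new response typically reprocesses the accumulated conversation as input, so billed input grows roughly quadratically with the number of turns. The sketch below models that under stated assumptions (every turn re-bills the full history, no caching, and a hypothetical 200 tokens per user turn and per reply); it is an illustration, not OpenAI's billing logic, and `session_cost` is a hypothetical name.

```python
# Hypothetical model of session cost growth. Assumptions: every turn
# re-bills the whole conversation so far as input, there is no caching,
# and each user turn and reply is a fixed, illustrative token count.
INPUT_USD_PER_M = 4.00
OUTPUT_USD_PER_M = 16.00


def session_cost(turns: int, user_tokens: int = 200, reply_tokens: int = 200) -> float:
    """Total list-price USD cost of a session with `turns` exchanges."""
    history = 0       # tokens accumulated in the conversation so far
    input_total = 0   # billed input tokens across all turns
    output_total = 0  # billed output tokens across all turns
    for _ in range(turns):
        history += user_tokens       # new user turn joins the context
        input_total += history       # full context billed as input
        output_total += reply_tokens
        history += reply_tokens      # model reply stays in the context
    return (input_total * INPUT_USD_PER_M
            + output_total * OUTPUT_USD_PER_M) / 1_000_000


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(f"{n:>3} turns: ${session_cost(n):.3f}")
```

With these assumptions, billed input after n turns is 200·n² tokens, so 40 turns cost about $1.41 versus about $0.38 for 20 turns: doubling the call length roughly quadruples the input cost.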
Works with
- OpenAI SDK (Realtime API)
- Azure OpenAI
- WebRTC
- LiveKit
FAQ — GPT Realtime
How much does GPT Realtime cost?
GPT Realtime costs $4 per 1M input tokens and $16 per 1M output tokens on the OpenAI API. It does not currently support prompt caching, so list price is the full cost.
What is the context window of GPT Realtime?
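GPT Realtime has a 128k-token context window and can generate up to 4k tokens per response. The window is enough for long conversations and most document-length inputs, while the 4k output cap limits how long any single reply can be.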
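The context window and the output cap interact: whatever is reserved for the reply must fit in the window alongside the prompt. The sketch below checks that budget and estimates how many turns a growing conversation can hold, assuming "128k" means 128,000 tokens, the full 4k output is reserved on every turn, and each exchange adds a fixed number of tokens. The function names are hypothetical.

```python
# Sketch of a context-budget check. Assumptions: "128k" = 128,000 tokens,
# the full 4k output allowance is reserved each turn, and conversation
# history grows by a fixed number of tokens per exchange.
CONTEXT_WINDOW = 128_000
MAX_OUTPUT = 4_000


def fits_in_context(prompt_tokens: int, max_output: int = MAX_OUTPUT) -> bool:
    """True if the prompt plus reserved output stays within the window."""
    if max_output > MAX_OUTPUT:
        raise ValueError("requested output exceeds the 4k max-output cap")
    return prompt_tokens + max_output <= CONTEXT_WINDOW


def turns_until_full(tokens_per_turn: int, max_output: int = MAX_OUTPUT) -> int:
    """Number of exchanges the history can hold before the window overflows."""
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    return (CONTEXT_WINDOW - max_output) // tokens_per_turn


if __name__ == "__main__":
    print(fits_in_context(100_000))   # within budget
    print(turns_until_full(2_600))    # hypothetical 2k in + 600 out per turn
```

If each exchange added 2,600 tokens (reusing the page's 2k-input / 600-output shape as per-turn growth, which is an assumption), about 47 turns would fit before the window overflows.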
Does GPT Realtime have a free tier?
No. GPT Realtime is paid-only.
Is GPT Realtime HIPAA / EU / on-prem friendly?
GPT Realtime is not HIPAA-eligible. It is available in US and EU regions and is API-only, with no on-prem deployment or open weights. Zero data retention is not available.
What is GPT Realtime best for?
GPT Realtime is best for voice applications, phone agents, and interactive voice experiences. Trade-offs to be aware of: it is suited only to audio use cases, and it becomes costly for long calls.
Which tools and SDKs work with GPT Realtime?
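GPT Realtime integrates with the OpenAI SDK (Realtime API), Azure OpenAI, WebRTC, and LiveKit. Support in other AI frameworks varies: the Realtime API runs over persistent WebSocket or WebRTC sessions rather than standard request/response endpoints, so check that a framework supports it explicitly.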