OpenAI

GPT Realtime

GPT Realtime is a real-time audio model from OpenAI, released in August 2025. It costs $4 per 1M input tokens and $16 per 1M output tokens, has a 128k-token context window, and is best suited to voice, phone agents, and interactive voice applications. Last verified 2026-04-19.

Spec sheet

Pricing

Input
$4 / 1M
Output
$16 / 1M
Free tier
No

Context & speed

Context window
128k tokens
Max output
4k tokens
Throughput
~130 tok/s
Time to first token
~180 ms
Speed tier
ultra
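
The throughput and time-to-first-token figures above can be combined into a rough estimate of how long a full response takes to generate. The sketch below assumes a simple model (time to first token plus output tokens divided by throughput); the function name and the formula are illustrative assumptions, not a published OpenAI formula, and in a streaming voice session playback starts well before generation finishes.

```python
# Rough generation-time estimate from the spec-sheet figures.
# Assumption: total time ~= time to first token + tokens / throughput.
TTFT_SECONDS = 0.180          # ~180 ms time to first token (from the spec sheet)
THROUGHPUT_TOK_PER_S = 130.0  # ~130 tok/s (from the spec sheet)


def estimated_response_seconds(output_tokens: int) -> float:
    """Approximate seconds needed to generate `output_tokens` tokens."""
    return TTFT_SECONDS + output_tokens / THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    # 600 tokens is the "typical" output size used elsewhere on this page;
    # 4,000 tokens is the listed max output.
    for tokens in (600, 4_000):
        print(f"{tokens:>5} tokens: ~{estimated_response_seconds(tokens):.1f} s")
```

Both inputs are approximate, so treat the result as an order-of-magnitude guide rather than a latency guarantee.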

Capabilities

Tool use
Yes
Structured output
No
Prompt caching
No
Extended thinking
No
Vision input
No
Audio in / out
Yes
Fine-tuning
No

Deployment

Open weights
No
On-prem
No
HIPAA eligible
No
Zero retention
No
Regions
us, eu

Estimated monthly cost

Assumes a typical call shape of 2k input tokens and 600 output tokens. Figures are at list price; prompt caching is excluded.

10k calls/mo
$176.00
per month
100k calls/mo
$1.8k
per month
1M calls/mo
$18k
per month
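
The monthly figures above follow directly from the listed per-token prices and the assumed call shape. A minimal sketch of that arithmetic, with a hypothetical helper name and the page's 2k-input / 600-output assumption as defaults:

```python
# Monthly cost estimate from list prices (USD per 1M tokens).
INPUT_USD_PER_M = 4.00    # input price from the spec sheet
OUTPUT_USD_PER_M = 16.00  # output price from the spec sheet


def monthly_cost_usd(calls: int,
                     input_tokens: int = 2_000,
                     output_tokens: int = 600) -> float:
    """Estimated monthly spend for `calls` calls of the given token shape."""
    # Price of one call, still scaled by 1M (divided out at the end).
    per_call_scaled = (input_tokens * INPUT_USD_PER_M
                       + output_tokens * OUTPUT_USD_PER_M)
    return calls * per_call_scaled / 1_000_000


if __name__ == "__main__":
    for calls in (10_000, 100_000, 1_000_000):
        print(f"{calls:>9,} calls/mo: ${monthly_cost_usd(calls):,.2f}")
```

Each call works out to $0.0176 ($0.008 input plus $0.0096 output), which gives $176 at 10k calls, $1,760 at 100k calls, and $17,600 at 1M calls; the table rounds the last two to $1.8k and $18k. Long voice calls can use far more tokens than this assumed shape, so adjust the defaults to match real traffic.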

When to use GPT Realtime

Sweet spot

  • voice
  • phone agents
  • interactive voice

Known trade-offs

  • audio-only use case
  • costly for long calls

Works with

  • OpenAI SDK (Realtime API)
  • Azure OpenAI
  • WebRTC
  • LiveKit

FAQ — GPT Realtime

How much does GPT Realtime cost?

GPT Realtime costs $4 per 1M input tokens and $16 per 1M output tokens on the OpenAI API. The model does not currently support prompt caching, so every call is billed at list price.

What is the context window of GPT Realtime?

GPT Realtime has a 128k-token context window with up to 4k tokens of output. That's enough for typical chat and short-document tasks.

Does GPT Realtime have a free tier?

No. GPT Realtime is paid-only.

Is GPT Realtime HIPAA / EU / on-prem friendly?

GPT Realtime is not HIPAA-eligible. It is available in US and EU regions and is API-only, with no on-prem deployment. Zero data retention is not available.

What is GPT Realtime best for?

GPT Realtime is best for voice, phone agents, and interactive voice applications. Trade-offs to be aware of: it targets audio-only use cases, and it is costly for long calls.

Which tools and SDKs work with GPT Realtime?

GPT Realtime integrates with the OpenAI SDK (Realtime API), Azure OpenAI, WebRTC, and LiveKit. Most major AI frameworks support it either natively or through OpenAI-compatible endpoints.