Gemini 3.5 Flash
Gemini 3.5 Flash is a Gemini 3 model from Google, released in May 2026. It costs $1.5 per 1M input tokens and $9 per 1M output tokens, has a 1.0M-token context window, and is best suited to agents, coding, and long-context work. Last verified 2026-05-22.
Spec sheet
Pricing
- Input: $1.5 / 1M tokens
- Output: $9 / 1M tokens
- Cached input: $0.15 / 1M tokens
- Batch discount: 50%
- Free tier: Google AI Studio
Context & speed
- Context window: 1.0M tokens
- Max output: 66k tokens
- Throughput: ~220 tok/s
- Time to first token: ~300 ms
- Speed tier: fast
Capabilities
- Tool use: Yes
- Structured output: Yes
- Prompt caching: Yes
- Extended thinking: Yes
- Vision input: Yes
- Audio in / out: Yes
- Fine-tuning: Yes
Deployment
- Open weights: No
- On-prem: No
- HIPAA eligible: Yes
- Zero retention: Yes
- Regions: us, eu, apac
Estimated monthly cost
Assumes a typical call shape of 2k input tokens and 600 output tokens. Prompt caching is excluded from these figures.
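The estimate can be reproduced with a few lines of arithmetic. The sketch below uses the list prices from the spec sheet and the 2k input / 600 output call shape stated above. The call volumes and the 30-day month are illustrative assumptions, not figures from this page.

```python
# List prices from the spec sheet, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.5
OUTPUT_PRICE_PER_M = 9.0


def cost_per_call(input_tokens: int = 2_000, output_tokens: int = 600) -> float:
    """USD cost of one call at list price (no caching, no batch discount)."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


def monthly_cost(calls_per_day: int, days: int = 30) -> float:
    """USD cost of a month of traffic; the 30-day month is an assumption."""
    return cost_per_call() * calls_per_day * days


if __name__ == "__main__":
    # Hypothetical daily volumes, chosen only to show how the bill scales.
    for calls in (1_000, 10_000, 100_000):
        print(f"{calls:>7} calls/day: ${monthly_cost(calls):,.2f} per month")
```

At this call shape a single call costs about $0.0084, and output tokens account for $0.0054 of that, so trimming output length moves the bill more than trimming the prompt.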
When to use Gemini 3.5 Flash
Sweet spot
- agents
- coding
- long context
- multimodal
- general production
Known trade-offs
- 3x more expensive than Gemini 3 Flash
- non-global regions priced 10% higher, at $1.65 / $9.90 per 1M input / output tokens
Works with
- Google AI SDK
- Vertex AI
- GitHub Copilot
- Vercel AI SDK
- LangChain
- LlamaIndex
- OpenRouter
FAQ — Gemini 3.5 Flash
How much does Gemini 3.5 Flash cost?
Gemini 3.5 Flash costs $1.5 per 1M input tokens and $9 per 1M output tokens on the Google API. Cached input reads cost $0.15 per 1M tokens, a 90% reduction on the cached portion of the input, which helps most with repeated system prompts. The batch API offers a 50% discount for asynchronous workloads.
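To see how caching and batching change a single call's cost, here is a minimal sketch using the prices quoted in this answer. Two things are assumptions rather than facts from this page: that the batch discount applies to the whole bill, and that there are no separate cache storage fees. Check the provider's pricing page before relying on either.

```python
# Prices quoted above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.5
CACHED_INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 9.0
BATCH_DISCOUNT = 0.50


def call_cost(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    batch: bool = False,
) -> float:
    """USD cost of one call; cached_tokens is the cached part of the input."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    usd = (
        fresh_tokens * INPUT_PRICE_PER_M
        + cached_tokens * CACHED_INPUT_PRICE_PER_M
        + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    if batch:
        # Assumption: the discount applies to the entire bill, cached reads included.
        usd *= 1 - BATCH_DISCOUNT
    return usd


if __name__ == "__main__":
    print(f"list price:        ${call_cost(2_000, 600):.6f}")
    print(f"1,500 tokens hit:  ${call_cost(2_000, 600, cached_tokens=1_500):.6f}")
    print(f"batch, no caching: ${call_cost(2_000, 600, batch=True):.6f}")
```

Because caching only discounts input, its effect on the total bill shrinks as the share of output tokens grows.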
What is the context window of Gemini 3.5 Flash?
Gemini 3.5 Flash has a 1.0M-token context window with up to 66k tokens of output. That's enough for entire codebases, long transcripts, or multi-document RAG.
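A rough pre-flight check can tell you whether a set of documents is likely to fit. The sketch below makes several assumptions: "1.0M" and "66k" are read as exactly 1,000,000 and 66,000 tokens, output is assumed to share the same window as input, and the 4-characters-per-token ratio is a common heuristic for English text rather than the model's real tokenizer. Use the provider's token-counting endpoint for exact numbers.

```python
from typing import Iterable

CONTEXT_WINDOW_TOKENS = 1_000_000  # assumption: "1.0M" means exactly one million
MAX_OUTPUT_TOKENS = 66_000         # assumption: "66k" means 66,000
CHARS_PER_TOKEN = 4                # heuristic only, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(
    texts: Iterable[str], reserved_output: int = MAX_OUTPUT_TOKENS
) -> bool:
    """True if the estimated input leaves room for reserved_output tokens."""
    budget = CONTEXT_WINDOW_TOKENS - reserved_output
    return sum(estimate_tokens(t) for t in texts) <= budget


if __name__ == "__main__":
    docs = ["x" * 400_000, "y" * 1_200_000]  # hypothetical documents
    print(fits_in_context(docs))
```

Reserving the full output budget is conservative. Pass a smaller `reserved_output` when you know responses will be short.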
Does Gemini 3.5 Flash have a free tier?
Yes. A reduced daily quota is available through Google AI Studio for prototyping. Start at https://aistudio.google.com.
Is Gemini 3.5 Flash HIPAA / EU / on-prem friendly?
Gemini 3.5 Flash is HIPAA-eligible and available in EU regions. It is API-only: there are no open weights and no on-prem deployment. Zero data retention is available for enterprise customers.
What is Gemini 3.5 Flash best for?
Gemini 3.5 Flash is best for agents, coding, long-context work, and multimodal tasks. Trade-offs to be aware of: it is 3x more expensive than Gemini 3 Flash, and non-global regions are priced 10% higher, at $1.65 / $9.90 per 1M input / output tokens.
Which tools and SDKs work with Gemini 3.5 Flash?
Gemini 3.5 Flash integrates with Google AI SDK, Vertex AI, GitHub Copilot, Vercel AI SDK, LangChain, LlamaIndex, and OpenRouter. Most major AI frameworks support it either natively or through OpenAI-compatible endpoints.