Gemini 3.1 Flash-Lite
Gemini 3.1 Flash-Lite is a Gemini 3 model from Google, released in 2026-04. It costs $0.25 / $1.5 per 1M, has a 1M-token context window, and is best for classification, extraction, ultra-cheap. Last verified 2026-04-19.
Spec sheet
Pricing
- Input
- $0.25 / 1M
- Output
- $1.5 / 1M
- Batch discount
- 50%
- Free tier
- Google AI Studio
Context & speed
- Context window
- 1M tokens
- Max output
- 8k tokens
- Throughput
- ~250 tok/s
- Time to first token
- ~200 ms
- Speed tier
- ultra
Capabilities
- Tool use
- Yes
- Structured output
- Yes
- Prompt caching
- No
- Extended thinking
- No
- Vision input
- Yes
- Audio in / out
- No
- Fine-tuning
- No
Deployment
- Open weights
- No
- On-prem
- No
- HIPAA eligible
- No
- Zero retention
- No
- Regions
- us, eu, apac
Estimated monthly cost
Assumes typical token shape: 2k input, 600 output per call. Prompt caching is excluded from these figures.
When to use Gemini 3.1 Flash-Lite
Sweet spot
- classification
- extraction
- ultra cheap
- high throughput
Known trade-offs
- weak reasoning
Works with
Compare Gemini 3.1 Flash-Lite to other models
FAQ — Gemini 3.1 Flash-Lite
How much does Gemini 3.1 Flash-Lite cost?
Gemini 3.1 Flash-Lite costs $0.25 / $1.5 per 1M tokens on the Google API. This model does not currently support prompt caching, so list price is the full cost. The batch API offers a 50% discount for async workloads.
What is the context window of Gemini 3.1 Flash-Lite?
Gemini 3.1 Flash-Lite has a 1M-token context window with up to 8k tokens of output. That's enough for entire codebases, long transcripts, or multi-document RAG.
Does Gemini 3.1 Flash-Lite have a free tier?
Yes — Reduced daily quota; most generous free tier of any frontier lab. Start at https://aistudio.google.com.
Is Gemini 3.1 Flash-Lite HIPAA / EU / on-prem friendly?
Gemini 3.1 Flash-Lite is not HIPAA-eligible, available in EU regions, and is API-only. Zero data retention is not available.
What is Gemini 3.1 Flash-Lite best for?
Gemini 3.1 Flash-Lite is best for classification, extraction, ultra cheap, high throughput. Trade-offs to be aware of: weak reasoning.
Which tools and SDKs work with Gemini 3.1 Flash-Lite?
Gemini 3.1 Flash-Lite integrates with Google AI SDK, Vertex AI, Vercel AI SDK, LangChain, OpenRouter. Most major AI frameworks support it either natively or through OpenAI-compatible endpoints.