Google Free tier

Gemini 3.1 Flash-Lite

Gemini 3.1 Flash-Lite is a Gemini 3 model from Google, released in April 2026. It costs $0.25 per 1M input tokens and $1.50 per 1M output tokens, has a 1M-token context window, and is best suited to classification, extraction, and other very low-cost workloads. Last verified 2026-04-19.

Spec sheet

Pricing

Input
$0.25 / 1M
Output
$1.5 / 1M
Batch discount
50%
Free tier
Google AI Studio

Context & speed

Context window
1M tokens
Max output
8k tokens
Throughput
~250 tok/s
Time to first token
~200 ms
Speed tier
ultra

Capabilities

Tool use
Yes
Structured output
Yes
Prompt caching
No
Extended thinking
No
Vision input
Yes
Audio in / out
No
Fine-tuning
No

Deployment

Open weights
No
On-prem
No
HIPAA eligible
No
Zero retention
No
Regions
us, eu, apac

Estimated monthly cost

Assumes typical token shape: 2k input, 600 output per call. Prompt caching is excluded from these figures.

10k calls/mo
$14.00
per month
100k calls/mo
$140.00
per month
1M calls/mo
$1,400.00
per month
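
The figures above follow directly from the list prices and the stated token shape. A minimal sketch of the arithmetic, assuming every call uses exactly 2,000 input and 600 output tokens and no batch discount is applied (the function and constant names are illustrative, not part of any SDK):

```python
# List prices from the spec sheet, in USD per 1M tokens.
INPUT_USD_PER_M = 0.25
OUTPUT_USD_PER_M = 1.50


def monthly_cost(calls, input_tokens=2_000, output_tokens=600):
    """Estimated monthly spend in USD for a fixed per-call token shape."""
    # Cost of one call: tokens times price, scaled down from "per 1M tokens".
    per_call = (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    return calls * per_call


if __name__ == "__main__":
    for calls in (10_000, 100_000, 1_000_000):
        print(f"{calls:>9,} calls/mo: ${monthly_cost(calls):,.2f}")
```

Each call works out to $0.0014, so changing the defaults to match your own average prompt and response sizes gives a quick estimate for other workloads.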

When to use Gemini 3.1 Flash-Lite

Sweet spot

  • classification
  • extraction
  • ultra cheap
  • high throughput

Known trade-offs

  • weak reasoning

Works with

Google AI SDK, Vertex AI, Vercel AI SDK, LangChain, OpenRouter

FAQ — Gemini 3.1 Flash-Lite

How much does Gemini 3.1 Flash-Lite cost?

Gemini 3.1 Flash-Lite costs $0.25 per 1M input tokens and $1.50 per 1M output tokens on the Google API. The model does not currently support prompt caching, so the list price is the full cost of real-time calls. The batch API offers a 50% discount for async workloads.
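
To see how the batch discount changes a bill, here is a rough sketch that splits traffic between real-time and batch calls. It assumes the 50% discount applies equally to input and output tokens, which this page does not spell out, and it reuses the 2k-input / 600-output token shape from the cost estimate; all names are hypothetical helpers, not an official API.

```python
# List prices from the spec sheet, in USD per 1M tokens.
INPUT_USD_PER_M = 0.25
OUTPUT_USD_PER_M = 1.50
# Assumption: the batch discount applies to both input and output tokens.
BATCH_DISCOUNT = 0.50


def call_cost(input_tokens, output_tokens, batch=False):
    """USD cost of one call, at list price or at the batch rate."""
    cost = (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


def blended_monthly_cost(calls, batch_share, input_tokens=2_000, output_tokens=600):
    """Monthly USD cost when a fraction `batch_share` (0..1) of calls go through batch."""
    batch_calls = calls * batch_share
    realtime_calls = calls - batch_calls
    return (
        realtime_calls * call_cost(input_tokens, output_tokens)
        + batch_calls * call_cost(input_tokens, output_tokens, batch=True)
    )


if __name__ == "__main__":
    for share in (0.0, 0.5, 1.0):
        total = blended_monthly_cost(100_000, share)
        print(f"batch share {share:.0%}: ${total:,.2f}")
```

Under these assumptions, moving all 100k monthly calls to batch halves the bill from $140 to $70, at the price of asynchronous delivery.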

What is the context window of Gemini 3.1 Flash-Lite?

Gemini 3.1 Flash-Lite has a 1M-token context window with up to 8k tokens of output. That's enough for entire codebases, long transcripts, or multi-document RAG.

Does Gemini 3.1 Flash-Lite have a free tier?

Yes, through Google AI Studio. The free tier has a reduced daily quota but is the most generous of any frontier lab. Start at https://aistudio.google.com.

Is Gemini 3.1 Flash-Lite HIPAA / EU / on-prem friendly?

Gemini 3.1 Flash-Lite is not HIPAA-eligible. It is available in EU regions (alongside US and APAC) and is API-only, with no on-prem deployment. Zero data retention is not available.

What is Gemini 3.1 Flash-Lite best for?

Gemini 3.1 Flash-Lite is best for classification, extraction, and other high-throughput workloads where very low cost matters. The main trade-off to be aware of is weak reasoning.

Which tools and SDKs work with Gemini 3.1 Flash-Lite?

Gemini 3.1 Flash-Lite integrates with Google AI SDK, Vertex AI, Vercel AI SDK, LangChain, and OpenRouter. Most major AI frameworks support it either natively or through OpenAI-compatible endpoints.