DeepSeek · Free tier · Open weights

DeepSeek V4 Flash

DeepSeek V4 Flash is a DeepSeek model released in April 2026. It costs $0.14 input / $0.28 output per 1M tokens, has a 1M-token context window, and is best for cheap reasoning, cheap production workloads, and open-weights deployment. Last verified 2026-05-06.

Spec sheet

Pricing

Input: $0.14 / 1M
Output: $0.28 / 1M
Cached input: $0.0028 / 1M
Free tier: OpenRouter

Context & speed

Context window: 1M tokens
Max output: 384k tokens
Throughput: ~95 tok/s
Time to first token: ~650 ms
Speed tier: balanced

Capabilities

Tool use: Yes
Structured output: Yes
Prompt caching: Yes
Extended thinking: Yes
Vision input: No
Audio in / out: No
Fine-tuning: Yes

Deployment

Open weights: Yes
On-prem: Yes
HIPAA eligible: No
Zero retention: No
Regions: APAC, US

Estimated monthly cost

Assumes a typical token shape of 2k input tokens and 600 output tokens per call. Prompt caching is excluded from these figures.

10k calls/mo: $4.48
100k calls/mo: $44.80
1M calls/mo: $448.00
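The figures above follow directly from the per-token prices. A minimal sketch of the arithmetic (prices and token shape from the spec sheet; the function name is ours):

```python
# Rough monthly cost estimator for DeepSeek V4 Flash (prices from the spec sheet).
INPUT_PER_M = 0.14    # $ per 1M input tokens (cache miss)
OUTPUT_PER_M = 0.28   # $ per 1M output tokens

def monthly_cost(calls: int, input_tokens: int = 2000, output_tokens: int = 600) -> float:
    """Cost in USD for `calls` requests of the given token shape, with no caching."""
    per_call = (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000
    return calls * per_call

print(f"{monthly_cost(10_000):.2f}")  # prints 4.48
```

Swap in your own token shape to re-derive the table; heavier outputs shift the bill toward the $0.28 output rate.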

When to use DeepSeek V4 Flash

Sweet spot

  • cheap reasoning
  • cheap production
  • open weights
  • long context cheap

Known trade-offs

  • data-routing via China for hosted API
  • below top-20 on arena leaderboard

Works with

DeepSeek SDK · OpenAI-compatible API · OpenRouter · Ollama · vLLM · LangChain

FAQ — DeepSeek V4 Flash

How much does DeepSeek V4 Flash cost?

DeepSeek V4 Flash costs $0.14 per 1M input tokens and $0.28 per 1M output tokens on the DeepSeek API. Cached input reads cost $0.0028 per 1M, cutting the input bill by roughly 98% on repeat system prompts.
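The ~98% figure follows directly from the two input prices (a quick check; prices from the spec sheet above):

```python
# Cached vs. uncached input pricing for DeepSeek V4 Flash.
CACHE_MISS = 0.14    # $ per 1M input tokens
CACHE_HIT = 0.0028   # $ per 1M cached input tokens

savings = 1 - CACHE_HIT / CACHE_MISS
print(f"{savings:.0%}")  # prints 98%
```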

What is the context window of DeepSeek V4 Flash?

DeepSeek V4 Flash has a 1M-token context window with up to 384k tokens of output. That's enough for entire codebases, long transcripts, or multi-document RAG.
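Whether a given corpus fits is easy to estimate with the common ~4-characters-per-token heuristic. A rough sketch, under the assumption that the full 384k output budget is reserved (the exact ratio depends on the tokenizer and language):

```python
# Quick fit check against the 1M-token context window (spec sheet values).
CONTEXT_WINDOW = 1_000_000  # tokens
CHARS_PER_TOKEN = 4         # rough heuristic; varies by tokenizer and language

def fits_in_context(text: str, reserved_output: int = 384_000) -> bool:
    """True if `text` plausibly fits alongside the maximum output budget."""
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens <= CONTEXT_WINDOW - reserved_output

print(fits_in_context("x" * 2_000_000))  # ~500k tokens -> True
```

For real workloads, count tokens with the model's own tokenizer rather than the heuristic.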

Does DeepSeek V4 Flash have a free tier?

Yes. It is often available free via OpenRouter, and the official API is extremely cheap ($0.14/1M on a cache miss, $0.0028/1M on cached input). Start at https://openrouter.ai.

Is DeepSeek V4 Flash HIPAA / EU / on-prem friendly?

DeepSeek V4 Flash is not HIPAA-eligible, has no EU region (hosted regions are APAC and US), and does not offer zero data retention. It does, however, ship open weights, so you can self-host on-prem to sidestep those constraints.

What is DeepSeek V4 Flash best for?

DeepSeek V4 Flash is best for cheap reasoning, cheap production workloads, open-weights deployment, and inexpensive long-context use. Trade-offs to be aware of: the hosted API routes data via China, and the model sits below the top 20 on the arena leaderboard.

Which tools and SDKs work with DeepSeek V4 Flash?

DeepSeek V4 Flash integrates with DeepSeek SDK, OpenAI-compatible API, OpenRouter, Ollama, vLLM, LangChain. Most major AI frameworks support it either natively or through OpenAI-compatible endpoints.
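Because the API is OpenAI-compatible, a request is just the standard chat-completions payload pointed at the provider's base URL. The sketch below builds the request without sending it; the model identifier deepseek-v4-flash and the URL are assumptions, so verify both against the provider's documentation:

```python
import json

# Assumed values -- check the provider's docs for the real ones.
BASE_URL = "https://api.deepseek.com/v1/chat/completions"
MODEL = "deepseek-v4-flash"

def build_request(prompt: str, api_key: str) -> tuple[dict, bytes]:
    """Return (headers, body) for an OpenAI-style chat-completions request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return headers, body

headers, body = build_request("Summarize this document.", api_key="sk-...")
# Send with any HTTP client, the OpenAI SDK (via its base_url override), or OpenRouter.
```

The same payload shape works through OpenRouter by swapping the base URL and key.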