DeepSeek V4 Flash
DeepSeek V4 Flash is an open-weights model from DeepSeek, released in April 2026. It costs $0.14 input / $0.28 output per 1M tokens, has a 1M-token context window, and is best suited to cheap reasoning, cheap production workloads, and self-hosting. Last verified 2026-05-06.
Spec sheet
Pricing
- Input: $0.14 / 1M
- Output: $0.28 / 1M
- Cached input: $0.0028 / 1M
- Free tier: OpenRouter
Context & speed
- Context window: 1M tokens
- Max output: 384k tokens
- Throughput: ~95 tok/s
- Time to first token: ~650 ms
- Speed tier: balanced
Capabilities
- Tool use: Yes
- Structured output: Yes
- Prompt caching: Yes
- Extended thinking: Yes
- Vision input: No
- Audio in / out: No
- Fine-tuning: Yes
Deployment
- Open weights: Yes
- On-prem: Yes
- HIPAA eligible: No
- Zero retention: No
- Regions: APAC, US
Estimated monthly cost
Assumes a typical token shape of 2k input and 600 output tokens per call. Prompt caching is excluded from these estimates.
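The estimate above can be reproduced from the listed prices. A minimal sketch, where the 100k calls/month volume is an illustrative assumption and not from this page:

```python
# Rough monthly cost estimate for DeepSeek V4 Flash at the listed prices,
# excluding prompt caching, as on the page above.
INPUT_PER_TOK = 0.14 / 1_000_000   # $ per input token
OUTPUT_PER_TOK = 0.28 / 1_000_000  # $ per output token

def cost_per_call(input_toks: int = 2_000, output_toks: int = 600) -> float:
    """Dollar cost of one call at the typical token shape (2k in, 600 out)."""
    return input_toks * INPUT_PER_TOK + output_toks * OUTPUT_PER_TOK

calls_per_month = 100_000  # assumed volume, not from the spec sheet
monthly = cost_per_call() * calls_per_month
print(f"${cost_per_call():.6f} per call, ${monthly:.2f} per month")
```

At that assumed volume this works out to roughly $45 per month; scale `calls_per_month` to your own traffic.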
When to use DeepSeek V4 Flash
Sweet spot
- cheap reasoning
- cheap production
- open weights
- cheap long context
Known trade-offs
- data routed through China on the hosted API
- below top 20 on the arena leaderboard
FAQ — DeepSeek V4 Flash
How much does DeepSeek V4 Flash cost?
DeepSeek V4 Flash costs $0.14 per 1M input tokens and $0.28 per 1M output tokens on the DeepSeek API. Cached input reads cost $0.0028 per 1M, cutting the input bill by roughly 98% on repeat system prompts.
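The effect of prompt caching on the input bill can be sketched as a blended price. A minimal calculation from the listed rates, where the cache hit rate is whatever fraction of your input tokens land in the cache:

```python
# Effective input pricing for DeepSeek V4 Flash under prompt caching,
# using the rates listed on this page.
CACHE_MISS = 0.14    # $ per 1M uncached input tokens
CACHE_HIT = 0.0028   # $ per 1M cached input tokens

def blended_input_price(hit_rate: float) -> float:
    """Effective $/1M input tokens at a given prompt-cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * CACHE_HIT + (1 - hit_rate) * CACHE_MISS
```

A 100% hit rate recovers the $0.0028 cached price, i.e. the roughly 98% saving quoted above (1 - 0.0028/0.14 = 0.98).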
What is the context window of DeepSeek V4 Flash?
DeepSeek V4 Flash has a 1M-token context window with up to 384k tokens of output. That's enough for entire codebases, long transcripts, or multi-document RAG.
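When packing long documents into the window, it helps to budget input against the output you reserve. A small sketch, assuming input and output share the context window (the common convention; check the provider docs for the exact accounting):

```python
# Input-token budget for DeepSeek V4 Flash given its listed limits.
CONTEXT_WINDOW = 1_000_000  # tokens
MAX_OUTPUT = 384_000        # tokens

def max_input_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Input budget left after reserving room for the response.

    Assumes input + output must jointly fit in the context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output out of range")
    return CONTEXT_WINDOW - reserved_output
```

Reserving the full 384k output leaves 616k tokens of input; a typical 4k-token response cap leaves nearly the whole window for documents.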
Does DeepSeek V4 Flash have a free tier?
Yes. It is often available free via OpenRouter, and the official API is extremely cheap ($0.14/1M on a cache miss, $0.0028/1M on cached input). Start at https://openrouter.ai.
Is DeepSeek V4 Flash HIPAA / EU / on-prem friendly?
DeepSeek V4 Flash is not HIPAA-eligible, not available in an EU region, and offers open weights for self-hosting. Zero data retention is not available.
What is DeepSeek V4 Flash best for?
DeepSeek V4 Flash is best for cheap reasoning, cheap production workloads, open weights, and cheap long context. Trade-offs to be aware of: data routed through China on the hosted API, and below-top-20 placement on the arena leaderboard.
Which tools and SDKs work with DeepSeek V4 Flash?
DeepSeek V4 Flash integrates with DeepSeek SDK, OpenAI-compatible API, OpenRouter, Ollama, vLLM, LangChain. Most major AI frameworks support it either natively or through OpenAI-compatible endpoints.
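Because the API is OpenAI-compatible, any OpenAI-style client can talk to it by swapping the base URL. A minimal sketch of the request shape; the endpoint URL and model id below are assumptions, so check DeepSeek's docs for the exact values:

```python
import json

# Assumed OpenAI-compatible endpoint; verify against DeepSeek's docs.
ENDPOINT = "https://api.deepseek.com/chat/completions"

def chat_payload(prompt: str, model: str = "deepseek-v4-flash") -> str:
    """Serialize a one-turn chat request in the OpenAI-compatible shape.

    POST this as the JSON body to ENDPOINT with your API key in the
    Authorization header, or pass the same fields to any OpenAI SDK.
    """
    return json.dumps({
        "model": model,  # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
    })
```

The same fields drop straight into LangChain or the official SDKs by pointing their `base_url` at the provider endpoint.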