Gemini 3 Flash: Google's Speed Champion Matches GPT-5.2 at 6x Lower Cost
TL;DR
Google launched Gemini 3 Flash on December 17, 2025, delivering frontier-class performance at Flash-level speed and cost. The model achieves 90.4% on GPQA Diamond and 78% on SWE-bench Verified while costing just $0.50 per million input tokens, roughly 6x cheaper than Claude Opus 4.5. For inference-heavy deployments, Gemini 3 Flash processes 218 tokens per second, outperforming GPT-5.1 (125 t/s) and DeepSeek V3.2 reasoning mode (30 t/s).
What Happened
Google released Gemini 3 Flash on December 17, 2025, one month after Gemini 3 Pro topped the LMArena leaderboard. The model combines Pro-grade reasoning with Flash-level latency and efficiency, targeting high-volume production workloads where cost and speed matter as much as capability.
Gemini 3 Flash immediately became the default model in the Gemini app and AI Mode in Google Search, signaling Google's confidence in deploying frontier intelligence at consumer scale.
The model outperforms Gemini 2.5 Pro across benchmarks while running 3x faster according to Artificial Analysis testing. In several benchmarks, it trades blows with GPT-5.2, the model OpenAI rushed out to counter Gemini 3 Pro.
Companies including JetBrains, Figma, Cursor, Harvey, and Latitude already use Gemini 3 Flash in production.
Why It Matters
The inference cost equation for AI applications just shifted. Gemini 3 Flash offers frontier-class reasoning at commodity pricing, creating new deployment economics for data center operators and application developers.
Cost Advantage: At $0.50 per million input tokens, Gemini 3 Flash costs 6x less than Claude Opus 4.5 ($3.00) while achieving comparable performance on most benchmarks. Context caching cuts costs by up to 90% for workloads that repeatedly reuse the same context.
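The cost math above can be sketched in a few lines. This is a back-of-envelope model using only the per-million-token prices quoted in this article; the 10B-token monthly workload and 80% cache-hit rate are illustrative assumptions, and the flat 90% cached-token discount simplifies real caching pricing (which also bills cache storage).

```python
# Prices from the article, in dollars per 1M input tokens.
GEMINI_3_FLASH_INPUT = 0.50
CLAUDE_OPUS_45_INPUT = 3.00
CACHE_DISCOUNT = 0.90  # article's quoted cost reduction for cached tokens

def monthly_input_cost(price_per_m, tokens_per_month, cached_fraction=0.0):
    """Dollar cost, billing cached tokens at a 90% discount (simplified)."""
    cached = tokens_per_month * cached_fraction
    fresh = tokens_per_month - cached
    return (fresh + cached * (1 - CACHE_DISCOUNT)) * price_per_m / 1e6

# Hypothetical workload: 10B input tokens/month, 80% reusable context.
tokens = 10_000_000_000
flash = monthly_input_cost(GEMINI_3_FLASH_INPUT, tokens, cached_fraction=0.8)
opus = monthly_input_cost(CLAUDE_OPUS_45_INPUT, tokens)
print(f"Gemini 3 Flash (80% cached): ${flash:,.0f}/month")
print(f"Claude Opus 4.5 (no cache):  ${opus:,.0f}/month")
```

Under these assumptions the gap compounds: the 6x base price difference plus caching pushes the effective spread past 20x for cache-friendly workloads.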
Inference Speed: Artificial Analysis benchmarking recorded 218 output tokens per second, outpacing GPT-5.1 (125 t/s) by 74% and DeepSeek V3.2 reasoning mode (30 t/s) by 7x. Sub-second latency for short prompts enables responsive chat interfaces and rapid agentic loop iterations.
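To make the throughput numbers concrete, here is a quick estimate of wall-clock decode time for a single response at each measured rate. The 2,000-token response length is an illustrative assumption, and the calculation ignores time-to-first-token and assumes a steady decode rate.

```python
# Output rates (tokens/second) from the Artificial Analysis figures cited above.
rates = {
    "Gemini 3 Flash": 218,
    "GPT-5.1": 125,
    "DeepSeek V3.2 (reasoning)": 30,
}

response_tokens = 2_000  # hypothetical long response
for model, tps in rates.items():
    print(f"{model}: {response_tokens / tps:.1f} s to decode {response_tokens} tokens")
```

For agentic loops that chain many such responses, the per-step difference multiplies: a 10-step loop saves over a minute versus GPT-5.1 and nearly ten minutes versus DeepSeek's reasoning mode.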
Agentic Workflows: The model achieved 78% on SWE-bench Verified, outperforming both the 2.5 series and Gemini 3 Pro for agentic coding tasks. For enterprises building AI agents, comparable capability at lower cost directly impacts deployment ROI.
Multimodal Processing: Resemble AI reported 4x faster multimodal analysis compared to 2.5 Pro, processing raw technical outputs without workflow bottlenecks.
Technical Details
Specifications
| Specification | Gemini 3 Flash |
|---|---|
| Input Modalities | Text, image, video, audio, PDF |
| Output Modalities | Text |
| Max Input Tokens | 1,048,576 (1M) |
| Max Output Tokens | 65,536 |
| Knowledge Cutoff | January 2025 |
| Release Date | December 17, 2025 |
Benchmark Performance
| Benchmark | Gemini 3 Flash | Gemini 3 Pro | GPT-5.2 | Claude Opus 4.5 |
|---|---|---|---|---|
| GPQA Diamond | 90.4% | 91.9% | 88.4% | 88.0% |
| SWE-bench Verified | 78% | 76.2% | — | 80.9% |
| MMMU-Pro | 81.2% | — | 79.5% | — |
| Humanity's Last Exam | 33.7% | — | — | — |
| LMArena Elo | — | 1501 | — | — |
Gemini 3 Flash surpasses 2.5 Flash across the board and significantly outperforms 2.5 Pro on several benchmarks, while matching or beating 3 Pro in areas including MMMU-Pro, Toolathlon, and MCP Atlas.
Pricing Comparison
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 3 Flash | $0.50 | $3.00 |
| Gemini 2.5 Flash | $0.30 | $2.50 |
| Gemini 3 Pro | ~$2.00 | ~$10.00 |
| Claude Opus 4.5 | $3.00 | $15.00 |
| GPT-5.2 | ~$2.50 | ~$10.00 |
Gemini 3 Flash costs about a quarter of Gemini 3 Pro on input tokens (and under a third on output) while delivering comparable reasoning capability. The Batch API offers an additional 50% savings for asynchronous processing with higher rate limits.
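The pricing table translates into fleet-level costs as follows. This sketch uses the list prices above with a hypothetical traffic profile (one million requests at 1,500 input / 400 output tokens each), and models the Batch API as a flat 50% discount on both input and output, which is a simplifying assumption.

```python
# ($ per 1M input tokens, $ per 1M output tokens), from the table above.
PRICES = {
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 3 Pro": (2.00, 10.00),
    "Claude Opus 4.5": (3.00, 15.00),
}

def fleet_cost(model, requests, in_tok, out_tok, batch=False):
    """Dollar cost for a fleet of identical requests at list prices."""
    p_in, p_out = PRICES[model]
    cost = requests * (in_tok * p_in + out_tok * p_out) / 1e6
    return cost * 0.5 if batch else cost  # Batch API modeled as flat 50% off

for m in PRICES:
    print(f"{m}: ${fleet_cost(m, 1_000_000, 1500, 400):,.0f}")
print(f"Gemini 3 Flash (Batch): "
      f"${fleet_cost('Gemini 3 Flash', 1_000_000, 1500, 400, batch=True):,.0f}")
```

Under this profile, a million requests cost $1,950 on Gemini 3 Flash ($975 via Batch) versus $10,500 on Claude Opus 4.5, which is the deployment-economics shift the article describes.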
Speed Metrics
| Model | Output Tokens/Second |
|---|---|
| Gemini 3 Flash | 218 |
| Gemini 2.5 Flash | ~280 |
| GPT-5.1 High | 125 |
| DeepSeek V3.2 Reasoning | 30 |
Gemini 3 Flash runs about 22% slower than 2.5 Flash but well ahead of competing frontier models, making it the fastest of the reasoning-capable systems in this comparison.
What's Next
Gemini 3 Flash is rolling out now across Google AI Studio, the Gemini CLI, Android Studio, and Vertex AI for enterprise deployments. The model remains in preview as Google gathers production feedback.
For model selection in December 2025:
- Long coding sessions and bug fixing: Claude Opus 4.5 leads at 80.9% on SWE-bench Verified
- Algorithm design and competitive programming: Gemini 3 Pro dominates with 2,439 LiveCodeBench Elo
- High-volume inference at low cost: Gemini 3 Flash offers the best quality-per-dollar
- Pure reasoning and math: GPT-5.2 achieves 100% on AIME 2025
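The selection guidance above amounts to a simple routing table. This is an illustrative sketch only: the task categories and model ID strings are hypothetical labels, not official API identifiers.

```python
# Map task type -> preferred model, per the December 2025 picks above.
# Model IDs are illustrative placeholders, not real API model names.
ROUTES = {
    "long_coding_session": "claude-opus-4.5",   # 80.9% SWE-bench Verified
    "competitive_programming": "gemini-3-pro",  # 2,439 LiveCodeBench Elo
    "high_volume_inference": "gemini-3-flash",  # best quality-per-dollar
    "math_reasoning": "gpt-5.2",                # 100% on AIME 2025
}

def pick_model(task: str) -> str:
    # Default to the cheapest frontier-class option when the task is unknown.
    return ROUTES.get(task, "gemini-3-flash")

print(pick_model("long_coding_session"))  # claude-opus-4.5
print(pick_model("unknown_task"))         # gemini-3-flash
```

In practice a router like this sits in front of a multi-provider gateway, with the cheap default absorbing the long tail of uncategorized traffic.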
The Artificial Analysis comparison shows Gemini 3 Flash with an Intelligence Index score of 71.3 versus Claude Sonnet 4.5's 62.8, combined with 3x faster response times and 4x better output speed.
Introl Angle
High-throughput AI inference workloads demand GPU infrastructure optimized for consistent low-latency performance. Introl's network of 550 field engineers deploys and maintains accelerator clusters across 257 global locations. Learn more about our coverage area.
Published: December 29, 2025