Gemini 3 Flash: Google's Speed Champion Matches GPT-5.2 at 6x Lower Cost

Google's Gemini 3 Flash delivers 90.4% GPQA Diamond and 78% SWE-bench at $0.50/M tokens. What the fastest frontier model means for AI infrastructure.

TL;DR

Google launched Gemini 3 Flash on December 17, 2025, delivering frontier-class performance at Flash-level speed and cost. The model achieves 90.4% on GPQA Diamond and 78% on SWE-bench Verified while costing just $0.50 per million input tokens, roughly 6x cheaper than Claude Opus 4.5. For inference-heavy deployments, Gemini 3 Flash processes 218 tokens per second, outperforming GPT-5.1 (125 t/s) and DeepSeek V3.2 reasoning mode (30 t/s).


What Happened

Google released Gemini 3 Flash on December 17, 2025, one month after Gemini 3 Pro topped the LMArena leaderboard. The model combines Pro-grade reasoning with Flash-level latency and efficiency, targeting high-volume production workloads where cost and speed matter as much as capability.

Gemini 3 Flash immediately became the default model in the Gemini app and AI Mode in Google Search, signaling Google's confidence in deploying frontier intelligence at consumer scale.

The model outperforms Gemini 2.5 Pro across benchmarks while running 3x faster, according to Artificial Analysis testing. On several benchmarks it trades blows with GPT-5.2, the model OpenAI rushed out to counter Gemini 3 Pro.

Companies including JetBrains, Figma, Cursor, Harvey, and Latitude already use Gemini 3 Flash in production.


Why It Matters

The inference cost equation for AI applications just shifted. Gemini 3 Flash offers frontier-class reasoning at commodity pricing, creating new deployment economics for data center operators and application developers.

Cost Advantage: At $0.50 per million input tokens, Gemini 3 Flash costs 6x less than Claude Opus 4.5 ($3.00) while achieving comparable performance on most benchmarks. Context caching enables 90% cost reductions for workloads with repeated token use.
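The caching math above can be sketched as a quick back-of-envelope cost model. Prices are the ones quoted in this article; treating the 90% reduction as a flat discount applied only to the cached portion of the prompt is a simplifying assumption.

```python
# Back-of-envelope input-cost model for context caching.
# Prices per million input tokens are from this article; the 90%
# cached-token discount is assumed to apply to cached tokens only.
FLASH_INPUT = 0.50      # $ per 1M input tokens, Gemini 3 Flash
OPUS_INPUT = 3.00       # $ per 1M input tokens, Claude Opus 4.5
CACHE_DISCOUNT = 0.90   # 90% reduction on cached tokens

def input_cost(total_tokens, cached_tokens, price_per_m, discount=0.0):
    """Dollar cost of one request's input tokens."""
    fresh = total_tokens - cached_tokens
    return (fresh + cached_tokens * (1 - discount)) * price_per_m / 1e6

# Hypothetical 100k-token prompt with a 90k-token cached prefix:
flash = input_cost(100_000, 90_000, FLASH_INPUT, CACHE_DISCOUNT)
opus = input_cost(100_000, 0, OPUS_INPUT)
print(f"Flash with caching: ${flash:.4f}, Opus uncached: ${opus:.4f}")
```

Under these assumptions the cached Flash request costs fractions of a cent, while the same prompt at Opus 4.5 input pricing runs about 30 cents.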

Inference Speed: Artificial Analysis benchmarking recorded 218 output tokens per second, outpacing GPT-5.1 (125 t/s) by 74% and DeepSeek V3.2 reasoning mode (30 t/s) by 7x. Sub-second latency for short prompts enables responsive chat interfaces and rapid agentic loop iterations.

Agentic Workflows: The model achieved 78% on SWE-bench Verified, outperforming both the 2.5 series and Gemini 3 Pro for agentic coding tasks. For enterprises building AI agents, comparable capability at lower cost directly impacts deployment ROI.

Multimodal Processing: Resemble AI reported 4x faster multimodal analysis compared to 2.5 Pro, processing raw technical outputs without workflow bottlenecks.


Technical Details

Specifications

| Specification | Gemini 3 Flash |
| --- | --- |
| Input Modalities | Text, image, video, audio, PDF |
| Output Modalities | Text |
| Max Input Tokens | 1,048,576 (1M) |
| Max Output Tokens | 65,536 |
| Knowledge Cutoff | January 2025 |
| Release Date | December 17, 2025 |
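As a practical guardrail around the token limits in the table, a rough pre-flight length check might look like the sketch below. The 4-characters-per-token ratio is a crude heuristic for English text, not the model's actual tokenizer; a real deployment should count tokens with the provider's SDK.

```python
# Rough pre-flight check against the context limits listed above.
# The ~4 chars/token ratio is a crude heuristic for English text,
# not the model's real tokenizer.
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536

def rough_token_count(text: str) -> int:
    """Approximate token count assuming ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_input_window(prompt: str) -> bool:
    """True if the prompt plausibly fits in the 1M-token input window."""
    return rough_token_count(prompt) <= MAX_INPUT_TOKENS

print(fits_input_window("Summarize this incident report."))
```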

Benchmark Performance

| Benchmark | Gemini 3 Flash | Gemini 3 Pro | GPT-5.2 | Claude Opus 4.5 |
| --- | --- | --- | --- | --- |
| GPQA Diamond | 90.4% | 91.9% | 88.4% | 88.0% |
| SWE-bench Verified | 78% | 76.2% | | 80.9% |
| MMMU-Pro | 81.2% | 79.5% | | |
| Humanity's Last Exam | 33.7% | | | |
| LMArena Elo | 1501 | | | |

Gemini 3 Flash surpasses 2.5 Flash across the board and significantly outperforms 2.5 Pro on several benchmarks, while matching or beating 3 Pro in areas including MMMU-Pro, Toolathlon, and MPC Atlas.

Pricing Comparison

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| Gemini 3 Flash | $0.50 | $3.00 |
| Gemini 2.5 Flash | $0.30 | $2.50 |
| Gemini 3 Pro | ~$2.00 | ~$10.00 |
| Claude Opus 4.5 | $3.00 | $15.00 |
| GPT-5.2 | ~$2.50 | ~$10.00 |

Gemini 3 Flash costs roughly a quarter of Gemini 3 Pro's price while delivering comparable reasoning capability. The Batch API offers an additional 50% discount for asynchronous processing with higher rate limits.
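Concretely, the table translates into per-request costs along the lines of the sketch below. The request shape (10k input / 1k output tokens) is a hypothetical example, and applying the 50% batch discount to both input and output tokens is an assumption, not something this article confirms.

```python
# Blended per-request cost from the pricing table above.
# Assumption: the 50% Batch API discount applies to both input
# and output tokens.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "gemini-3-flash": (0.50, 3.00),
    "claude-opus-4.5": (3.00, 15.00),
}

def request_cost(model, in_tokens, out_tokens, batch=False):
    """Dollar cost of one request at the listed per-token prices."""
    in_p, out_p = PRICES[model]
    cost = (in_tokens * in_p + out_tokens * out_p) / 1e6
    return cost * 0.5 if batch else cost

# Hypothetical 10k-input / 1k-output request:
print(request_cost("gemini-3-flash", 10_000, 1_000))              # online
print(request_cost("gemini-3-flash", 10_000, 1_000, batch=True))  # batched
print(request_cost("claude-opus-4.5", 10_000, 1_000))
```

For this request shape, Flash comes out under a cent per call even before batching, versus several cents at Opus 4.5 prices.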

Speed Metrics

| Model | Output Tokens/Second |
| --- | --- |
| Gemini 3 Flash | 218 |
| Gemini 2.5 Flash | ~280 |
| GPT-5.1 High | 125 |
| DeepSeek V3.2 Reasoning | 30 |

Gemini 3 Flash runs 22% slower than 2.5 Flash but significantly faster than competing frontier models, making it the speed leader among reasoning-capable systems.
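The throughput numbers map directly onto wall-clock generation time. The sketch below ignores time-to-first-token and network overhead, which is an oversimplification for short responses but a reasonable first-order model for long generations.

```python
# Wall-clock time to stream a fixed-length response at the measured
# output rates (ignores time-to-first-token and network overhead).
RATES_TPS = {  # output tokens per second, from the table above
    "gemini-3-flash": 218,
    "gemini-2.5-flash": 280,
    "gpt-5.1-high": 125,
    "deepseek-v3.2-reasoning": 30,
}

def generation_seconds(model, out_tokens):
    """Seconds to emit out_tokens at the model's measured rate."""
    return out_tokens / RATES_TPS[model]

for model in RATES_TPS:
    secs = generation_seconds(model, 2_000)
    print(f"{model}: {secs:.1f}s for a 2k-token response")
```

At these rates a 2,000-token response streams in about 9 seconds on Gemini 3 Flash versus over a minute on DeepSeek V3.2's reasoning mode, which is the difference between a usable agentic loop and an unusable one.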


What's Next

Gemini 3 Flash is rolling out now across Google AI Studio, the Gemini CLI, Android Studio, and Vertex AI for enterprise deployments. The model remains in preview status as Google gathers production feedback.

For model selection in December 2025:

- Long coding sessions and bug fixing: Claude Opus 4.5 leads at 80.9% SWE-bench
- Algorithm design and competitive programming: Gemini 3 Pro dominates with 2,439 LiveCodeBench Elo
- High-volume inference at low cost: Gemini 3 Flash offers the best quality-per-dollar
- Pure reasoning and math: GPT-5.2 achieves 100% on AIME 2025

The Artificial Analysis comparison shows Gemini 3 Flash with an Intelligence Index score of 71.3 versus Claude Sonnet 4.5's 62.8, combined with 3x faster response times and 4x better output speed.


Introl Angle

High-throughput AI inference workloads demand GPU infrastructure optimized for consistent low-latency performance. Introl's network of 550 field engineers deploys and maintains accelerator clusters across 257 global locations. Learn more about our coverage area.


Published: December 29, 2025
