GPT-5.2: First Model Above 90% ARC-AGI Changes Inference Math

OpenAI's GPT-5.2 achieves 93.2% GPQA Diamond, 100% AIME, 70.9% GDPval. 400K context window drives new inference infrastructure requirements.

Blake Crosley

Jan 02, 2026 · 4 min read

January 1, 2026

January 2026 Update: OpenAI released GPT-5.2 on December 11, 2025, achieving benchmark scores that redefine what's possible in professional knowledge work. The model beats human experts on 70.9% of GDPval tasks at 11x the speed and <1% the cost.

TL;DR

GPT-5.2 crosses several capability thresholds: it is the first model above 90% on ARC-AGI-1, scores a perfect 100% on AIME 2025, and reaches 40.3% on FrontierMath (about 10 percentage points above GPT-5.1). The 400K context window and 128K output tokens create new infrastructure demands. For inference providers, the 1.4x price increase signals OpenAI's confidence and, plausibly, the compute intensity required to serve these capabilities.

What Happened

OpenAI launched GPT-5.2 on December 11, 2025, just 11 days after reportedly declaring "code red" in response to Google Gemini 3's benchmark dominance.¹

The release includes two variants:

Variant	Use Case	Pricing (per 1M tokens)
GPT-5.2	General use	$1.75 input / $14 output
GPT-5.2 Pro	Extended reasoning	Higher (xhigh reasoning tier)

Key specifications:²

Context window: 400,000 tokens
Max output: 128,000 tokens
Knowledge cutoff: August 31, 2025 (updated from Sep 2024)
Pricing: 1.4x GPT-5.1 cost

GPT-5.2 was built on Azure infrastructure using NVIDIA H100, H200, and GB200-NVL72 GPUs.³

Benchmark Performance

GPT-5.2 posts record or near-record scores across professional, scientific, and mathematical benchmarks:⁴

Benchmark	GPT-5.2 Score	Previous Best	Improvement
GPQA Diamond (PhD science)	93.2%	91.9% (Gemini 3)	+1.3 pts
ARC-AGI-1 Verified	>90%	~85%	First above 90%
AIME 2025 (math)	100%	96.7% (Gemini 3)	Perfect score
FrontierMath T1-3	40.3%	30% (GPT-5.1)	~+10 pts
GDPval (knowledge work)	70.9%	—	Beats experts
SWE-Bench Pro (coding)	55.6%	51% (GPT-5.1)	+4.6 pts
Tau2 Telecom (tool use)	98.7%	~95%	Near-perfect

The GDPval result deserves attention: GPT-5.2 Thinking produced outputs at >11x speed and <1% cost compared to human expert professionals across 44 occupations.⁵

Why It Matters

Inference Demand Spike

The 400K context window requires substantial memory per request. A single inference with full context consumes significantly more GPU memory than previous 128K models. Providers must plan for:⁶

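Memory scaling: roughly 3x the KV cache memory per full-context request vs 128K context (400K / 128K ≈ 3.1x)
Batch size reduction: Fewer concurrent requests fit on each GPU
KV cache growth: KV cache size scales linearly with context length × batch size, so long contexts and large batches compound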
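The scaling in the list above can be made concrete with a back-of-the-envelope estimate. The sketch below uses the standard KV cache formula for a decoder-only transformer. OpenAI has not disclosed GPT-5.2's architecture, so the layer count, KV head count, and head dimension here are purely hypothetical placeholders; only the 128K and 400K context lengths come from the article.

```python
def kv_cache_bytes(seq_len, batch_size, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2):
    """Approximate KV cache size for a standard decoder-only transformer."""
    # The leading 2 accounts for storing both keys and values at every layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


# Hypothetical model shape (NOT GPT-5.2's; its architecture is undisclosed).
shape = dict(n_layers=80, n_kv_heads=8, head_dim=128)

small = kv_cache_bytes(seq_len=128_000, batch_size=1, **shape)
large = kv_cache_bytes(seq_len=400_000, batch_size=1, **shape)

print(f"128K context: {small / 1e9:.1f} GB per request")
print(f"400K context: {large / 1e9:.1f} GB per request")
print(f"ratio: {large / small:.3f}x")
```

Whatever the real model shape is, the ratio depends only on the context lengths (400K / 128K = 3.125x), which is where the "3x+" figure comes from. The absolute gigabyte values are only as meaningful as the assumed shape.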

Cost Structure Shift

The 1.4x price increase from GPT-5.1 likely reflects higher compute intensity:⁷

Model	Input Cost	Output Cost	Ratio to 5.1
GPT-5.1	$1.25/M	$10/M	1.0x
GPT-5.2	$1.75/M	$14/M	1.4x

For high-volume inference operations, this represents a 40% increase in token spend for equivalent workloads, since input and output prices both rose by the same factor.
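As a quick check of the arithmetic, the following sketch prices the same workload under both models using the per-million-token rates from the table. The monthly token volumes are made-up illustration values, not figures from any real deployment.

```python
# USD per 1M tokens, taken from the pricing table in this section.
PRICES = {
    "gpt-5.1": {"input": 1.25, "output": 10.00},
    "gpt-5.2": {"input": 1.75, "output": 14.00},
}


def workload_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a workload at list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical monthly volume: 2B input tokens, 200M output tokens.
old = workload_cost("gpt-5.1", 2_000_000_000, 200_000_000)
new = workload_cost("gpt-5.2", 2_000_000_000, 200_000_000)

print(f"GPT-5.1: ${old:,.0f}")
print(f"GPT-5.2: ${new:,.0f}")
print(f"ratio: {new / old:.2f}x")
```

Because both rates rise by the same 1.4x factor, the ratio is 1.4x regardless of the input/output mix. This covers API token spend only; it says nothing about the hardware cost of self-hosted serving.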

Professional Work Automation

GPT-5.2's GDPval performance—beating experts on 70.9% of tasks at <1% cost—creates immediate demand for enterprise deployment. Organizations seeking these capabilities need inference infrastructure that can handle:⁸

Extended reasoning chains (Pro variant)
Long-context document processing
Reliable tool calling (98.7% Tau2)

Technical Details

Architecture

OpenAI hasn't disclosed specific architecture changes, but benchmark patterns suggest:⁹

Enhanced reasoning capabilities (FrontierMath +10%)
Improved long-context accuracy (256K token retrieval)
Better tool-use reliability (Tau2 98.7%)

Inference Requirements

Serving GPT-5.2 at scale requires consideration of:¹⁰

Factor	GPT-5.1	GPT-5.2	Implication
Context window	200K	400K	Up to 2x KV cache memory per request
Max output	64K	128K	Up to 2x generation time at maximum output
Reasoning depth	Standard	Extended (Pro)	Variable latency
Tool calling (Tau2)	~95%	98.7%	Enables more complex orchestration
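To see how the context and output rows translate into capacity planning, here is a rough sketch. The context and output limits come from the table; the KV cache budget, per-token KV footprint, and decode rate are hypothetical assumptions chosen only to make the arithmetic visible.

```python
def max_concurrent_requests(kv_budget_bytes, kv_bytes_per_token, context_tokens):
    """How many full-context requests fit in a fixed KV cache budget."""
    return kv_budget_bytes // (kv_bytes_per_token * context_tokens)


def worst_case_decode_seconds(max_output_tokens, tokens_per_second):
    """Time to stream a maximum-length response at a steady decode rate."""
    return max_output_tokens / tokens_per_second


# All three constants below are hypothetical assumptions, not measured values.
KV_BUDGET = 1_000_000_000_000   # 1 TB of memory reserved for KV cache
KV_PER_TOKEN = 327_680          # bytes of KV cache per token
DECODE_RATE = 100.0             # output tokens per second per request

# (name, context window, max output) from the table above.
for name, context, max_out in [("GPT-5.1", 200_000, 64_000),
                               ("GPT-5.2", 400_000, 128_000)]:
    slots = max_concurrent_requests(KV_BUDGET, KV_PER_TOKEN, context)
    secs = worst_case_decode_seconds(max_out, DECODE_RATE)
    print(f"{name}: {slots} full-context requests, {secs:.0f} s worst-case decode")
```

Under these assumptions, doubling the context window roughly halves the number of full-context requests that fit in memory, and doubling the output limit doubles the worst-case decode time. Real traffic rarely fills the whole window, so these are upper bounds rather than typical figures.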

Competitive Context

GPT-5.2 reclaims some benchmarks from Gemini 3 but not all:¹¹

Benchmark	Leader	Score
GPQA Diamond	Gemini 3 Deep Think	93.8%
AIME 2025	GPT-5.2 Thinking	100%
SWE-bench Verified	Gemini 3 Pro	76.2%
Humanity's Last Exam	Gemini 3	Leading
GDPval	GPT-5.2 Thinking	70.9%

The rapid release cadence, with GPT-5.2 arriving just 11 days after OpenAI's reported "code red" over Gemini 3, underscores the competitive pressure and the inference infrastructure demands both companies face.

What's Next

Near-Term (Q1 2026)

GPT-5.2 Mini likely coming (no Mini variant at launch)
Enterprise API rollout expanding
Third-party inference providers adding support

Infrastructure Implications

Organizations planning GPT-5.2 deployments should:¹²

Assess memory capacity: 400K context requires 3x+ memory vs 128K models
Plan for KV cache: CXL memory expansion increasingly relevant
Budget for compute: token prices are 1.4x GPT-5.1 for the same workload
Consider hybrid approaches: Route simpler tasks to cheaper models
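The hybrid approach in the last item could look something like the sketch below. It is a minimal illustration rather than a recommended policy: the `Request` fields, the 100K-token threshold, and the `"cheaper-model"` name are all hypothetical placeholders, and a production router would likely use richer signals.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int
    needs_tools: bool = False
    needs_deep_reasoning: bool = False


def choose_model(req, long_context_threshold=100_000):
    """Send a request to the frontier model only when it needs it."""
    if (req.needs_deep_reasoning
            or req.needs_tools
            or req.prompt_tokens > long_context_threshold):
        return "gpt-5.2"
    # Placeholder name for whatever cheaper model is available.
    return "cheaper-model"


print(choose_model(Request(prompt_tokens=800)))
print(choose_model(Request(prompt_tokens=250_000)))
print(choose_model(Request(prompt_tokens=800, needs_tools=True)))
```

The first request goes to the cheaper model; the other two go to GPT-5.2 because of context length and tool use respectively. The threshold is the main tuning knob and should be set from observed quality and cost, not from this example.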

For inference infrastructure deployment supporting frontier models, contact Introl.

References

1. FlowHunt. "GPT 5.2 Launch and the AI Model Revolution." December 2025. https://www.flowhunt.io/blog/gpt-5-2-launch-ai-breakthroughs/
2. LLM Stats. "GPT-5.2: Pricing, Context Window, Benchmarks." December 2025. https://llm-stats.com/models/gpt-5.2-2025-12-11
3. OpenAI. "Introducing GPT-5.2." December 11, 2025. https://openai.com/index/introducing-gpt-5-2/
4. DataCamp. "GPT 5.2: Benchmarks, Model Breakdown, and Real-World Performance." December 2025. https://www.datacamp.com/blog/gpt-5-2
5. Vellum. "GPT-5.2 Benchmarks (Explained)." December 2025. https://www.vellum.ai/blog/gpt-5-2-benchmarks
6. Galaxy.ai. "GPT 5.2 Model Specs, Costs & Benchmarks." December 2025. https://blog.galaxy.ai/model/gpt-5-2
7. Simon Willison. "GPT-5.2." December 11, 2025. https://simonwillison.net/2025/Dec/11/gpt-52/
8. OpenAI. "GPT-5.2 System Card." December 2025. https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944f8d/oai_5_2_system-card.pdf
9. OpenAI. "Introducing GPT-5.2-Codex." December 2025. https://openai.com/index/introducing-gpt-5-2-codex/
10. IntuitionLabs. "Latest AI Research (Dec 2025): GPT-5, Agents & Trends." December 2025. https://intuitionlabs.ai/articles/latest-ai-research-trends-2025
11. LM Council. "AI Model Benchmarks Dec 2025." December 2025. https://lmcouncil.ai/benchmarks
12. Vertu. "AI Model Releases Nov/Dec 2025: Benchmarks & Comparison." December 2025. https://vertu.com/lifestyle/the-ai-model-race-reaches-singularity-speed/