GPT-5.2: First Model Above 90% ARC-AGI Changes Inference Math

OpenAI's GPT-5.2 scores 93.2% on GPQA Diamond, 100% on AIME 2025, and 70.9% on GDPval. Its 400K context window drives new inference infrastructure requirements.


January 1, 2026

January 2026 Update: OpenAI released GPT-5.2 on December 11, 2025, achieving benchmark scores that redefine what's possible in professional knowledge work. The model beats human experts on 70.9% of GDPval tasks at more than 11x the speed and less than 1% of the cost.


TL;DR

GPT-5.2 crosses critical capability thresholds: it is the first model above 90% on ARC-AGI-1, scores a perfect 100% on AIME 2025, and reaches 40.3% on FrontierMath (about 10 percentage points above GPT-5.1). The 400K context window and 128K output tokens create new infrastructure demands. For inference providers, the 1.4x price increase signals both OpenAI's confidence and the compute intensity required to serve these capabilities.


What Happened

OpenAI launched GPT-5.2 on December 11, 2025, just 11 days after reportedly declaring "code red" in response to Google Gemini 3's benchmark dominance [1].

The release includes two variants:

| Variant | Use case | Pricing (per 1M tokens) |
| --- | --- | --- |
| GPT-5.2 | General use | $1.75 input / $14 output |
| GPT-5.2 Pro | Extended reasoning | Higher (xhigh reasoning tier) |

Key specifications [2]:

  • Context window: 400,000 tokens
  • Max output: 128,000 tokens
  • Knowledge cutoff: August 31, 2025 (updated from Sep 2024)
  • Pricing: 1.4x GPT-5.1 cost

GPT-5.2 was built on Azure infrastructure using NVIDIA H100, H200, and GB200-NVL72 GPUs [3].


Benchmark Performance

GPT-5.2 sets new highs on several professional, scientific, and mathematical benchmarks [4]. Improvements are in percentage points:

| Benchmark | GPT-5.2 score | Previous best | Improvement |
| --- | --- | --- | --- |
| GPQA Diamond (PhD science) | 93.2% | 91.9% (Gemini 3) | +1.3 pts |
| ARC-AGI-1 Verified | >90% | ~85% | First above 90% |
| AIME 2025 (math) | 100% | 96.7% (Gemini 3) | Perfect score |
| FrontierMath T1-3 | 40.3% | 30% (GPT-5.1) | About +10 pts |
| GDPval (knowledge work) | 70.9% | | Beats experts |
| SWE-Bench Pro (coding) | 55.6% | 51% (GPT-5.1) | +4.6 pts |
| Tau2 Telecom (tool use) | 98.7% | ~95% | Near-perfect |

The GDPval result deserves attention: GPT-5.2 Thinking produced outputs at more than 11x the speed and less than 1% of the cost of human expert professionals across 44 occupations [5].


Why It Matters

Inference Demand Spike

The 400K context window requires substantial memory per request. A single request that fills the context needs roughly 3x the KV-cache memory of the same request on a 128K-context model (400K / 128K ≈ 3.1). Providers must plan for [6]:

  • Memory scaling: 3x+ KV-cache memory per request vs 128K context
  • Batch size reduction: Fewer concurrent requests fit on each GPU
  • KV cache growth: Total KV cache scales with context length × batch size
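
To make the memory scaling concrete, here is a minimal sizing sketch. OpenAI has not disclosed GPT-5.2's architecture, so the layer count, KV-head count, head dimension, FP16 cache, and the 500 GiB memory pool below are all hypothetical placeholders; only the 128K and 400K context lengths come from the article.

```python
# KV-cache sizing sketch. The model shape is HYPOTHETICAL (not GPT-5.2's
# real architecture, which is undisclosed); swap in your own model's values.

GIB = 1024 ** 3

# Assumed dense transformer with grouped-query attention and an FP16 cache.
ASSUMED_SHAPE = dict(n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_value=2)


def kv_cache_bytes(context_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value):
    """KV-cache bytes for one sequence: keys plus values, every layer, every token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


def max_concurrent_requests(free_bytes, context_tokens):
    """How many full-context sequences fit in the memory left after weights."""
    return free_bytes // kv_cache_bytes(context_tokens, **ASSUMED_SHAPE)


kv_128k = kv_cache_bytes(128_000, **ASSUMED_SHAPE)
kv_400k = kv_cache_bytes(400_000, **ASSUMED_SHAPE)
pool = 500 * GIB  # hypothetical KV-cache budget across a serving node

print(f"128K context: {kv_128k / GIB:.1f} GiB per request")
print(f"400K context: {kv_400k / GIB:.1f} GiB per request")
print(f"memory ratio: {kv_400k / kv_128k:.3f}x")
print(f"concurrent full-context requests: "
      f"{max_concurrent_requests(pool, 128_000)} vs {max_concurrent_requests(pool, 400_000)}")
```

Whatever shape is plugged in, the per-request ratio stays at 400K / 128K because the cache grows linearly with context length; the absolute figures and the batch-size drop depend entirely on the assumed shape and memory budget.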

Cost Structure Shift

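The 1.4x price increase from GPT-5.1 reflects real compute intensity [7]:

| Model | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Ratio to GPT-5.1 |
| --- | --- | --- | --- |
| GPT-5.1 | $1.25 | $10 | 1.0x |
| GPT-5.2 | $1.75 | $14 | 1.4x |

Because input and output prices both rise by the same factor, high-volume operations see a 40% increase in token spend for equivalent workloads, whatever their input/output mix.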
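
The arithmetic can be checked with a short sketch. The per-million-token prices are the ones in the table above; the monthly token volumes are a made-up workload used only for illustration.

```python
# Token-spend comparison using the article's list prices (USD per 1M tokens).
PRICES = {
    "gpt-5.1": {"input": 1.25, "output": 10.00},
    "gpt-5.2": {"input": 1.75, "output": 14.00},
}


def workload_cost(model, input_tokens, output_tokens):
    """Dollar cost of a workload at the given model's per-token prices."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Hypothetical monthly volume: 2B input tokens and 200M output tokens.
monthly_input, monthly_output = 2_000_000_000, 200_000_000

cost_51 = workload_cost("gpt-5.1", monthly_input, monthly_output)
cost_52 = workload_cost("gpt-5.2", monthly_input, monthly_output)

print(f"GPT-5.1: ${cost_51:,.2f}")
print(f"GPT-5.2: ${cost_52:,.2f}")
print(f"ratio:   {cost_52 / cost_51:.2f}x")
```

This covers API token spend only. Self-hosted serving costs (hardware, power, operations) are not modeled here.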

Professional Work Automation

GPT-5.2's GDPval performance (beating experts on 70.9% of tasks at less than 1% of the cost) creates immediate demand for enterprise deployment. Organizations seeking these capabilities need inference infrastructure that can handle [8]:

  • Extended reasoning chains (Pro variant)
  • Long-context document processing
  • Reliable tool calling (98.7% Tau2)

Technical Details

Architecture

OpenAI hasn't disclosed specific architecture changes, but benchmark patterns suggest [9]:

  • Enhanced reasoning capabilities (FrontierMath up about 10 points)
  • Improved long-context accuracy (256K token retrieval)
  • Better tool-use reliability (Tau2 98.7%)

Inference Requirements

Serving GPT-5.2 at scale requires consideration of [10]:

| Factor | GPT-5.1 | GPT-5.2 | Implication |
| --- | --- | --- | --- |
| Context window | 200K | 400K | 2x memory per request at full context |
| Max output | 64K | 128K | Up to 2x generation time at maximum output length |
| Reasoning depth | Standard | Extended (Pro) | Variable latency |
| Tool calling (Tau2) | 95% | 98.7% | More complex orchestration |
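
The generation-time implication follows from simple division, sketched below. The decode throughput of 50 tokens per second is an assumed placeholder, not a measured GPT-5.2 figure, and the sketch treats throughput as constant even though per-token latency usually rises as the context grows.

```python
# Worst-case decode time at maximum output length.
# ASSUMPTION: a constant decode rate of 50 tokens/s (hypothetical, not measured).
ASSUMED_TOKENS_PER_SECOND = 50.0


def decode_seconds(output_tokens, tokens_per_second=ASSUMED_TOKENS_PER_SECOND):
    """Time to stream output_tokens at a fixed decode rate."""
    return output_tokens / tokens_per_second


t_64k = decode_seconds(64_000)    # max output listed for GPT-5.1 in the table
t_128k = decode_seconds(128_000)  # max output listed for GPT-5.2 in the table

print(f"64K output:  {t_64k / 60:.1f} min")
print(f"128K output: {t_128k / 60:.1f} min")
print(f"ratio:       {t_128k / t_64k:.1f}x")
```

The 2x figure is therefore an upper bound that applies only when a response actually uses the full output budget; request timeouts and streaming limits should be sized with that case in mind.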

Competitive Context

GPT-5.2 reclaims some benchmarks from Gemini 3 but not all [11]:

| Benchmark | Leader | Score |
| --- | --- | --- |
| GPQA Diamond | Gemini 3 Deep Think | 93.8% |
| AIME 2025 | GPT-5.2 Thinking | 100% |
| SWE-bench Verified | Gemini 3 Pro | 76.2% |
| Humanity's Last Exam | Gemini 3 | Leading (score not stated) |
| GDPval | GPT-5.2 Thinking | 70.9% |

The rapid release cadence, with GPT-5.2 arriving just 11 days after OpenAI's reported "code red" over Gemini 3, demonstrates the inference infrastructure pressure both companies face.


What's Next

Near-Term (Q1 2026)

  • GPT-5.2 Mini likely coming (no Mini variant at launch)
  • Enterprise API rollout expanding
  • Third-party inference providers adding support

Infrastructure Implications

Organizations planning GPT-5.2 deployments should [12]:

  1. Assess memory capacity: a full 400K context needs 3x+ the KV-cache memory of a 128K-context model
  2. Plan for KV cache: CXL memory expansion is increasingly relevant
  3. Budget for compute: per-token prices are 1.4x those of GPT-5.1
  4. Consider hybrid approaches: route simpler tasks to cheaper models
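
The hybrid approach in the last item can be as simple as a rule-based router, sketched below. The `Task` type, the 100,000-token threshold, and the four-characters-per-token estimate are illustrative assumptions; the model names are plain labels taken from the article, and no API call is made.

```python
from dataclasses import dataclass

# Labels only; pricing in the article makes GPT-5.1 the cheaper option.
CHEAP_MODEL = "gpt-5.1"
FRONTIER_MODEL = "gpt-5.2"

# HYPOTHETICAL cutoff above which a prompt is treated as long-context work.
LONG_CONTEXT_THRESHOLD = 100_000


@dataclass
class Task:
    prompt: str
    needs_deep_reasoning: bool = False


def estimate_tokens(text):
    """Rough heuristic of about 4 characters per token; use a real tokenizer in practice."""
    return max(1, len(text) // 4)


def choose_model(task):
    """Send long-context or reasoning-heavy tasks to the frontier model, the rest to the cheap one."""
    if task.needs_deep_reasoning:
        return FRONTIER_MODEL
    if estimate_tokens(task.prompt) > LONG_CONTEXT_THRESHOLD:
        return FRONTIER_MODEL
    return CHEAP_MODEL


print(choose_model(Task("Summarize this email in two sentences.")))
print(choose_model(Task("Prove the lemma step by step.", needs_deep_reasoning=True)))
print(choose_model(Task("x" * 500_000)))
```

A production router would likely add quality feedback and fallbacks, but even a static rule like this keeps the 1.4x premium confined to the requests that need it.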

For inference infrastructure deployment supporting frontier models, contact Introl.


References


  1. FlowHunt. "GPT 5.2 Launch and the AI Model Revolution." December 2025. https://www.flowhunt.io/blog/gpt-5-2-launch-ai-breakthroughs/ 

  2. LLM Stats. "GPT-5.2: Pricing, Context Window, Benchmarks." December 2025. https://llm-stats.com/models/gpt-5.2-2025-12-11 

  3. OpenAI. "Introducing GPT-5.2." December 11, 2025. https://openai.com/index/introducing-gpt-5-2/ 

  4. DataCamp. "GPT 5.2: Benchmarks, Model Breakdown, and Real-World Performance." December 2025. https://www.datacamp.com/blog/gpt-5-2 

  5. Vellum. "GPT-5.2 Benchmarks (Explained)." December 2025. https://www.vellum.ai/blog/gpt-5-2-benchmarks 

  6. Galaxy.ai. "GPT 5.2 Model Specs, Costs & Benchmarks." December 2025. https://blog.galaxy.ai/model/gpt-5-2 

  7. Simon Willison. "GPT-5.2." December 11, 2025. https://simonwillison.net/2025/Dec/11/gpt-52/ 

  8. OpenAI. "GPT-5.2 System Card." December 2025. https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944f8d/oai_5_2_system-card.pdf 

  9. OpenAI. "Introducing GPT-5.2-Codex." December 2025. https://openai.com/index/introducing-gpt-5-2-codex/ 

  10. IntuitionLabs. "Latest AI Research (Dec 2025): GPT-5, Agents & Trends." December 2025. https://intuitionlabs.ai/articles/latest-ai-research-trends-2025 

  11. LM Council. "AI Model Benchmarks Dec 2025." December 2025. https://lmcouncil.ai/benchmarks 

  12. Vertu. "AI Model Releases Nov/Dec 2025: Benchmarks & Comparison." December 2025. https://vertu.com/lifestyle/the-ai-model-race-reaches-singularity-speed/ 
