World Models Race 2026: How LeCun, DeepMind, and World Labs Are Redefining the Path to AGI

Yann LeCun raises €500M for AMI Labs while DeepMind's Genie 3 simulates real-time 3D worlds. The 2026 race to build AI that understands physics may eclipse LLMs.

A three-billion-euro pre-launch valuation for a startup that has not released a single product.1 Yann LeCun's AMI Labs represents the largest bet yet on a thesis that has divided AI researchers for years: large language models will never achieve general intelligence, and the path forward runs through world models instead.

TL;DR

The world models paradigm exploded into mainstream AI development in late 2025 and early 2026. Yann LeCun left Meta after 12 years to launch AMI Labs, raising €500M at a €3B valuation to build AI systems that understand physics rather than just predicting text.2 Google DeepMind released Genie 3, the first real-time interactive world model capable of generating persistent 3D environments at 24 fps.3 Fei-Fei Li's World Labs launched Marble, making world model generation commercially available with pricing from free to $95/month.4 NVIDIA's Cosmos platform has seen 2 million downloads as robotics and autonomous vehicle developers embrace synthetic physics-aware training data.5 For organizations building AI infrastructure, world models signal a computational shift from text processing toward video generation, physics simulation, and embodied reasoning.

The LLM Ceiling

Large language models achieved remarkable capabilities through scale. GPT-4, Claude, and Gemini demonstrate sophisticated reasoning, code generation, and multi-step problem solving.6 Yet a fundamental limitation persists: these models learn statistical patterns from text, not understanding of physical reality.7

Research published in 2024 proved mathematically that LLMs cannot learn all computable functions and will therefore inevitably hallucinate when used as general problem solvers.8 The root cause lies in how LLMs operate: predicting which tokens follow previous tokens based on patterns learned from training data, without any grounding in physical reality.9

The Hallucination Problem

LLMs generate plausible-sounding text that may describe physically impossible scenarios, historically inaccurate events, or logically inconsistent reasoning.10 Unlike humans who learn about gravity through embodied experience, LLMs only learn that the word "gravity" tends to appear near certain other words.11

| Limitation | Cause | Consequence |
| --- | --- | --- |
| Factual hallucination | No verified knowledge base12 | Confident fabrication of facts |
| Physical reasoning failure | No embodied experience13 | Describes impossible physics |
| Causal confusion | Pattern matching, not understanding14 | Correlation treated as causation |
| Temporal incoherence | Sequential token prediction15 | Events in impossible order |

Yann LeCun has argued publicly for years that scaling LLMs will not produce general intelligence.16 "LLMs are too limiting," LeCun stated in his NVIDIA GTC presentation. "Scaling them up will not allow us to reach AGI."17

The alternative he proposes: world models that learn representations of physical reality, enabling prediction, planning, and reasoning about cause and effect.18

Yann LeCun's AMI Labs

LeCun departed Meta in December 2025 after 12 years, five as founding director of Facebook AI Research (FAIR) and seven as chief AI scientist.19 His new venture, Advanced Machine Intelligence (AMI) Labs, represents the most ambitious attempt yet to commercialize world model research.20

Funding and Structure

AMI Labs entered funding discussions seeking €500 million at a €3 billion valuation before launching any product.21 The target would represent one of the largest pre-launch raises in AI history, reflecting investor confidence in LeCun's vision and track record.22

| Role | Person | Background |
| --- | --- | --- |
| Executive Chairman | Yann LeCun | Turing Award winner, Meta FAIR founder23 |
| CEO | Alex LeBrun | Former CEO of Nabla (medical AI)24 |

The company plans to establish headquarters in Paris by January 2026.25 While Meta will not invest directly in AMI Labs, the two companies plan a partnership that allows LeCun to maintain research ties with his former lab.26

Technical Vision

AMI Labs aims to create AI systems that understand physics, maintain persistent memory, and plan complex actions rather than simply predicting text sequences.27 LeCun describes a world model as "your mental model of how the world behaves."28

"You can imagine a sequence of actions you might take, and your world model will allow you to predict what the effect of the sequence of actions will be on the world," LeCun explained.29

The approach differs fundamentally from LLMs. Where GPT-style models predict the next word, world models predict the next state of a physical environment given actions taken within it.30 This enables:

  • Planning: Simulating outcomes before taking action
  • Reasoning about physics: Understanding that objects have mass, momentum, and spatial relationships
  • Cause-effect understanding: Learning that actions produce predictable consequences
  • Persistent memory: Maintaining consistent world state across time
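The planning capability in the list above can be sketched in a few lines. The toy below is an illustration only: `predict` stands in for a learned world model (here, hard-coded one-dimensional dynamics), and the planner exhaustively simulates short action sequences before committing to one — the "imagine outcomes before acting" loop LeCun describes.

```python
import itertools

# Toy 1-D world: state is (position, velocity); actions are accelerations.
# A learned world model would replace `predict` with a neural network; here
# we hard-code trivial dynamics purely to illustrate the planning loop.
def predict(state, action):
    pos, vel = state
    vel = vel + action          # action changes velocity
    pos = pos + vel             # velocity changes position
    return (pos, vel)

def plan(state, goal, horizon=3, actions=(-1, 0, 1)):
    """Search action sequences by simulating outcomes before acting."""
    best_seq, best_err = None, float("inf")
    for seq in itertools.product(actions, repeat=horizon):
        s = state
        for a in seq:           # roll the model forward; no real-world steps
            s = predict(s, a)
        err = abs(s[0] - goal)  # distance to goal after the imagined rollout
        if err < best_err:
            best_seq, best_err = seq, err
    return best_seq

print(plan(state=(0, 0), goal=5))   # → (1, 1, 0)
```

The key property is that every candidate sequence is evaluated inside the model; only the winning sequence would ever be executed in the real environment.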

I-JEPA Foundation

AMI Labs builds on LeCun's I-JEPA (Image Joint Embedding Predictive Architecture) research at Meta.31 I-JEPA learns by predicting representations of image regions from other regions, developing abstract understanding of visual scenes without needing explicit labels.32

The approach parallels how humans develop intuitive physics through observation. A child watching objects fall develops an internal model of gravity without anyone explaining Newton's laws.33 I-JEPA and successor architectures aim to replicate this learning process in artificial systems.34
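A heavily simplified sketch of the JEPA idea: predict the representation of one part of the input from another part, with the loss computed in embedding space rather than pixel space. Everything here is a toy stand-in (a fixed averaging "encoder", a single-weight predictor); real I-JEPA uses Vision Transformer encoders and an exponential-moving-average target encoder.

```python
import random

# Toy JEPA-style loop: predict the *representation* of a hidden patch from
# the representation of a visible patch, never reconstructing pixels.
def encode(patch):
    return sum(patch) / len(patch)   # stand-in for a learned embedding

random.seed(0)
w = 0.0                              # single predictor parameter
for step in range(200):
    base = random.uniform(0, 1)      # latent factor shared by both patches
    context = [base + random.gauss(0, 0.01) for _ in range(4)]
    target  = [2 * base + random.gauss(0, 0.01) for _ in range(4)]
    z_ctx, z_tgt = encode(context), encode(target)
    pred = w * z_ctx                 # predict target embedding from context
    grad = 2 * (pred - z_tgt) * z_ctx
    w -= 0.1 * grad                  # gradient step on embedding-space loss
print(round(w, 2))                   # w converges near 2.0, the latent relation
```

The predictor never sees pixels of the target patch; it learns the abstract relation between regions, which is the property the article attributes to intuitive-physics learning.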

DeepMind's Genie 3

Google DeepMind released Genie 3 in August 2025, representing the first real-time interactive general-purpose world model.35 Unlike previous systems that generated static environments or required significant processing time, Genie 3 produces navigable 3D worlds at 24 frames per second.36

Technical Capabilities

Genie 3 generates dynamic environments from text prompts, maintaining visual consistency for several minutes of real-time interaction.37 The system does not rely on hard-coded physics engines; instead, the model teaches itself how the world works through training.38

| Capability | Specification |
| --- | --- |
| Frame rate | 24 fps real-time39 |
| Resolution | 720p40 |
| Consistency duration | Several minutes41 |
| Memory horizon | Up to 1 minute lookback42 |
| Physics | Self-learned, not hard-coded43 |

"Genie 3 is the first real-time interactive general-purpose world model," stated Shlomi Fruchter, research director at DeepMind. "It goes beyond narrow world models that existed before. It's not specific to any particular environment."44

Auto-Regressive Architecture

The model generates one frame at a time, looking back at previously generated content to determine what happens next.45 Achieving real-time performance requires computing this auto-regressive process multiple times per second while maintaining consistency with potentially minute-old visual memory.46

Physical consistency emerges from training rather than explicit programming.47 Genie 3 environments maintain stable physics because the model learned physical regularities from training data, not because researchers manually encoded gravity or collision detection.48
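The auto-regressive loop with a bounded lookback window can be sketched as follows. `generate_frame` is a trivial stand-in for the neural network, and the deque plays the role of the roughly one-minute visual memory (about 1,440 frames at 24 fps).

```python
from collections import deque

# Sketch of the auto-regressive loop described above: each new frame is
# conditioned on a bounded window of previously generated frames.
FPS, MEMORY_SECONDS = 24, 60
memory = deque(maxlen=FPS * MEMORY_SECONDS)    # sliding visual context

def generate_frame(context, action):
    """Stand-in for the model: real systems run a neural net here."""
    last = context[-1] if context else 0
    return last + action                       # trivial 'dynamics'

def step(action):
    frame = generate_frame(memory, action)
    memory.append(frame)                       # frame becomes future context
    return frame

for _ in range(3):
    step(action=1)
print(len(memory), memory[-1])                 # → 3 3
```

The deque's `maxlen` captures the engineering constraint in the text: consistency only has to be maintained against a finite window, and frames older than the window fall out of the model's memory.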

AGI Implications

DeepMind positions Genie 3 as a stepping stone toward artificial general intelligence.49 The lab expects world model technology to play a critical role as AI agents interact more with physical environments.50

"Genie 3 marks a major leap toward Artificial General Intelligence by enabling AI agents to 'experience,' interact with, and learn from richly simulated worlds without manual content creation," according to DeepMind's announcement.51

Current Limitations

Genie 3 remains in limited research preview rather than public release.52 Known constraints include:

  • Limited action space for agent interactions
  • Consistency breakdown after several minutes
  • Incomplete real-world geographic accuracy
  • Challenges modeling complex multi-agent interactions

DeepMind continues expanding testing access to selected academics and creators.53

Fei-Fei Li's World Labs and Marble

World Labs, founded by AI pioneer Fei-Fei Li, launched Marble in November 2025 as the first commercially available world model product.54 The startup emerged from stealth with $230 million in funding just over a year before the Marble launch.55

Product Architecture

Marble generates persistent, downloadable 3D environments from text prompts, photos, videos, 3D layouts, or panoramic images.56 Unlike competitors that generate worlds on-the-fly during exploration, Marble produces discrete environments that users can edit and export.57

| Input Type | Output |
| --- | --- |
| Text prompt | 3D environment |
| Photo | 3D environment |
| Video | 3D environment |
| 3D layout | AI-enhanced 3D environment |
| Panorama | 3D environment |

The platform offers AI-native editing tools and a hybrid 3D editor enabling spatial structure blocking before AI fills visual details.58 Files export in formats compatible with industry-standard tools like Unreal Engine and Unity.59

Pricing Model

World Labs adopted a freemium structure targeting creative professionals:60

| Tier | Price | Generations | Features |
| --- | --- | --- | --- |
| Free | $0 | 4/month | Basic generation |
| Standard | $20/month | 12/month | Standard features |
| Pro | $35/month | 25/month | Commercial rights |
| Max | $95/month | 75/month | Premium features |

Target Applications

Initial use cases focus on gaming, visual effects for film, and virtual reality.61 Marble supports Vision Pro and Quest 3 VR headsets, with every generated world viewable in VR.62

Fei-Fei Li positions Marble as "the first step toward creating a truly spatially intelligent world model."63 Beyond creative applications, the technology enables robotics training through simulated environments that would be expensive or dangerous to create in physical reality.64

NVIDIA Cosmos: Industrial-Scale World Models

NVIDIA launched Cosmos at CES 2025 as a platform for physical AI development, specifically targeting autonomous vehicles and robotics.65 By January 2026, Cosmos world foundation models had been downloaded over 2 million times.66

Platform Architecture

Cosmos comprises generative world foundation models, advanced tokenizers, guardrails, and an accelerated video processing pipeline.67 The models predict and generate physics-aware videos of future environment states, enabling synthetic training data generation at massive scale.68

| Model Tier | Optimization | Use Case |
| --- | --- | --- |
| Nano | Real-time, edge deployment69 | On-device inference |
| Super | High performance baseline70 | General development |
| Ultra | Maximum quality and fidelity71 | Custom model distillation |

The platform trained on 9,000 trillion tokens from 20 million hours of real-world data spanning human interactions, environments, industrial settings, robotics, and driving scenarios.72

Industry Adoption

Leading robotics and automotive companies adopted Cosmos for synthetic data generation:73

| Company | Domain |
| --- | --- |
| 1X | Humanoid robots |
| Agility | Bipedal robots |
| Figure AI | Humanoid robots |
| Waabi | Autonomous trucking |
| XPENG | Electric vehicles |
| Uber | Autonomous ridesharing |

Cosmos Model Types

Three model types address different physical AI development needs:74

  • Cosmos-Predict: Simulates and predicts future world states in video form
  • Cosmos-Transfer: Produces high-quality simulations conditioned on spatial control inputs
  • Cosmos-Reason: Reasoning model for physical AI development

NVIDIA released the reasoning model as open and fully customizable, enabling developers to generate diverse training data using text, image, and video prompts.75

Video Generation as World Simulation

The distinction between video generation and world models has blurred as leading video systems incorporate physics understanding. OpenAI describes Sora as teaching "AI to understand and simulate the physical world in motion."76

Sora 2 Progress

OpenAI released Sora 2 as a significant advancement in physical understanding.77 Where previous video models "morphed objects and deformed reality" to execute prompts, Sora 2 demonstrates physics compliance: a missed basketball shot rebounds off the backboard rather than teleporting to the hoop.78

"The model's 'mistakes' often appear to be mistakes of the internal agent being modeled," OpenAI noted, indicating the system simulates agents operating within physical constraints rather than generating arbitrary visual sequences.79

Runway's World Models Approach

Runway's Gen-4.5, released in December 2025, claimed the top position on the Video Arena benchmark, outperforming Google's Veo 3 and OpenAI's Sora 2 Pro.80 Runway explicitly frames Gen-4.5 as moving beyond "video generation" toward "world models that understand physics."81

"Objects move with realistic weight, momentum and force. Liquids flow with proper dynamics," Runway stated.82 The company positions Gen-4.5 as a step toward "General World Models" that simulate environments including their physics.83

Competitive Landscape

| Model | Company | Benchmark Position | Physics Focus |
| --- | --- | --- | --- |
| Gen-4.5 | Runway | #1 Video Arena84 | Explicit world model framing |
| Veo 3 | Google | #2 Video Arena85 | Video generation with physics |
| Sora 2 Pro | OpenAI | #7 Video Arena86 | World simulation research |
| Genie 3 | DeepMind | N/A (different focus)87 | Real-time interaction |

Applications Beyond Entertainment

World models address critical limitations in training embodied AI systems. Robotics and autonomous vehicles require understanding of physics that cannot be learned from text alone.88

Robotics Training

Physical robots benefit from training in simulated environments before deployment.89 World models generate diverse scenarios that would be impractical or dangerous to create in reality. A warehouse robot can experience millions of package-handling scenarios in simulation, including edge cases that rarely occur in physical warehouses.90

NVIDIA's Cosmos enables developers to "generate diverse data for training robots at scale using text, image and video prompts."91 This synthetic data addresses a fundamental challenge in robotics: unlike language models that can train on internet-scale text, robots have limited physical training data available.92
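A sketch of what scenario randomization for synthetic robot training can look like. The parameter names and ranges here are illustrative assumptions, not NVIDIA Cosmos API calls; in a real pipeline, a world model would render each sampled configuration into physics-aware video.

```python
import random

# Hedged sketch: sample warehouse scenario parameters for synthetic training.
# All field names and ranges are made up for illustration.
random.seed(42)

def sample_scenario():
    return {
        "package_mass_kg": round(random.uniform(0.1, 30.0), 2),
        "friction":        round(random.uniform(0.2, 1.0), 2),
        "lighting_lux":    random.choice([50, 200, 800, 2000]),
        "occluded_label":  random.random() < 0.15,   # rare edge case
    }

# A world model can render millions of such configurations; a physical
# warehouse would surface the rare combinations far more slowly.
scenarios = [sample_scenario() for _ in range(1000)]
rare = [s for s in scenarios if s["occluded_label"] and s["package_mass_kg"] > 25]
print(len(scenarios), len(rare))
```

The point of the sketch is the sampling structure: edge cases that occur in a few percent of real operations can be oversampled at will in simulation.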

Autonomous Vehicles

Autonomous vehicle development requires exposure to scenarios that occur rarely in real driving but must be handled correctly when encountered.93 World models enable generation of:

  • Near-miss collision scenarios
  • Unusual weather conditions
  • Pedestrian behaviors in edge cases
  • Construction zone configurations
  • Emergency vehicle interactions

World models serve as "learned simulators" or mental "what if" thought experiments for model-based reinforcement learning.94 By incorporating world models into driving systems, developers enable vehicles to understand human decisions and generalize to real-world situations.95

Scientific Simulation

World models promise impact beyond robotics and vehicles. Applications include:96

  • Molecular structure simulation in chemistry
  • Physical law modeling in physics
  • Climate system prediction
  • Medical procedure training
  • Manufacturing process optimization

Organizations deploying AI infrastructure for world model development can consult Introl, which provides GPU deployment strategies across 257 global locations with capacity for 100,000-GPU deployments.

Infrastructure Requirements

World models demand different computational profiles than large language models. Video generation and physics simulation require substantially more compute per inference than text generation.97

GPU Requirements

World model training involves video data rather than text, dramatically increasing memory and compute requirements.98 A single high-quality video frame contains orders of magnitude more information than a text token. Training on 20 million hours of video, as NVIDIA's Cosmos did, requires infrastructure beyond what most organizations can deploy independently.99

| Workload | Typical GPU Requirement |
| --- | --- |
| LLM inference | 1-8 GPUs per request |
| World model inference | 8-32 GPUs per request |
| LLM training | Hundreds to thousands |
| World model training | Thousands to tens of thousands |

Memory Bandwidth

Real-time world model inference at 24 fps requires rapid memory access to maintain consistency with previously generated frames.100 High-bandwidth memory (HBM) GPUs like NVIDIA H200 and B200 offer advantages for workloads that must repeatedly access large visual context windows.101

Storage Considerations

Video training data consumes storage at rates far exceeding text corpora. A single hour of high-quality video may exceed 100GB uncompressed.102 Organizations building world model training infrastructure must plan for petabyte-scale storage with high-throughput access patterns.103
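A back-of-envelope check on these figures, using the 20-million-hour corpus and 100 GB/hour numbers from the text plus an assumed 50x compression ratio for stored training data (the ratio is an assumption; real pipelines compress and deduplicate differently):

```python
# Back-of-envelope storage estimate for a Cosmos-scale video corpus.
HOURS = 20_000_000                 # training corpus size, per the text
GB_PER_HOUR_RAW = 100              # uncompressed, per the text
COMPRESSION = 50                   # assumed ratio for stored training data

raw_pb = HOURS * GB_PER_HOUR_RAW / 1_000_000          # GB -> PB
stored_pb = raw_pb / COMPRESSION
print(f"raw: {raw_pb:,.0f} PB, stored (assumed 50x): {stored_pb:,.0f} PB")
# → raw: 2,000 PB, stored (assumed 50x): 40 PB
```

Even under aggressive compression, the corpus lands in the tens of petabytes, which is why petabyte-scale storage planning is the floor rather than the ceiling.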

The AGI Debate

The world models approach represents a philosophical divergence from the scaling hypothesis that drove LLM development.104 Proponents argue that text prediction cannot produce genuine understanding, while critics question whether learned physics simulations will generalize to novel situations.105

The LeCun Position

LeCun argues that LLMs represent a dead end for AGI because they lack grounding in physical reality.106 Text-only training produces systems that can discuss physics without understanding physics, describe spatial relationships without perceiving space, and reason about causation without experiencing cause and effect.107

World models, by contrast, learn representations from sensory data and forecast dynamics like motion, force, and spatial relationships.108 This grounding potentially enables robust generalization that text-trained systems cannot achieve.109

The Scaling Counter-Argument

Some researchers maintain that sufficient scale and architectural improvements can overcome LLM limitations.110 Anthropic CEO Dario Amodei predicted we might have "a country of geniuses in a datacenter" as early as 2026, suggesting LLM-derived systems could achieve human-level capability.111

The debate may prove empirical rather than philosophical. If world model companies produce systems that demonstrate reliable physical reasoning while LLMs continue hallucinating impossible physics, the field's center of gravity may shift permanently.112

Key Takeaways

For infrastructure planners:

  • Budget for video-scale compute requirements (8-32x LLM inference)
  • Prioritize high-bandwidth memory GPUs (H200, B200) for real-time inference
  • Plan petabyte-scale storage for video training data
  • Consider NVIDIA Cosmos integration for robotics/AV applications

For operations teams:

  • Evaluate world model APIs for synthetic data generation
  • Develop expertise in video processing pipelines
  • Monitor real-time inference latency requirements
  • Prepare infrastructure for multi-modal workloads

For strategic planning:

  • Track AMI Labs launch for production-ready world models
  • Assess Genie 3 research access opportunities
  • Evaluate Marble for creative pipeline integration
  • Consider world model capabilities in long-term AI roadmaps

For research teams:

  • Experiment with NVIDIA Cosmos for robotics applications
  • Monitor DeepMind publications on Genie 3 architecture
  • Evaluate I-JEPA approaches for visual understanding
  • Compare world model outputs against LLM baselines

References


  1. TechCrunch - Yann LeCun confirms his new 'world model' startup 

  2. Sifted - Yann LeCun raising €500m at €3bn valuation 

  3. Google DeepMind - Genie 3: A new frontier for world models 

  4. TechCrunch - Fei-Fei Li's World Labs speeds up the world model race with Marble 

  5. NVIDIA Newsroom - Cosmos world foundation models downloaded 2 million times 

  6. Sebastian Raschka - The State Of LLMs 2025 

  7. HSToday - The Vast World Beyond Large Language Models 

  8. arXiv - Hallucination is Inevitable: An Innate Limitation of Large Language Models 

  9. Nexla - LLM Hallucination—Types, Causes, and Solutions 

  10. arXiv - A Survey on Hallucination in Large Language Models 

  11. Medium - Understanding LLM Hallucination and Confabulation 

  12. Iguazio - What are LLM Hallucinations? 

  13. ACM - Integration of LLMs and the Physical World 

  14. Medium - LLM Hallucinations Explained 

  15. arXiv - LLMs Will Always Hallucinate 

  16. Futurism - Large Language Models Will Never Be Intelligent, Expert Says 

  17. Medium - Debunking the LLM-to-AGI Misconception 

  18. Forrester - LLMs, Make Room For World Models 

  19. Fortune - Yann LeCun is targeting a $3.5 billion valuation 

  20. LinkedIn News - AI pioneer Yann LeCun launches new startup 

  21. Sifted - AMI Labs funding details 

  22. TechCrunch - Pre-launch funding scale 

  23. Fortune - LeCun background 

  24. TechCrunch - Alex LeBrun CEO appointment 

  25. mlq.ai - AMI Labs Paris headquarters 

  26. Fortune - Meta partnership 

  27. AI Gopubby - Why Yann LeCun Bet $3.5 Billion on World Models Over LLMs 

  28. TechCrunch - LeCun world model definition 

  29. Fortune - World model vision 

  30. Forrester - World models vs LLMs 

  31. Meta AI - I-JEPA: The first AI model based on Yann LeCun's vision 

  32. Meta AI - I-JEPA technical approach 

  33. AI Gopubby - Embodied learning comparison 

  34. Meta AI - I-JEPA learning process 

  35. TechCrunch - DeepMind thinks Genie 3 presents stepping stone towards AGI 

  36. Google DeepMind - Genie 3 real-time capability 

  37. Marketing AI Institute - Google DeepMind's Genie 3 Virtual World Breakthrough 

  38. Google DeepMind - Self-learned physics 

  39. genie3.net - Frame rate specification 

  40. Google DeepMind - Resolution specification 

  41. TechCrunch - Consistency duration 

  42. Google DeepMind - Memory horizon 

  43. OpenCV - Genie 3 self-learned physics 

  44. TechCrunch - Shlomi Fruchter quote 

  45. Google DeepMind - Auto-regressive architecture 

  46. genie3.world - Computational requirements 

  47. Codecademy - Emergent physics 

  48. Google DeepMind - Training approach 

  49. TechCrunch - AGI positioning 

  50. Marketing AI Institute - AGI implications 

  51. genie3.net - AGI significance 

  52. Google DeepMind - Research preview status 

  53. TechCrunch - Access expansion 

  54. TechCrunch - World Labs Marble launch 

  55. World Labs - Company background 

  56. World Labs - Marble inputs 

  57. BD Tech Talks - Marble differentiation 

  58. TechCrunch - Editing tools 

  59. Fast Company - Export compatibility 

  60. World Labs - Marble pricing 

  61. TIME - Inside Fei-Fei Li's Plan to Build AI-Powered Virtual Worlds 

  62. Analytics India Mag - VR support 

  63. TechCrunch - Li quote 

  64. BD Tech Talks - Robotics applications 

  65. NVIDIA Newsroom - Cosmos launch 

  66. NVIDIA Newsroom - Download statistics 

  67. NVIDIA - Cosmos platform 

  68. NVIDIA Technical Blog - WFM capabilities 

  69. NVIDIA - Nano tier 

  70. NVIDIA - Super tier 

  71. NVIDIA - Ultra tier 

  72. NVIDIA Newsroom - Training data scale 

  73. NVIDIA Newsroom - Industry adoption 

  74. arXiv - Cosmos model types 

  75. NVIDIA Blogs - Cosmos open model 

  76. OpenAI - Video generation models as world simulators 

  77. OpenAI - Sora 2 is here 

  78. OpenAI - Sora 2 physics compliance 

  79. OpenAI - Agent simulation 

  80. CNBC - Runway Gen-4.5 benchmark 

  81. WinBuzzer - Runway world models approach 

  82. CNBC - Gen-4.5 physics 

  83. WinBuzzer - General World Models goal 

  84. CNBC - Gen-4.5 ranking 

  85. CNBC - Veo 3 ranking 

  86. CNBC - Sora 2 Pro ranking 

  87. Google DeepMind - Genie 3 focus 

  88. NVIDIA - World models for physical AI 

  89. GitHub - Awesome-World-Models robotics applications 

  90. Research.aimultiple - World Foundation Models use cases 

  91. NVIDIA Blogs - Synthetic data generation 

  92. BD Tech Talks - Robotics data challenge 

  93. arXiv - Survey of World Models for Autonomous Driving 

  94. China Daily - World models for autonomy 

  95. arXiv - Driving model generalization 

  96. world-model-roadmap.github.io - Simulating the Visual World with AI 

  97. NVIDIA - World model compute requirements 

  98. NVIDIA Technical Blog - Video training requirements 

  99. NVIDIA Newsroom - Training data scale 

  100. Google DeepMind - Real-time requirements 

  101. NVIDIA - HBM advantages 

  102. NVIDIA Research - Video data scale 

  103. NVIDIA - Storage considerations 

  104. TechCrunch - 2026 AI pragmatism 

  105. AI Frontiers - AGI's Last Bottlenecks 

  106. Futurism - LeCun AGI position 

  107. Substack - Will AGI Emerge from Large Language Models? 

  108. NVIDIA - World model representations 

  109. Forrester - World model generalization 

  110. arXiv - Large language models for AGI survey 

  111. TechCrunch - Amodei AGI prediction 

  112. Vontobel - 2026 Large Language Models Outlook 
