World Models Race 2026: How LeCun, DeepMind, and World Labs Are Redefining the Path to AGI

Yann LeCun raises €500M for AMI Labs while DeepMind's Genie 3 simulates real-time 3D worlds. The 2026 race to build AI that understands physics may eclipse LLMs.

A three-billion-euro pre-launch valuation for a startup that has not released a single product.1 Yann LeCun's AMI Labs represents the largest bet yet on a thesis that has divided AI researchers for years: large language models will never achieve general intelligence, and the path forward runs through world models instead.

TL;DR

The world models paradigm exploded into mainstream AI development in late 2025 and early 2026. Yann LeCun left Meta after 12 years to launch AMI Labs, raising €500M at a €3B valuation to build AI systems that understand physics rather than just predicting text.2 Google DeepMind released Genie 3, the first real-time interactive world model capable of generating persistent 3D environments at 24 fps.3 Fei-Fei Li's World Labs launched Marble, making world model generation commercially available with pricing from free to $95/month.4 NVIDIA's Cosmos platform has seen 2 million downloads as robotics and autonomous vehicle developers embrace synthetic physics-aware training data.5 For organizations building AI infrastructure, world models signal a computational shift from text processing toward video generation, physics simulation, and embodied reasoning.

The LLM Ceiling

Large language models achieved remarkable capabilities through scale. GPT-4, Claude, and Gemini demonstrate sophisticated reasoning, code generation, and multi-step problem solving.6 Yet a fundamental limitation persists: these models learn statistical patterns from text, not understanding of physical reality.7

Research published in 2024 proved mathematically that LLMs cannot learn all computable functions and will therefore inevitably hallucinate when used as general problem solvers.8 The root cause lies in how LLMs operate: predicting which tokens follow previous tokens based on patterns learned from training data, without any grounding in physical reality.9

The Hallucination Problem

LLMs generate plausible-sounding text that may describe physically impossible scenarios, historically inaccurate events, or logically inconsistent reasoning.10 Unlike humans who learn about gravity through embodied experience, LLMs only learn that the word "gravity" tends to appear near certain other words.11

| Limitation | Cause | Consequence |
| --- | --- | --- |
| Factual hallucination | No verified knowledge base12 | Confident fabrication of facts |
| Physical reasoning failure | No embodied experience13 | Describes impossible physics |
| Causal confusion | Pattern matching, not understanding14 | Correlation treated as causation |
| Temporal incoherence | Sequential token prediction15 | Events in impossible order |

Yann LeCun has argued publicly for years that scaling LLMs will not produce general intelligence.16 "LLMs are too limiting," LeCun stated in his NVIDIA GTC presentation. "Scaling them up will not allow us to reach AGI."17

The alternative he proposes: world models that learn representations of physical reality, enabling prediction, planning, and reasoning about cause and effect.18

Yann LeCun's AMI Labs

LeCun departed Meta in December 2025 after 12 years, five as founding director of Facebook AI Research (FAIR) and seven as chief AI scientist.19 His new venture, Advanced Machine Intelligence (AMI) Labs, represents the most ambitious attempt yet to commercialize world model research.20

Funding and Structure

AMI Labs entered funding discussions seeking €500 million at a €3 billion valuation before launching any product.21 The target would represent one of the largest pre-launch raises in AI history, reflecting investor confidence in LeCun's vision and track record.22

| Role | Person | Background |
| --- | --- | --- |
| Executive Chairman | Yann LeCun | Turing Award winner, Meta FAIR founder23 |
| CEO | Alex LeBrun | Former CEO of Nabla (medical AI)24 |

The company plans to establish headquarters in Paris by January 2026.25 While Meta will not invest directly in AMI Labs, the two companies plan a partnership that allows LeCun to maintain research ties with his former lab.26

Technical Vision

AMI Labs aims to create AI systems that understand physics, maintain persistent memory, and plan complex actions rather than simply predicting text sequences.27 LeCun describes a world model as "your mental model of how the world behaves."28

"You can imagine a sequence of actions you might take, and your world model will allow you to predict what the effect of the sequence of actions will be on the world," LeCun explained.29

The approach differs fundamentally from LLMs. Where GPT-style models predict the next word, world models predict the next state of a physical environment given actions taken within it.30 This enables:

  • Planning: Simulating outcomes before taking action
  • Reasoning about physics: Understanding that objects have mass, momentum, and spatial relationships
  • Cause-effect understanding: Learning that actions produce predictable consequences
  • Persistent memory: Maintaining consistent world state across time
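The planning capability in the list above can be sketched in a few lines. The toy below is an illustration only: `predict` stands in for a learned world model (here, hard-coded one-dimensional dynamics), and the planner exhaustively simulates short action sequences before committing to one — the "imagine outcomes before acting" loop LeCun describes.

```python
import itertools

# Toy 1-D world: state is (position, velocity); actions are accelerations.
# A learned world model would replace `predict` with a neural network; here
# we hard-code trivial dynamics purely to illustrate the planning loop.
def predict(state, action):
    pos, vel = state
    vel = vel + action          # action changes velocity
    pos = pos + vel             # velocity changes position
    return (pos, vel)

def plan(state, goal, horizon=3, actions=(-1, 0, 1)):
    """Search action sequences by simulating outcomes before acting."""
    best_seq, best_err = None, float("inf")
    for seq in itertools.product(actions, repeat=horizon):
        s = state
        for a in seq:           # roll the model forward; no real-world steps
            s = predict(s, a)
        err = abs(s[0] - goal)  # distance to goal after the imagined rollout
        if err < best_err:
            best_seq, best_err = seq, err
    return best_seq

print(plan(state=(0, 0), goal=5))   # → (1, 1, 0)
```

The key property is that every candidate sequence is evaluated inside the model; only the winning sequence would ever be executed in the real environment.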

I-JEPA Foundation

AMI Labs builds on LeCun's I-JEPA (Image Joint Embedding Predictive Architecture) research at Meta.31 I-JEPA learns by predicting representations of image regions from other regions, developing abstract understanding of visual scenes without needing explicit labels.32

The approach parallels how humans develop intuitive physics through observation. A child watching objects fall develops an internal model of gravity without anyone explaining Newton's laws.33 I-JEPA and successor architectures aim to replicate this learning process in artificial systems.34
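A heavily simplified sketch of the JEPA idea: predict the representation of one part of the input from another part, with the loss computed in embedding space rather than pixel space. Everything here is a toy stand-in (a fixed averaging "encoder", a single-weight predictor); real I-JEPA uses Vision Transformer encoders and an exponential-moving-average target encoder.

```python
import random

# Toy JEPA-style loop: predict the *representation* of a hidden patch from
# the representation of a visible patch, never reconstructing pixels.
def encode(patch):
    return sum(patch) / len(patch)   # stand-in for a learned embedding

random.seed(0)
w = 0.0                              # single predictor parameter
for step in range(200):
    base = random.uniform(0, 1)      # latent factor shared by both patches
    context = [base + random.gauss(0, 0.01) for _ in range(4)]
    target  = [2 * base + random.gauss(0, 0.01) for _ in range(4)]
    z_ctx, z_tgt = encode(context), encode(target)
    pred = w * z_ctx                 # predict target embedding from context
    grad = 2 * (pred - z_tgt) * z_ctx
    w -= 0.1 * grad                  # gradient step on embedding-space loss
print(round(w, 2))                   # w converges near 2.0, the latent relation
```

The predictor never sees pixels of the target patch; it learns the abstract relation between regions, which is the property the article attributes to intuitive-physics learning.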

DeepMind's Genie 3

Google DeepMind released Genie 3 in August 2025, representing the first real-time interactive general-purpose world model.35 Unlike previous systems that generated static environments or required significant processing time, Genie 3 produces navigable 3D worlds at 24 frames per second.36

Technical Capabilities

Genie 3 generates dynamic environments from text prompts, maintaining visual consistency for several minutes of real-time interaction.37 The system does not rely on hard-coded physics engines; instead, the model teaches itself how the world works through training.38

| Capability | Specification |
| --- | --- |
| Frame rate | 24 fps real-time39 |
| Resolution | 720p40 |
| Consistency duration | Several minutes41 |
| Memory horizon | Up to 1 minute lookback42 |
| Physics | Self-learned, not hard-coded43 |

"Genie 3 is the first real-time interactive general-purpose world model," stated Shlomi Fruchter, research director at DeepMind. "It goes beyond narrow world models that existed before. It's not specific to any particular environment."44

Auto-Regressive Architecture

The model generates one frame at a time, looking back at previously generated content to determine what happens next.45 Achieving real-time performance requires computing this auto-regressive process multiple times per second while maintaining consistency with potentially minute-old visual memory.46

Physical consistency emerges from training rather than explicit programming.47 Genie 3 environments maintain stable physics because the model learned physical regularities from training data, not because researchers manually encoded gravity or collision detection.48
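The auto-regressive loop with a bounded lookback window can be sketched as follows. `generate_frame` is a trivial stand-in for the neural network, and the deque plays the role of the roughly one-minute visual memory (about 1,440 frames at 24 fps).

```python
from collections import deque

# Sketch of the auto-regressive loop described above: each new frame is
# conditioned on a bounded window of previously generated frames.
FPS, MEMORY_SECONDS = 24, 60
memory = deque(maxlen=FPS * MEMORY_SECONDS)    # sliding visual context

def generate_frame(context, action):
    """Stand-in for the model: real systems run a neural net here."""
    last = context[-1] if context else 0
    return last + action                       # trivial 'dynamics'

def step(action):
    frame = generate_frame(memory, action)
    memory.append(frame)                       # frame becomes future context
    return frame

for _ in range(3):
    step(action=1)
print(len(memory), memory[-1])                 # → 3 3
```

The deque's `maxlen` captures the engineering constraint in the text: consistency only has to be maintained against a finite window, and frames older than the window fall out of the model's memory.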

AGI Implications

DeepMind positions Genie 3 as a stepping stone toward artificial general intelligence.49 The lab expects world model technology to play a critical role as AI agents interact more with physical environments.50

"Genie 3 marks a major leap toward Artificial General Intelligence by enabling AI agents to 'experience,' interact with, and learn from richly simulated worlds without manual content creation," according to DeepMind's announcement.51

Current Limitations

Genie 3 remains in limited research preview rather than public release.52 Known constraints include:

  • Limited action space for agent interactions
  • Consistency breakdown after several minutes
  • Incomplete real-world geographic accuracy
  • Challenges modeling complex multi-agent interactions

DeepMind continues expanding testing access to selected academics and creators.53

Fei-Fei Li's World Labs and Marble

World Labs, founded by AI pioneer Fei-Fei Li, launched Marble in November 2025 as the first commercially available world model product.54 The startup emerged from stealth with $230 million in funding just over a year before the Marble launch.55

Product Architecture

Marble generates persistent, downloadable 3D environments from text prompts, photos, videos, 3D layouts, or panoramic images.56 Unlike competitors that generate worlds on-the-fly during exploration, Marble produces discrete environments that users can edit and export.57

| Input Type | Output |
| --- | --- |
| Text prompt | 3D environment |
| Photo | 3D environment |
| Video | 3D environment |
| 3D layout | AI-enhanced 3D environment |
| Panorama | 3D environment |

The platform offers AI-native editing tools and a hybrid 3D editor enabling spatial structure blocking before AI fills visual details.58 Files export in formats compatible with industry-standard tools like Unreal Engine and Unity.59

Pricing Model

World Labs adopted a freemium structure targeting creative professionals:60

| Tier | Price | Generations | Features |
| --- | --- | --- | --- |
| Free | $0 | 4/month | Basic generation |
| Standard | $20/month | 12/month | Standard features |
| Pro | $35/month | 25/month | Commercial rights |
| Max | $95/month | 75/month | Premium features |

Target Applications

Initial use cases focus on gaming, visual effects for film, and virtual reality.61 Marble supports Vision Pro and Quest 3 VR headsets, with every generated world viewable in VR.62

Fei-Fei Li positions Marble as "the first step toward creating a truly spatially intelligent world model."63 Beyond creative applications, the technology enables robotics training through simulated environments that would be expensive or dangerous to create in physical reality.64

NVIDIA Cosmos: Industrial-Scale World Models

NVIDIA launched Cosmos at CES 2025 as a platform for physical AI development, specifically targeting autonomous vehicles and robotics.65 By January 2026, Cosmos world foundation models had been downloaded over 2 million times.66

Platform Architecture

Cosmos comprises generative world foundation models, advanced tokenizers, guardrails, and an accelerated video processing pipeline.67 The models predict and generate physics-aware videos of future environment states, enabling synthetic training data generation at massive scale.68

| Model Tier | Optimization | Use Case |
| --- | --- | --- |
| Nano | Real-time, edge deployment69 | On-device inference |
| Super | High performance baseline70 | General development |
| Ultra | Maximum quality and fidelity71 | Custom model distillation |

The platform trained on 9,000 trillion tokens from 20 million hours of real-world data spanning human interactions, environments, industrial settings, robotics, and driving scenarios.72

Industry Adoption

Leading robotics and automotive companies adopted Cosmos for synthetic data generation:73

| Company | Domain |
| --- | --- |
| 1X | Humanoid robots |
| Agility | Bipedal robots |
| Figure AI | Humanoid robots |
| Waabi | Autonomous trucking |
| XPENG | Electric vehicles |
| Uber | Autonomous ridesharing |

Cosmos Model Types

Three model types address different physical AI development needs:74

  • Cosmos-Predict: Simulates and predicts future world states in video form
  • Cosmos-Transfer: Produces high-quality simulations conditioned on spatial control inputs
  • Cosmos-Reason: Reasoning model for physical AI development

NVIDIA released the reasoning model as open and fully customizable, enabling developers to generate diverse training data using text, image, and video prompts.75

Video Generation as World Simulation

The distinction between video generation and world models has blurred as leading video systems incorporate physics understanding. OpenAI describes Sora as teaching "AI to understand and simulate the physical world in motion."76

Sora 2 Progress

OpenAI released Sora 2 as a significant advancement in physical understanding.77 Where previous video models "morphed objects and deformed reality" to execute prompts, Sora 2 demonstrates physics compliance: a missed basketball shot rebounds off the backboard rather than teleporting to the hoop.78

"The model's 'mistakes' often appear to be mistakes of the internal agent being modeled," OpenAI noted, indicating the system simulates agents operating within physical constraints rather than generating arbitrary visual sequences.79

Runway's World Models Approach

Runway's Gen-4.5, released in December 2025, claimed the top position on the Video Arena benchmark, outperforming Google's Veo 3 and OpenAI's Sora 2 Pro.80 Runway explicitly frames Gen-4.5 as moving beyond "video generation" toward "world models that understand physics."81

"Objects move with realistic weight, momentum and force. Liquids flow with proper dynamics," Runway stated.82 The company positions Gen-4.5 as a step toward "General World Models" that simulate environments including their physics.83

Competitive Landscape

| Model | Company | Benchmark Position | Physics Focus |
| --- | --- | --- | --- |
| Gen-4.5 | Runway | #1 Video Arena84 | Explicit world model framing |
| Veo 3 | Google | #2 Video Arena85 | Video generation with physics |
| Sora 2 Pro | OpenAI | #7 Video Arena86 | World simulation research |
| Genie 3 | DeepMind | N/A (different focus)87 | Real-time interaction |

Applications Beyond Entertainment

World models address critical limitations in training embodied AI systems. Robotics and autonomous vehicles require understanding of physics that cannot be learned from text alone.88

Robotics Training

Physical robots benefit from training in simulated environments before deployment.89 World models generate diverse scenarios that would be impractical or dangerous to create in reality. A warehouse robot can experience millions of package-handling scenarios in simulation, including edge cases that rarely occur in physical warehouses.90

NVIDIA's Cosmos enables developers to "generate diverse data for training robots at scale using text, image and video prompts."91 This synthetic data addresses a fundamental challenge in robotics: unlike language models that can train on internet-scale text, robots have limited physical training data available.92
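A sketch of what scenario randomization for synthetic robot training can look like. The parameter names and ranges here are illustrative assumptions, not NVIDIA Cosmos API calls; in a real pipeline, a world model would render each sampled configuration into physics-aware video.

```python
import random

# Hedged sketch: sample warehouse scenario parameters for synthetic training.
# All field names and ranges are made up for illustration.
random.seed(42)

def sample_scenario():
    return {
        "package_mass_kg": round(random.uniform(0.1, 30.0), 2),
        "friction":        round(random.uniform(0.2, 1.0), 2),
        "lighting_lux":    random.choice([50, 200, 800, 2000]),
        "occluded_label":  random.random() < 0.15,   # rare edge case
    }

# A world model can render millions of such configurations; a physical
# warehouse would surface the rare combinations far more slowly.
scenarios = [sample_scenario() for _ in range(1000)]
rare = [s for s in scenarios if s["occluded_label"] and s["package_mass_kg"] > 25]
print(len(scenarios), len(rare))
```

The point of the sketch is the sampling structure: edge cases that occur in a few percent of real operations can be oversampled at will in simulation.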

Autonomous Vehicles

Autonomous vehicle development requires exposure to scenarios that occur rarely in real driving but must be handled correctly when encountered.93 World models enable generation of:

  • Near-miss collision scenarios
  • Unusual weather conditions
  • Pedestrian behaviors in edge cases
  • Construction zone configurations
  • Emergency vehicle interactions

World models serve as "learned simulators" or mental "what if" thought experiments for model-based reinforcement learning.94 By incorporating world models into driving systems, developers enable vehicles to understand human decisions and generalize to real-world situations.95

Scientific Simulation

World models promise impact beyond robotics and vehicles. Applications include:96

  • Molecular structure simulation in chemistry
  • Physical law modeling in physics
  • Climate system prediction
  • Medical procedure training
  • Manufacturing process optimization

Organizations deploying AI infrastructure for world model development can consult Introl, which provides GPU deployment strategies across 257 global locations with capacity for 100,000-GPU deployments.

Infrastructure Requirements

World models demand different computational profiles than large language models. Video generation and physics simulation require substantially more compute per inference than text generation.97

GPU Requirements

World model training involves video data rather than text, dramatically increasing memory and compute requirements.98 A single high-quality video frame contains orders of magnitude more information than a text token. Training on 20 million hours of video, as NVIDIA's Cosmos did, requires infrastructure beyond what most organizations can deploy independently.99

| Workload | Typical GPU Requirement |
| --- | --- |
| LLM inference | 1-8 GPUs per request |
| World model inference | 8-32 GPUs per request |
| LLM training | Hundreds to thousands |
| World model training | Thousands to tens of thousands |

Memory Bandwidth

Real-time world model inference at 24 fps requires rapid memory access to maintain consistency with previously generated frames.100 High-bandwidth memory (HBM) GPUs like NVIDIA H200 and B200 offer advantages for workloads that must repeatedly access large visual context windows.101

Storage Considerations

Video training data consumes storage at rates far exceeding text corpora. A single hour of high-quality video may exceed 100GB uncompressed.102 Organizations building world model training infrastructure must plan for petabyte-scale storage with high-throughput access patterns.103
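A back-of-envelope check on these figures, using the 20-million-hour corpus and 100 GB/hour numbers from the text plus an assumed 50x compression ratio for stored training data (the ratio is an assumption; real pipelines compress and deduplicate differently):

```python
# Back-of-envelope storage estimate for a Cosmos-scale video corpus.
HOURS = 20_000_000                 # training corpus size, per the text
GB_PER_HOUR_RAW = 100              # uncompressed, per the text
COMPRESSION = 50                   # assumed ratio for stored training data

raw_pb = HOURS * GB_PER_HOUR_RAW / 1_000_000          # GB -> PB
stored_pb = raw_pb / COMPRESSION
print(f"raw: {raw_pb:,.0f} PB, stored (assumed 50x): {stored_pb:,.0f} PB")
# → raw: 2,000 PB, stored (assumed 50x): 40 PB
```

Even under aggressive compression, the corpus lands in the tens of petabytes, which is why petabyte-scale storage planning is the floor rather than the ceiling.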

The AGI Debate

The world models approach represents a philosophical divergence from the scaling hypothesis that drove LLM development.104 Proponents argue that text prediction cannot produce genuine understanding, while critics question whether learned physics simulations will generalize to novel situations.105

The LeCun Position

LeCun argues that LLMs represent a dead end for AGI because they lack grounding in physical reality.106 Text-only training produces systems that can discuss physics without understanding physics, describe spatial relationships without perceiving space, and reason about causation without experiencing cause and effect.107

World models, by contrast, learn representations from sensory data and forecast dynamics like motion, force, and spatial relationships.108 This grounding potentially enables robust generalization that text-trained systems cannot achieve.109

The Scaling Counter-Argument

Some researchers maintain that sufficient scale and architectural improvements can overcome LLM limitations.110 Anthropic CEO Dario Amodei predicted we might have "a country of geniuses in a datacenter" as early as 2026, suggesting LLM-derived systems could achieve human-level capability.111

The debate may prove empirical rather than philosophical. If world model companies produce systems that demonstrate reliable physical reasoning while LLMs continue hallucinating impossible physics, the field's center of gravity may shift permanently.112

Key Takeaways

For infrastructure planners:

  • Budget for video-scale compute requirements (8-32x LLM inference)
  • Prioritize high-bandwidth memory GPUs (H200, B200) for real-time inference
  • Plan petabyte-scale storage for video training data
  • Consider NVIDIA Cosmos integration for robotics/AV applications

For operations teams:

  • Evaluate world model APIs for synthetic data generation
  • Develop expertise in video processing pipelines
  • Monitor real-time inference latency requirements
  • Prepare infrastructure for multi-modal workloads

For strategic planning:

  • Track AMI Labs launch for production-ready world models
  • Assess Genie 3 research access opportunities
  • Evaluate Marble for creative pipeline integration
  • Consider world model capabilities in long-term AI roadmaps

For research teams:

  • Experiment with NVIDIA Cosmos for robotics applications
  • Monitor DeepMind publications on Genie 3 architecture
  • Evaluate I-JEPA approaches for visual understanding
  • Compare world model outputs against LLM baselines

References


  1. TechCrunch - Yann LeCun confirms his new 'world model' startup 

  2. Sifted - Yann LeCun raising €500m at €3bn valuation 

  3. Google DeepMind - Genie 3: A new frontier for world models 

  4. TechCrunch - Fei-Fei Li's World Labs speeds up the world model race with Marble 

  5. NVIDIA Newsroom - Cosmos world foundation models downloaded 2 million times 

  6. Sebastian Raschka - The State Of LLMs 2025 

  7. HSToday - The Vast World Beyond Large Language Models 

  8. arXiv - Hallucination is Inevitable: An Innate Limitation of Large Language Models 

  9. Nexla - LLM Hallucination—Types, Causes, and Solutions 

  10. arXiv - A Survey on Hallucination in Large Language Models 

  11. Medium - Understanding LLM Hallucination and Confabulation 

  12. Iguazio - What are LLM Hallucinations? 

  13. ACM - Integration of LLMs and the Physical World 

  14. Medium - LLM Hallucinations Explained 

  15. arXiv - LLMs Will Always Hallucinate 

  16. Futurism - Large Language Models Will Never Be Intelligent, Expert Says 

  17. Medium - Debunking the LLM-to-AGI Misconception 

  18. Forrester - LLMs, Make Room For World Models 

  19. Fortune - Yann LeCun is targeting a $3.5 billion valuation 

  20. LinkedIn News - AI pioneer Yann LeCun launches new startup 

  21. Sifted - AMI Labs funding details 

  22. TechCrunch - Pre-launch funding scale 

  23. Fortune - LeCun background 

  24. TechCrunch - Alex LeBrun CEO appointment 

  25. mlq.ai - AMI Labs Paris headquarters 

  26. Fortune - Meta partnership 

  27. AI Gopubby - Why Yann LeCun Bet $3.5 Billion on World Models Over LLMs 

  28. TechCrunch - LeCun world model definition 

  29. Fortune - World model vision 

  30. Forrester - World models vs LLMs 

  31. Meta AI - I-JEPA: The first AI model based on Yann LeCun's vision 

  32. Meta AI - I-JEPA technical approach 

  33. AI Gopubby - Embodied learning comparison 

  34. Meta AI - I-JEPA learning process 

  35. TechCrunch - DeepMind thinks Genie 3 presents stepping stone towards AGI 

  36. Google DeepMind - Genie 3 real-time capability 

  37. Marketing AI Institute - Google DeepMind's Genie 3 Virtual World Breakthrough 

  38. Google DeepMind - Self-learned physics 

  39. genie3.net - Frame rate specification 

  40. Google DeepMind - Resolution specification 

  41. TechCrunch - Consistency duration 

  42. Google DeepMind - Memory horizon 

  43. OpenCV - Genie 3 self-learned physics 

  44. TechCrunch - Shlomi Fruchter quote 

  45. Google DeepMind - Auto-regressive architecture 

  46. genie3.world - Computational requirements 

  47. Codecademy - Emergent physics 

  48. Google DeepMind - Training approach 

  49. TechCrunch - AGI positioning 

  50. Marketing AI Institute - AGI implications 

  51. genie3.net - AGI significance 

  52. Google DeepMind - Research preview status 

  53. TechCrunch - Access expansion 

  54. TechCrunch - World Labs Marble launch 

  55. World Labs - Company background 

  56. World Labs - Marble inputs 

  57. BD Tech Talks - Marble differentiation 

  58. TechCrunch - Editing tools 

  59. Fast Company - Export compatibility 

  60. World Labs - Marble pricing 

  61. TIME - Inside Fei-Fei Li's Plan to Build AI-Powered Virtual Worlds 

  62. Analytics India Mag - VR support 

  63. TechCrunch - Li quote 

  64. BD Tech Talks - Robotics applications 

  65. NVIDIA Newsroom - Cosmos launch 

  66. NVIDIA Newsroom - Download statistics 

  67. NVIDIA - Cosmos platform 

  68. NVIDIA Technical Blog - WFM capabilities 

  69. NVIDIA - Nano tier 

  70. NVIDIA - Super tier 

  71. NVIDIA - Ultra tier 

  72. NVIDIA Newsroom - Training data scale 

  73. NVIDIA Newsroom - Industry adoption 

  74. arXiv - Cosmos model types 

  75. NVIDIA Blogs - Cosmos open model 

  76. OpenAI - Video generation models as world simulators 

  77. OpenAI - Sora 2 is here 

  78. OpenAI - Sora 2 physics compliance 

  79. OpenAI - Agent simulation 

  80. CNBC - Runway Gen-4.5 benchmark 

  81. WinBuzzer - Runway world models approach 

  82. CNBC - Gen-4.5 physics 

  83. WinBuzzer - General World Models goal 

  84. CNBC - Gen-4.5 ranking 

  85. CNBC - Veo 3 ranking 

  86. CNBC - Sora 2 Pro ranking 

  87. Google DeepMind - Genie 3 focus 

  88. NVIDIA - World models for physical AI 

  89. GitHub - Awesome-World-Models robotics applications 

  90. Research.aimultiple - World Foundation Models use cases 

  91. NVIDIA Blogs - Synthetic data generation 

  92. BD Tech Talks - Robotics data challenge 

  93. arXiv - Survey of World Models for Autonomous Driving 

  94. China Daily - World models for autonomy 

  95. arXiv - Driving model generalization 

  96. world-model-roadmap.github.io - Simulating the Visual World with AI 

  97. NVIDIA - World model compute requirements 

  98. NVIDIA Technical Blog - Video training requirements 

  99. NVIDIA Newsroom - Training data scale 

  100. Google DeepMind - Real-time requirements 

  101. NVIDIA - HBM advantages 

  102. NVIDIA Research - Video data scale 

  103. NVIDIA - Storage considerations 

  104. TechCrunch - 2026 AI pragmatism 

  105. AI Frontiers - AGI's Last Bottlenecks 

  106. Futurism - LeCun AGI position 

  107. Substack - Will AGI Emerge from Large Language Models? 

  108. NVIDIA - World model representations 

  109. Forrester - World model generalization 

  110. arXiv - Large language models for AGI survey 

  111. TechCrunch - Amodei AGI prediction 

  112. Vontobel - 2026 Large Language Models Outlook 
