Three billion dollars in pre-launch valuation for a startup that has not released a single product.1 Yann LeCun's AMI Labs represents the largest bet yet on a thesis that has divided AI researchers for years: large language models will never achieve general intelligence, and the path forward runs through world models instead.
TL;DR
The world models paradigm exploded into mainstream AI development in late 2025 and early 2026. Yann LeCun left Meta after 12 years to launch AMI Labs, raising €500M at a €3B valuation to build AI systems that understand physics rather than just predicting text.2 Google DeepMind released Genie 3, the first real-time interactive world model capable of generating persistent 3D environments at 24 fps.3 Fei-Fei Li's World Labs launched Marble, making world model generation commercially available with pricing from free to $95/month.4 NVIDIA's Cosmos platform has seen 2 million downloads as robotics and autonomous vehicle developers embrace synthetic physics-aware training data.5 For organizations building AI infrastructure, world models signal a computational shift from text processing toward video generation, physics simulation, and embodied reasoning.
The LLM Ceiling
Large language models achieved remarkable capabilities through scale. GPT-4, Claude, and Gemini demonstrate sophisticated reasoning, code generation, and multi-step problem solving.6 Yet a fundamental limitation persists: these models learn statistical patterns from text, not an understanding of physical reality.7
Research published in 2024 proved mathematically that LLMs cannot learn all computable functions and will therefore inevitably hallucinate when used as general problem solvers.8 The root cause lies in how LLMs operate: predicting which tokens follow previous tokens based on patterns learned from training data, without any grounding in physical reality.9
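The mechanism is easy to see in miniature. The toy sketch below (illustrative only, not any production architecture) builds a bigram "model" from a few words of text and samples continuations; nothing in it prevents physically impossible output:

```python
import random

# Toy next-token predictor: co-occurrence counts from text, nothing more.
corpus = "the ball falls down the ball falls fast the ball rises".split()
counts = {}
for prev, nxt in zip(corpus, corpus[1:]):
    counts.setdefault(prev, []).append(nxt)

random.seed(0)
token = "ball"
for _ in range(4):
    if token not in counts:               # dead end: no observed continuation
        break
    token = random.choice(counts[token])  # sample a statistically likely word
    print(token, end=" ")
# "rises" is as available as "falls": the model tracks word patterns,
# not gravity. Scale changes the fluency, not the grounding.
```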
The Hallucination Problem
LLMs generate plausible-sounding text that may describe physically impossible scenarios, historically inaccurate events, or logically inconsistent reasoning.10 Unlike humans, who learn about gravity through embodied experience, LLMs only learn that the word "gravity" tends to appear near certain other words.11
| Limitation | Cause | Consequence |
|---|---|---|
| Factual hallucination | No verified knowledge base12 | Confident fabrication of facts |
| Physical reasoning failure | No embodied experience13 | Describes impossible physics |
| Causal confusion | Pattern matching, not understanding14 | Correlation treated as causation |
| Temporal incoherence | Sequential token prediction15 | Events in impossible order |
Yann LeCun has argued publicly for years that scaling LLMs will not produce general intelligence.16 "LLMs are too limiting," LeCun stated in his NVIDIA GTC presentation. "Scaling them up will not allow us to reach AGI."17
The alternative he proposes: world models that learn representations of physical reality, enabling prediction, planning, and reasoning about cause and effect.18
Yann LeCun's AMI Labs
LeCun departed Meta in December 2025 after 12 years, five as founding director of Facebook AI Research (FAIR) and seven as chief AI scientist.19 His new venture, Advanced Machine Intelligence (AMI) Labs, represents the most ambitious attempt yet to commercialize world model research.20
Funding and Structure
AMI Labs entered funding discussions seeking €500 million at a €3 billion valuation before launching any product.21 The target would represent one of the largest pre-launch raises in AI history, reflecting investor confidence in LeCun's vision and track record.22
| Role | Person | Background |
|---|---|---|
| Executive Chairman | Yann LeCun | Turing Award winner, Meta FAIR founder23 |
| CEO | Alex LeBrun | Former CEO of Nabla (medical AI)24 |
The company plans to establish headquarters in Paris by January 2026.25 While Meta will not invest directly in AMI Labs, the two companies plan to forge a partnership that allows LeCun to maintain his research connections.26
Technical Vision
AMI Labs aims to create AI systems that understand physics, maintain persistent memory, and plan complex actions rather than simply predicting text sequences.27 LeCun describes a world model as "your mental model of how the world behaves."28
"You can imagine a sequence of actions you might take, and your world model will allow you to predict what the effect of the sequence of actions will be on the world," LeCun explained.29
The approach differs fundamentally from LLMs. Where GPT-style models predict the next word, world models predict the next state of a physical environment given actions taken within it.30 This enables the following capabilities, sketched in code after the list:
- Planning: Simulating outcomes before taking action
- Reasoning about physics: Understanding that objects have mass, momentum, and spatial relationships
- Cause-effect understanding: Learning that actions produce predictable consequences
- Persistent memory: Maintaining consistent world state across time
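A minimal sketch of that distinction, with all names and types illustrative rather than drawn from any specific system:

```python
from typing import Protocol, Sequence

class NextTokenModel(Protocol):
    def predict(self, tokens: Sequence[int]) -> int:
        """Return a likely next token; actions play no role."""

class WorldModel(Protocol):
    def predict(self, state: dict, action: str) -> dict:
        """Return the predicted next world state after `action`."""

def imagine(model: WorldModel, state: dict, actions: Sequence[str]) -> dict:
    """Roll a candidate action sequence through the model to preview
    its consequences before acting: planning as simulation."""
    for action in actions:
        state = model.predict(state, action)
    return state
```

The `imagine` helper is LeCun's "sequence of actions" description rendered as a loop: simulate first, act second.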
I-JEPA Foundation
AMI Labs builds on LeCun's I-JEPA (Image Joint Embedding Predictive Architecture) research at Meta.31 I-JEPA learns by predicting representations of image regions from other regions, developing abstract understanding of visual scenes without needing explicit labels.32
The approach parallels how humans develop intuitive physics through observation. A child watching objects fall develops an internal model of gravity without anyone explaining Newton's laws.33 I-JEPA and successor architectures aim to replicate this learning process in artificial systems.34
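A compressed sketch of the joint-embedding idea follows. The encoders are stand-ins (single linear layers rather than vision transformers) and all sizes are illustrative, so this shows the shape of the training signal, not Meta's implementation:

```python
import torch
import torch.nn as nn

D = 64                                # embedding dim (illustrative)
context_enc = nn.Linear(D, D)         # stands in for a ViT context encoder
target_enc = nn.Linear(D, D)          # momentum/EMA copy of the encoder
predictor = nn.Sequential(nn.Linear(D, D), nn.GELU(), nn.Linear(D, D))
target_enc.load_state_dict(context_enc.state_dict())
for p in target_enc.parameters():
    p.requires_grad_(False)           # targets are produced without gradients

patches = torch.randn(16, D)          # 16 patch embeddings from one image
visible, hidden = patches[:12], patches[12:]

ctx = context_enc(visible).mean(0, keepdim=True)  # summarize visible context
pred = predictor(ctx).expand_as(hidden)           # predict hidden patch reprs
with torch.no_grad():
    tgt = target_enc(hidden)                      # representation targets
loss = nn.functional.smooth_l1_loss(pred, tgt)    # loss lives in latent space,
loss.backward()                                   # never in pixel space

# After each step the target encoder is updated as an EMA of the context
# encoder, so targets stay stable while the encoder learns abstractions.
```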
DeepMind's Genie 3
Google DeepMind released Genie 3 in August 2025, representing the first real-time interactive general-purpose world model.35 Unlike previous systems that generated static environments or required significant processing time, Genie 3 produces navigable 3D worlds at 24 frames per second.36
Technical Capabilities
Genie 3 generates dynamic environments from text prompts, maintaining visual consistency for several minutes of real-time interaction.37 The system does not rely on hard-coded physics engines; instead, the model teaches itself how the world works through training.38
| Capability | Specification |
|---|---|
| Frame rate | 24 fps real-time39 |
| Resolution | 720p40 |
| Consistency duration | Several minutes41 |
| Memory horizon | Up to 1 minute lookback42 |
| Physics | Self-learned, not hard-coded43 |
"Genie 3 is the first real-time interactive general-purpose world model," stated Shlomi Fruchter, research director at DeepMind. "It goes beyond narrow world models that existed before. It's not specific to any particular environment."44
Auto-Regressive Architecture
The model generates one frame at a time, looking back at previously generated content to determine what happens next.45 Achieving real-time performance requires computing this auto-regressive process multiple times per second while maintaining consistency with potentially minute-old visual memory.46
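In outline, the generation loop looks like the sketch below, where `next_frame` is a stand-in for the model's conditional sampler and the deque enforces the roughly one-minute memory horizon:

```python
from collections import deque
import time

FPS = 24
LOOKBACK_S = 60                          # ~ the reported memory horizon
memory = deque(maxlen=FPS * LOOKBACK_S)  # sliding window of generated frames

def next_frame(history, action):
    """Stand-in for the model: sample one frame conditioned on history."""
    return {"index": len(history), "action": action}

budget_s = 1.0 / FPS                     # ~41.7 ms per frame for real time
for action in ("forward", "forward", "turn_left"):   # user inputs
    t0 = time.perf_counter()
    frame = next_frame(list(memory), action)  # condition on minute-old context
    memory.append(frame)                      # this frame becomes future context
    assert time.perf_counter() - t0 < budget_s, "missed the real-time budget"
```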
Physical consistency emerges from training rather than explicit programming.47 Genie 3 environments maintain stable physics because the model learned physical regularities from training data, not because researchers manually encoded gravity or collision detection.48
AGI Implications
DeepMind positions Genie 3 as a stepping stone toward artificial general intelligence.49 The lab expects world model technology to play a critical role as AI agents interact more with physical environments.50
"Genie 3 marks a major leap toward Artificial General Intelligence by enabling AI agents to 'experience,' interact with, and learn from richly simulated worlds without manual content creation," according to DeepMind's announcement.51
Current Limitations
Genie 3 remains in limited research preview rather than public release.52 Known constraints include:
- Limited action space for agent interactions
- Consistency breakdown after several minutes
- Incomplete real-world geographic accuracy
- Challenges modeling complex multi-agent interactions
DeepMind continues expanding testing access to selected academics and creators.53
Fei-Fei Li's World Labs and Marble
World Labs, founded by AI pioneer Fei-Fei Li, launched Marble in November 2025 as the first commercially available world model product.54 The startup emerged from stealth with $230 million in funding just over a year before the Marble launch.55
Product Architecture
Marble generates persistent, downloadable 3D environments from text prompts, photos, videos, 3D layouts, or panoramic images.56 Unlike competitors that generate worlds on-the-fly during exploration, Marble produces discrete environments that users can edit and export.57
| Input Type | Output |
|---|---|
| Text prompt | 3D environment |
| Photo | 3D environment |
| Video | 3D environment |
| 3D layout | AI-enhanced 3D environment |
| Panorama | 3D environment |
The platform offers AI-native editing tools and a hybrid 3D editor enabling spatial structure blocking before AI fills visual details.58 Files export in formats compatible with industry-standard tools like Unreal Engine and Unity.59
Pricing Model
World Labs adopted a freemium structure targeting creative professionals:60
| Tier | Price | Generations | Features |
|---|---|---|---|
| Free | $0 | 4/month | Basic generation |
| Standard | $20/month | 12/month | Standard features |
| Pro | $35/month | 25/month | Commercial rights |
| Max | $95/month | 75/month | Premium features |
Target Applications
Initial use cases focus on gaming, visual effects for film, and virtual reality.61 Marble supports Vision Pro and Quest 3 VR headsets, with every generated world viewable in VR.62
Fei-Fei Li positions Marble as "the first step toward creating a truly spatially intelligent world model."63 Beyond creative applications, the technology enables robotics training through simulated environments that would be expensive or dangerous to create in physical reality.64
NVIDIA Cosmos: Industrial-Scale World Models
NVIDIA launched Cosmos at CES 2025 as a platform for physical AI development, specifically targeting autonomous vehicles and robotics.65 By January 2026, Cosmos world foundation models had been downloaded over 2 million times.66
Platform Architecture
Cosmos comprises generative world foundation models, advanced tokenizers, guardrails, and an accelerated video processing pipeline.67 The models predict and generate physics-aware videos of future environment states, enabling synthetic training data generation at massive scale.68
| Model Tier | Optimization | Use Case |
|---|---|---|
| Nano | Real-time, edge deployment69 | On-device inference |
| Super | High performance baseline70 | General development |
| Ultra | Maximum quality and fidelity71 | Custom model distillation |
The models were trained on 9,000 trillion tokens drawn from 20 million hours of real-world data spanning human interactions, environments, industrial settings, robotics, and driving scenarios.72
Industry Adoption
Leading robotics and automotive companies adopted Cosmos for synthetic data generation:73
| Company | Domain |
|---|---|
| 1X | Humanoid robots |
| Agility Robotics | Bipedal robots |
| Figure AI | Humanoid robots |
| Waabi | Autonomous trucking |
| XPENG | Electric vehicles |
| Uber | Autonomous ridesharing |
Cosmos Model Types
Three model types address different physical AI development needs:74
- Cosmos-Predict: Simulates and predicts future world states in video form
- Cosmos-Transfer: Produces high-quality simulations conditioned on spatial control inputs
- Cosmos-Reason: Reasoning model for physical AI development
NVIDIA released the reasoning model as open and fully customizable, enabling developers to generate diverse training data using text, image, and video prompts.75
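In practice the developer workflow amounts to: describe a scenario, sample a physics-aware clip, and feed the result into a downstream training set. The sketch below is hypothetical; `WorldModelClient` and `generate_clip` are placeholder names, not NVIDIA's actual API:

```python
from dataclasses import dataclass

@dataclass
class Clip:
    prompt: str
    frames: list                 # decoded video frames

class WorldModelClient:
    """Placeholder for a Cosmos-style world foundation model endpoint."""
    def generate_clip(self, prompt: str, seconds: int = 5) -> Clip:
        # A real endpoint would condition on text/image/video prompts and
        # return physics-aware video; this stub just records the prompt.
        return Clip(prompt=prompt, frames=[])

def build_training_set(client: WorldModelClient, scenarios: list[str]) -> list[Clip]:
    # Each scenario becomes a synthetic clip a robot policy or AV stack
    # can train against, cheaply and safely.
    return [client.generate_clip(s) for s in scenarios]

clips = build_training_set(WorldModelClient(), [
    "forklift crosses the robot's path in a dim warehouse aisle",
    "pedestrian steps out between parked trucks in heavy rain",
])
```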
Video Generation as World Simulation
The distinction between video generation and world models has blurred as leading video systems incorporate physics understanding. OpenAI describes Sora as teaching "AI to understand and simulate the physical world in motion."76
Sora 2 Progress
OpenAI released Sora 2 as a significant advancement in physical understanding.77 Where previous video models "morphed objects and deformed reality" to execute prompts, Sora 2 demonstrates physics compliance. A missed basketball shot rebounds off the backboard rather than teleporting to the hoop.78
"The model's 'mistakes' often appear to be mistakes of the internal agent being modeled," OpenAI noted, indicating the system simulates agents operating within physical constraints rather than generating arbitrary visual sequences.79
Runway's World Models Approach
Runway's Gen-4.5, released in December 2025, claimed the top position on the Video Arena benchmark, outperforming Google's Veo 3 and OpenAI's Sora 2 Pro.80 Runway explicitly frames Gen-4.5 as moving beyond "video generation" toward "world models that understand physics."81
"Objects move with realistic weight, momentum and force. Liquids flow with proper dynamics," Runway stated.82 The company positions Gen-4.5 as a step toward "General World Models" that simulate environments including their physics.83
Competitive Landscape
| Model | Company | Benchmark Position | Physics Focus |
|---|---|---|---|
| Gen-4.5 | Runway | #1 Video Arena84 | Explicit world model framing |
| Veo 3 | Google | #2 Video Arena85 | Video generation with physics |
| Sora 2 Pro | OpenAI | #7 Video Arena86 | World simulation research |
| Genie 3 | DeepMind | N/A (different focus)87 | Real-time interaction |
Applications Beyond Entertainment
World models address critical limitations in training embodied AI systems. Robotics and autonomous vehicles require understanding of physics that cannot be learned from text alone.88
Robotics Training
Physical robots benefit from training in simulated environments before deployment.89 World models generate diverse scenarios that would be impractical or dangerous to create in reality. A warehouse robot can experience millions of package-handling scenarios in simulation, including edge cases that rarely occur in physical warehouses.90
NVIDIA's Cosmos enables developers to "generate diverse data for training robots at scale using text, image and video prompts."91 This synthetic data addresses a fundamental challenge in robotics: unlike language models that can train on internet-scale text, robots have limited physical training data available.92
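One way to exploit this in practice is to randomize the scenario parameters a world model renders, oversampling the rare cases. The sketch below is illustrative, with made-up parameter ranges:

```python
import random

random.seed(7)

def sample_package_scenario() -> dict:
    """Randomized warehouse scenario (all ranges are illustrative)."""
    return {
        "mass_kg": random.lognormvariate(0.5, 0.8),  # heavy tail: rare very heavy boxes
        "box_intact": random.random() > 0.02,        # ~2% damaged packaging
        "floor_friction": random.uniform(0.2, 0.9),  # dry through freshly mopped
        "label_occluded": random.random() < 0.1,     # rare perception edge case
    }

# A world model can render millions of these, including edge cases a
# physical warehouse would take years to produce.
scenarios = [sample_package_scenario() for _ in range(1_000_000)]
damaged = sum(not s["box_intact"] for s in scenarios)
print(f"{damaged:,} damaged-box scenarios generated safely in simulation")
```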
Autonomous Vehicles
Autonomous vehicle development requires exposure to scenarios that occur rarely in real driving but must be handled correctly when encountered.93 World models enable generation of:
- Near-miss collision scenarios
- Unusual weather conditions
- Pedestrian behaviors in edge cases
- Construction zone configurations
- Emergency vehicle interactions
World models serve as "learned simulators" or mental "what if" thought experiments for model-based reinforcement learning.94 By incorporating world models into driving systems, developers enable vehicles to understand human decisions and generalize to real-world situations.95
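The "what if" loop can be made concrete with a random-shooting planner over a learned model. In the sketch below, a toy point mass stands in for a trained world model:

```python
import numpy as np

rng = np.random.default_rng(0)

def learned_dynamics(state, action):
    """Stand-in for a learned world model: next state given state + action.
    Here, a toy point mass with momentum (illustrative only)."""
    pos, vel = state
    vel = 0.9 * vel + 0.1 * action
    return np.array([pos + vel, vel])

def cost(state):
    return abs(state[0] - 1.0)        # objective: position near 1.0

def plan(state, horizon=10, candidates=256):
    """'What if' search: roll candidate action sequences through the
    model and return the first action of the best imagined future."""
    seqs = rng.uniform(-1, 1, size=(candidates, horizon))
    best, best_cost = 0.0, np.inf
    for seq in seqs:
        s, c = state.copy(), 0.0
        for a in seq:
            s = learned_dynamics(s, a)
            c += cost(s)
        if c < best_cost:
            best, best_cost = seq[0], c
    return best

print("chosen action:", plan(np.array([0.0, 0.0])))
```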
Scientific Simulation
World models promise impact beyond robotics and vehicles. Applications include:96
- Molecular structure simulation in chemistry
- Physical law modeling in physics
- Climate system prediction
- Medical procedure training
- Manufacturing process optimization
Organizations deploying AI infrastructure for world model development can consult Introl for GPU deployment strategies across 257 global locations, with capability for deployments of up to 100,000 GPUs.
Infrastructure Requirements
World models demand different computational profiles than large language models. Video generation and physics simulation require substantially more compute per inference than text generation.97
GPU Requirements
World model training involves video data rather than text, dramatically increasing memory and compute requirements.98 A single high-quality video frame contains orders of magnitude more information than a text token. Training on 20 million hours of video, as NVIDIA's Cosmos did, requires infrastructure beyond what most organizations can deploy independently.99
| Workload | Typical GPU Requirement |
|---|---|
| LLM inference | 1-8 GPUs per request |
| World model inference | 8-32 GPUs per request |
| LLM training | Hundreds to thousands |
| World model training | Thousands to tens of thousands |
Memory Bandwidth
Real-time world model inference at 24 fps requires rapid memory access to maintain consistency with previously generated frames.100 High-bandwidth memory (HBM) GPUs like NVIDIA H200 and B200 offer advantages for workloads that must repeatedly access large visual context windows.101
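A back-of-envelope calculation shows why: every generated frame must re-read the model weights plus cached visual context within the frame budget. The sizes below are assumptions for illustration, not measured figures:

```python
# Memory-traffic floor for real-time generation (assumed sizes).
fps = 24
weights_gb = 20.0    # hypothetical model footprint in GPU memory
context_gb = 2.0     # hypothetical cached visual context read per frame
frame_budget_ms = 1000 / fps                   # ~41.7 ms per frame
traffic_gbs = (weights_gb + context_gb) * fps  # every frame touches both
print(f"{frame_budget_ms:.1f} ms/frame -> ~{traffic_gbs:,.0f} GB/s of reads")
# ~528 GB/s: comfortable for HBM (H200 peaks near 4.8 TB/s) but far
# beyond host-memory or PCIe bandwidth, which is why HBM GPUs matter here.
```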
Storage Considerations
Video training data consumes storage at rates far exceeding text corpora. A single hour of high-quality video may exceed 100GB uncompressed.102 Organizations building world model training infrastructure must plan for petabyte-scale storage with high-throughput access patterns.103
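The arithmetic behind those figures is straightforward; the sketch below uses 720p 8-bit RGB as an assumed baseline:

```python
# Uncompressed video storage, back of the envelope (assumed baseline).
width, height, bytes_per_px = 1280, 720, 3         # 720p, 8-bit RGB
fps, seconds_per_hour = 24, 3600
frame_bytes = width * height * bytes_per_px        # ~2.76 MB per frame
hour_bytes = frame_bytes * fps * seconds_per_hour  # ~239 GB per hour
corpus_pb = hour_bytes * 20_000_000 / 1e15         # Cosmos-scale: 20M hours
print(f"{hour_bytes/1e9:.0f} GB/hour -> {corpus_pb:,.0f} PB uncompressed")
# ~4,800 PB before compression: the reason tokenizers and compressed
# latent representations are core parts of world model pipelines.
```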
The AGI Debate
The world models approach represents a philosophical divergence from the scaling hypothesis that drove LLM development.104 Proponents argue that text prediction cannot produce genuine understanding, while critics question whether learned physics simulations will generalize to novel situations.105
The LeCun Position
LeCun argues that LLMs represent a dead end for AGI because they lack grounding in physical reality.106 Text-only training produces systems that can discuss physics without understanding physics, describe spatial relationships without perceiving space, and reason about causation without experiencing cause and effect.107
World models, by contrast, learn representations from sensory data and forecast dynamics like motion, force, and spatial relationships.108 This grounding potentially enables robust generalization that text-trained systems cannot achieve.109
The Scaling Counter-Argument
Some researchers maintain that sufficient scale and architectural improvements can overcome LLM limitations.110 Anthropic CEO Dario Amodei predicted we might have "a country of geniuses in a datacenter" as early as 2026, suggesting LLM-derived systems could achieve human-level capability.111
The debate may prove empirical rather than philosophical. If world model companies produce systems that demonstrate reliable physical reasoning while LLMs continue hallucinating impossible physics, the field's center of gravity may shift permanently.112
Key Takeaways
For infrastructure planners:
- Budget for video-scale compute requirements (8-32x LLM inference)
- Prioritize high-bandwidth memory GPUs (H200, B200) for real-time inference
- Plan petabyte-scale storage for video training data
- Consider NVIDIA Cosmos integration for robotics/AV applications

For operations teams:
- Evaluate world model APIs for synthetic data generation
- Develop expertise in video processing pipelines
- Monitor real-time inference latency requirements
- Prepare infrastructure for multi-modal workloads

For strategic planning:
- Track AMI Labs launch for production-ready world models
- Assess Genie 3 research access opportunities
- Evaluate Marble for creative pipeline integration
- Consider world model capabilities in long-term AI roadmaps

For research teams:
- Experiment with NVIDIA Cosmos for robotics applications
- Monitor DeepMind publications on Genie 3 architecture
- Evaluate I-JEPA approaches for visual understanding
- Compare world model outputs against LLM baselines
References
- TechCrunch - Yann LeCun confirms his new 'world model' startup
- Google DeepMind - Genie 3: A new frontier for world models
- TechCrunch - Fei-Fei Li's World Labs speeds up the world model race with Marble
- NVIDIA Newsroom - Cosmos world foundation models downloaded 2 million times
- arXiv - Hallucination is Inevitable: An Innate Limitation of Large Language Models
- arXiv - A Survey on Hallucination in Large Language Models
- Medium - Understanding LLM Hallucination and Confabulation
- Futurism - Large Language Models Will Never Be Intelligent, Expert Says
- Fortune - Yann LeCun is targeting a $3.5 billion valuation
- LinkedIn News - AI pioneer Yann LeCun launches new startup
- AI Gopubby - Why Yann LeCun Bet $3.5 Billion on World Models Over LLMs
- Meta AI - I-JEPA: The first AI model based on Yann LeCun's vision
- TechCrunch - DeepMind thinks Genie 3 presents stepping stone towards AGI
- Marketing AI Institute - Google DeepMind's Genie 3 Virtual World Breakthrough
- TIME - Inside Fei-Fei Li's Plan to Build AI-Powered Virtual Worlds
- world-model-roadmap.github.io - Simulating the Visual World with AI