CXL 4.0 and the Interconnect Wars: How AI Memory Is Reshaping Data Center Architecture
December 12, 2025
December 2025 Update: The CXL Consortium released CXL 4.0 on November 18, doubling bandwidth to 128 GT/s with PCIe 7.0 and introducing bundled ports for 1.5 TB/s connections. Panmnesia began sampling the industry's first CXL 3.2 fabric switch with port-based routing. Meanwhile, UALink eyes late 2026 deployment and Huawei announced it will open-source UB-Mesh as an alternative.
TL;DR
CXL 4.0 represents the next generation of memory interconnect technology, enabling 100+ terabytes of pooled memory with cache coherency across AI infrastructure. The specification's bundled ports feature allows aggregating multiple physical ports into single logical attachments delivering 1.5 TB/s total bandwidth. Panmnesia's CXL 3.2 fabric switch marks the first hardware implementing port-based routing for multi-rack AI clusters. The broader interconnect landscape fragments further as UALink, Ultra Ethernet, and Huawei's UB-Mesh compete for different niches.
What Happened
The CXL Consortium released the Compute Express Link 4.0 specification on November 18, 2025, at SC25.[1] The specification shifts from PCIe 6.x (64 GT/s) to PCIe 7.0 (128 GT/s), doubling available bandwidth while maintaining the 256-byte FLIT format introduced with CXL 3.x.[2]
"The release of the CXL 4.0 specification sets a new milestone for advancing coherent memory connectivity, doubling the bandwidth over the previous generation with powerful new features," stated Derek Rohde, CXL Consortium President and Principal Engineer at NVIDIA.3
Four days earlier, on November 12, Korean startup Panmnesia announced sample availability of its PCIe 6.0/CXL 3.2 Fabric Switch: the first silicon implementing port-based routing (PBR) for CXL fabrics.[4]
The interconnect landscape continues to fragment. UALink targets late 2026 data center deployment. Huawei announced it will open-source its UB-Mesh protocol, designed to replace PCIe, CXL, NVLink, and TCP/IP with a unified standard.[5]
Why It Matters for Infrastructure
Memory Becomes Composable: CXL 4.0 enables memory pooling at scale. AI inference workloads requiring hundreds of terabytes can now access shared memory pools across racks with cache coherency, not just within a single server.
Bandwidth Matches AI Demand: A CXL 4.0 bundled port with x16 links at 128 GT/s delivers 768 GB/s in each direction (1.536 TB/s total bandwidth between device and CPU).[6] LLM inference serving benefits directly from this capacity.
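As a rough sanity check on those figures, the lane arithmetic is sketched below (raw signaling only; FLIT framing, FEC, and protocol overhead are ignored). The 768 GB/s per direction works out to three x16 links aggregated under one bundled port, an inference from the quoted numbers rather than a statement of the spec's bundling limits.

```python
# Rough lane arithmetic for CXL 4.0 over PCIe 7.0 signaling.
# Simplification: raw bit rate only; FLIT framing, FEC, and CXL protocol
# overhead are ignored, so real throughput is somewhat lower.

GT_PER_LANE = 128  # PCIe 7.0: 128 GT/s per lane, one bit per transfer per lane

def link_gbps(lanes: int = 16) -> float:
    """One-direction raw bandwidth of a single link, in GB/s."""
    return lanes * GT_PER_LANE / 8  # bits -> bytes

per_link = link_gbps()          # 256.0 GB/s per direction for one x16 link
bundle = 3 * per_link           # 768.0 GB/s per direction if three x16 links are bundled
total_tbps = 2 * bundle / 1000  # 1.536 TB/s counting both directions
print(per_link, bundle, total_tbps)
```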
Multi-Rack AI Clusters: The port-based routing in CXL 3.2/4.0 allows fabric switches to interconnect thousands of devices across multiple racks without incurring long network latency. Panmnesia claims "double-digit nanosecond latency" for memory access.[7]
Standards Fragmentation Risk: Four competing interconnect ecosystems (CXL/PCIe, UALink, Ultra Ethernet, NVLink) force infrastructure planners to bet on winners. Equipment purchased today may face interoperability challenges in 2027.
Technical Details
CXL 4.0 Specification
| Feature | CXL 3.x | CXL 4.0 |
|---|---|---|
| Base Protocol | PCIe 6.x | PCIe 7.0 |
| Transfer Speed | 64 GT/s | 128 GT/s |
| FLIT Size | 256B | 256B |
| Retimers Supported | 2 | 4 |
| Link Width Options | Standard | Native x2 added |
| Bundled Ports | No | Yes |
Bundled Ports Architecture
CXL 4.0's bundled ports aggregate multiple physical CXL device ports into a single logical entity (sketched in code after this list):[8]
- A host and a Type 1/2 device can combine multiple physical ports
- System software sees a single device despite multiple physical connections
- Optimized for 256B Flit Mode, eliminating legacy 68B Flit overhead
- Enables 1.5+ TB/s total bandwidth per logical connection
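A minimal sketch of the abstraction, using hypothetical Python types (PhysicalPort and BundledPort are illustrative names, not structures from the specification): system software addresses one logical device, and its capacity is the sum of the member ports it aggregates.

```python
from dataclasses import dataclass, field

# Illustrative model only: these class names and fields are invented for the
# example and do not mirror CXL 4.0 specification structures.

@dataclass(frozen=True)
class PhysicalPort:
    port_id: int
    lanes: int = 16
    gt_per_lane: float = 128.0  # PCIe 7.0 signaling

    @property
    def gbps_per_direction(self) -> float:
        return self.lanes * self.gt_per_lane / 8

@dataclass
class BundledPort:
    """The single logical attachment that system software enumerates."""
    logical_id: int
    members: list[PhysicalPort] = field(default_factory=list)

    @property
    def gbps_per_direction(self) -> float:
        # One device from software's point of view; bandwidth is additive.
        return sum(p.gbps_per_direction for p in self.members)

bundle = BundledPort(logical_id=0, members=[PhysicalPort(i) for i in range(3)])
print(f"logical device 0: {bundle.gbps_per_direction:.0f} GB/s per direction")
```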
Panmnesia CXL 3.2 Fabric Switch
The first CXL 3.2 switch silicon includes:[9]
| Specification | Detail |
|---|---|
| Protocol Support | PCIe Gen 6.0 + CXL 3.2 hybrid |
| Data Rate | 64 GT/s |
| Routing Modes | PBR (port-based) and HBR (hierarchy-based) |
| CXL Subprotocols | CXL.cache, CXL.mem, CXL.io |
| Lane Count | 256-lane high fan-out |
| Latency | Double-digit nanoseconds |
| Backward Compatibility | All previous PCIe/CXL generations |
Target applications include DLRM (Deep Learning Recommendation Models), LLM inference, RAG workloads, and MPI-based HPC simulations.
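To make the PBR-versus-HBR distinction concrete, here is a conceptual sketch (our illustration, not the CXL 3.2 flit format; the field names, PID values, and table layout are assumptions): port-based routing gives every edge device a flat fabric-wide port ID, so each switch hop is a single table lookup rather than a walk down a PCIe-style hierarchy.

```python
from dataclasses import dataclass

# Conceptual sketch of port-based routing (PBR) in a CXL fabric switch.
# Assumptions for illustration only: destination port IDs (PIDs) form a flat,
# fabric-wide ID space and each switch keeps a PID -> egress-port table.

@dataclass(frozen=True)
class FabricMessage:
    dest_pid: int   # flat, fabric-wide destination port ID
    payload: bytes

class PBRSwitch:
    def __init__(self, name: str) -> None:
        self.name = name
        self.routes: dict[int, int] = {}  # dest PID -> local egress port

    def add_route(self, dest_pid: int, egress_port: int) -> None:
        self.routes[dest_pid] = egress_port

    def forward(self, msg: FabricMessage) -> int:
        # One table lookup per hop, regardless of where the target sits in any
        # PCIe-style tree; this flat addressing is what lets a fabric span
        # thousands of devices across racks with low per-hop latency.
        return self.routes[msg.dest_pid]

sw = PBRSwitch("row-switch-0")
sw.add_route(dest_pid=0x012, egress_port=3)  # memory expander in rack A
sw.add_route(dest_pid=0x2F0, egress_port=7)  # memory expander in rack B
print(sw.forward(FabricMessage(dest_pid=0x2F0, payload=b"MemRd 64B")))  # -> 7
```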
Competing Interconnect Standards
| Standard | Owner | Purpose | Bandwidth | Scale | Timeline |
|---|---|---|---|---|---|
| CXL 4.0 | Consortium | Memory coherency | 128 GT/s/lane | Multi-rack | Late 2026-2027 |
| NVLink 5 | NVIDIA | GPU-GPU | 1.8 TB/s | 576 GPUs | Available |
| UALink 1.0 | AMD-led consortium | Accelerator-accelerator | 200 Gb/s/lane | 1,024 devices | Late 2026 |
| Ultra Ethernet | UEC | Scale-out networking | Ethernet-based | 10,000s endpoints | 2026+ |
| UB-Mesh | Huawei | Unified interconnect | 1+ TB/s/device | 1M processors | Open-sourcing announced |
Interconnect Decision Framework
When to use which standard:
| Use Case | Best Fit | Why |
|---|---|---|
| GPU-to-GPU within node | NVLink | Highest bandwidth (1.8 TB/s), lowest latency |
| GPU-to-GPU across nodes | UALink | Open standard alternative to NVLink |
| Memory expansion | CXL | Cache coherency with CPU, memory pooling |
| Scale-out networking | Ultra Ethernet / InfiniBand | Designed for 10,000+ endpoint clusters |
| Unified China ecosystem | UB-Mesh | Avoids Western IP restrictions |
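The same framework expressed as a small lookup helper, in case it is easier to drop into planning scripts (a sketch of the table above; the category strings and enum names are our own shorthand, not standard terminology):

```python
from enum import Enum, auto

# Sketch of the decision table above; names are our own shorthand.

class Interconnect(Enum):
    NVLINK = auto()            # GPU-to-GPU within a node
    UALINK = auto()            # open scale-up alternative across nodes
    CXL = auto()               # CPU-coherent memory expansion and pooling
    SCALE_OUT_FABRIC = auto()  # Ultra Ethernet or InfiniBand
    UB_MESH = auto()           # unified China-market ecosystem

_DECISION_TABLE = {
    "gpu_to_gpu_within_node": Interconnect.NVLINK,
    "gpu_to_gpu_across_nodes": Interconnect.UALINK,
    "memory_expansion": Interconnect.CXL,
    "scale_out_networking": Interconnect.SCALE_OUT_FABRIC,
    "unified_china_ecosystem": Interconnect.UB_MESH,
}

def pick_interconnect(use_case: str) -> Interconnect:
    """Map a coarse use case to the best-fit standard from the table above."""
    return _DECISION_TABLE[use_case]

assert pick_interconnect("memory_expansion") is Interconnect.CXL
```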
UALink vs. CXL Positioning
UALink does not compete directly with CXL. They serve different purposes:[10]
- UALink: GPU-to-GPU scaling for accelerator clusters (scale-up)
- CXL: CPU-memory coherency and memory pooling (memory expansion)
- Ultra Ethernet: Scale-out networking across data centers
"UALink works alongside PCIe and CXL, but only UALink has the effect of unifying the allocated resources. UALink is designed to connect your main GPU units for GPU-to-GPU scaling," explained Michael Posner, VP of Product Management at Synopsys.11
Huawei UB-Mesh
Huawei's alternative approach aims to replace all existing interconnects:[12]
- Targets 1 TB/s+ bandwidth per device
- ~150 ns hop latency, down from the microsecond-class hops of packet-based networks
- Synchronous load/store semantics rather than packet-based messaging (illustrated in the sketch after this list)
- Open-source license announced September 2025
- Scales to 1 million processors in "SuperNode" architecture
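To illustrate the load/store-versus-packet-based bullet above, here is a hedged sketch of the programming-model difference; the functions are invented for the example and are not UB-Mesh or vendor APIs.

```python
import struct

# Illustration of the programming-model gap only; nothing here is a real
# UB-Mesh (or other vendor) interface.

def read_remote_packet_based(nic_send, nic_recv, node: int, addr: int) -> int:
    """Packet-based: build a request, send it, and block on a reply.
    The software round trip through a protocol stack is what pushes
    latency into the microsecond range."""
    request = struct.pack("!BQI", 0x01, addr, 8)  # opcode, address, length
    nic_send(node, request)
    reply: bytes = nic_recv(node)
    return int.from_bytes(reply, "big")

def read_remote_load_store(fabric_window: memoryview, offset: int) -> int:
    """Load/store: remote memory is mapped into the local address space, so a
    read is an ordinary load that the interconnect hardware resolves directly,
    which is how ~150 ns per-hop figures become plausible."""
    return int.from_bytes(fabric_window[offset:offset + 8], "big")
```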
Industry adoption remains uncertain given geopolitical concerns and existing standards momentum.
What's Next
Late 2026: UALink switches reach data centers; CXL 4.0 products begin sampling.
Late 2026-2027: CXL 4.0 multi-rack systems reach production deployment.[13]
Q4 2026: Upscale AI targets UALink switch delivery.[14]
Ongoing: Standards bodies navigate coexistence of CXL, UALink, and Ultra Ethernet. Huawei's UB-Mesh seeks adoption outside Western markets.
The interconnect landscape will remain fragmented through at least 2027. No single standard addresses all use cases: memory pooling (CXL), accelerator scaling (UALink/NVLink), and network fabric (Ultra Ethernet/InfiniBand).
Key Takeaways
For infrastructure planners:
- CXL 4.0 enables 100+ TB memory pools with cache coherency across racks
- Panmnesia is sampling the first CXL 3.2 fabric switch with port-based routing
- Plan for standards coexistence: CXL + UALink + Ultra Ethernet/InfiniBand
- Late 2026-2027 deployment timeline for CXL 4.0 production systems

For operations teams:
- CXL maintains backward compatibility with previous generations
- Port-based routing simplifies multi-rack fabric management
- Double-digit nanosecond latency for memory access across switches
- Monitor Panmnesia, XConn, and other CXL switch vendors for availability

For strategic planning:
- No single interconnect standard will "win" because different layers serve different purposes
- Memory pooling becomes viable for AI inference at scale
- Huawei's UB-Mesh creates a parallel ecosystem primarily for the China market
- Equipment decisions in 2025-2026 will affect interoperability through 2030
For AI infrastructure deployment with advanced interconnect architectures, contact Introl.

References
1. CXL Consortium. "CXL Consortium Releases the Compute Express Link 4.0 Specification." November 18, 2025.
2. VideoCardz. "CXL 4.0 spec moves to PCIe 7.0, doubles bandwidth over CXL 3.0." November 2025.
3. Business Wire. "CXL Consortium Releases the Compute Express Link 4.0 Specification Increasing Speed and Bandwidth." November 18, 2025.
4. Business Wire. "Panmnesia Announces Sample Availability of PCIe 6.0/CXL 3.2 Fabric Switch." November 12, 2025.
5. Tom's Hardware. "Huawei to open-source its UB-Mesh data center-scale interconnect soon." August 2025.
6. Datacenter.news. "CXL 4.0 doubles bandwidth, introduces bundled ports for data centres." November 2025.
7. Panmnesia. "Press Release: PCIe 6.0/CXL 3.2 Fabric Switch." November 2025.
8. Blocks and Files. "CXL 4.0 doubles bandwidth and stretches memory pooling to multi-rack setups." November 24, 2025.
9. TechPowerUp. "Panmnesia Samples Industry's First PCIe 6.0/CXL 3.2 Fabric Switch." November 2025.
10. Semi Engineering. "New Data Center Protocols Tackle AI." 2025.
11. Synopsys. "Ultra Ethernet UaLink AI Networks." 2025.
12. ServeTheHome. "Huawei Presents UB-Mesh Interconnect for Large AI SuperNodes at Hot Chips 2025." August 2025.
13. Blocks and Files. "CXL 4.0 doubles bandwidth." November 2025.
14. HPCwire. "Upscale AI Eyes Late 2026 for Scale-Up UALink Switch." December 2, 2025.
15. EE Times. "CXL Adds Port Bundling to Quench AI Thirst." November 2025.
16. SDxCentral. "Compute Express Link Consortium debuts 4.0 spec to push past bandwidth bottlenecks." November 2025.
17. CXL Consortium. "CXL 4.0 White Paper." November 2025.