
UALink and CXL 4.0: The Open Standards Reshaping GPU Cluster Architecture



The UALink 1.0 specification published in April 2025 enables scaling to 1,024 accelerators across a single fabric, directly challenging Nvidia's proprietary NVLink and NVSwitch ecosystem. Seven months later, the CXL Consortium released CXL 4.0 on November 18, 2025, doubling bandwidth to 128 GT/s and enabling multi-rack memory pooling. Together, these open standards represent the most significant challenge to Nvidia's interconnect dominance since the company introduced NVLink in 2016.

TL;DR

UALink 1.0 delivers 200 GT/s per lane with support for up to 1,024 accelerators, compared to NVLink's 576-GPU maximum. CXL 4.0 doubles memory bandwidth to 128 GT/s and introduces bundled ports for AI workloads requiring terabyte-scale shared memory. Hardware supporting UALink is expected from AMD, Intel, and Astera Labs in late 2026, while CXL 4.0 multi-rack deployments target 2027. For infrastructure teams planning next-generation GPU clusters, these specifications signal a shift toward vendor-neutral architectures that reduce lock-in while enabling unprecedented scale.


The Interconnect Landscape in 2025

GPU interconnects determine how effectively AI clusters scale. The faster accelerators can exchange data, the larger the models they can train and the more efficiently they can serve inference requests.

Current Interconnect Technologies

| Technology | Owner | Bandwidth | Max Scale | Status |
|---|---|---|---|---|
| NVLink 5.0 | Nvidia | 1.8 TB/s per GPU | 576 GPUs | Production (Blackwell) |
| NVLink 4.0 | Nvidia | 900 GB/s per GPU | 256 GPUs | Production (Hopper) |
| Infinity Fabric | AMD | ~1.075 TB/s per card | 8 GPUs (direct mesh) | Production (MI300X) |
| UALink 1.0 | Consortium | 800 GB/s (4 lanes) | 1,024 accelerators | Spec published April 2025 |
| CXL 4.0 | Consortium | 128 GT/s | Multi-rack | Spec published Nov 2025 |

Nvidia's NVLink dominates production deployments, but the GB200 NVL72 system exemplifies both its power and its constraints: 72 Blackwell GPUs interconnected with 130 TB/s of aggregate bandwidth, but exclusively within Nvidia's proprietary ecosystem.
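
The aggregate number follows directly from the per-GPU figure; the quick calculation below is a back-of-the-envelope sketch using only the 1.8 TB/s per-GPU value already cited, not additional vendor data.

```python
# Back-of-the-envelope check: per-GPU NVLink 5.0 bandwidth times the GPU
# count in one NVL72 rack, using only the 1.8 TB/s figure cited above.
GPUS_PER_RACK = 72
NVLINK5_PER_GPU_TBPS = 1.8  # bidirectional, per GPU

aggregate_tbps = GPUS_PER_RACK * NVLINK5_PER_GPU_TBPS
print(f"Aggregate NVLink bandwidth: {aggregate_tbps:.1f} TB/s")  # 129.6, quoted as ~130 TB/s
```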


Consortium Formation

The Ultra Accelerator Link Consortium incorporated in October 2024 with founding members AMD, Astera Labs, AWS, Cisco, Google, HPE, Intel, Meta, and Microsoft. The effort builds on work AMD and Broadcom announced in December 2023.

By January 2025, Alibaba Cloud, Apple, and Synopsys had joined at board level, bringing total membership to 75 organizations.

Technical Specifications

The UALink 200G 1.0 Specification defines a low-latency, high-bandwidth interconnect for communication between accelerators and switches in AI computing pods.

| Specification | UALink 1.0 |
|---|---|
| Per-Lane Data Rate | 200 GT/s bidirectional |
| Signaling Rate | 212.5 GT/s (with FEC overhead) |
| Link Widths | x1, x2, x4 |
| Maximum Bandwidth | 800 GB/s (x4 configuration) |
| Maximum Scale | 1,024 accelerators |
| Cable Length | <4 meters (optimized) |
| Latency Target | <1 µs round trip (64B/640B payloads) |

UALink switches assign one port per accelerator and use 10-bit unique identifiers for precise routing across the fabric.
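
A minimal sketch of what that addressing model implies: a 10-bit identifier space covers exactly 1,024 endpoints, with the switch keeping a simple ID-to-port map. The class and method names below are illustrative, not taken from the UALink specification.

```python
# Illustrative model of UALink-style addressing: one switch port per
# accelerator and a 10-bit endpoint ID (2**10 = 1,024 addressable
# accelerators). Names and structure are hypothetical, not from the spec.
ID_BITS = 10
MAX_ACCELERATORS = 2 ** ID_BITS  # 1,024

class FabricDirectory:
    def __init__(self) -> None:
        self.port_by_id: dict[int, int] = {}  # accelerator ID -> switch port

    def attach(self, accel_id: int, port: int) -> None:
        if not 0 <= accel_id < MAX_ACCELERATORS:
            raise ValueError(f"accelerator ID must fit in {ID_BITS} bits")
        self.port_by_id[accel_id] = port

    def route(self, accel_id: int) -> int:
        return self.port_by_id[accel_id]

fabric = FabricDirectory()
fabric.attach(accel_id=1023, port=511)
print(MAX_ACCELERATORS, fabric.route(1023))  # 1024 511
```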

UALink vs. NVLink

| Metric | UALink 1.0 | NVLink 4.0 (Hopper) | NVLink 5.0 (Blackwell) |
|---|---|---|---|
| Per-GPU Bandwidth | 800 GB/s | 900 GB/s | 1.8 TB/s |
| Links per GPU | 4 | 18 | 18 |
| Maximum GPUs | 1,024 | 256 | 576 |
| Vendor Lock-in | Open standard | Nvidia only | Nvidia only |
| Hardware Availability | Late 2026/2027 | Production | Production |

NVLink 5.0 delivers more than twice the per-GPU bandwidth of UALink 1.0 (1.8 TB/s versus 800 GB/s). UALink, however, supports nearly twice the maximum cluster size (1,024 versus 576 GPUs) and works across accelerator vendors.

Design Philosophy Differences

NVLink optimizes for dense, homogeneous GPU clusters where maximum bandwidth between closely-packed accelerators matters most. The technology excels in DGX systems and NVL72 racks where all components come from Nvidia.

UALink targets modular rack-scale architectures where organizations mix accelerators from different vendors or require larger logical clusters. The open standard enables AMD MI-series, Intel Gaudi, and future accelerators to communicate through a common fabric.

AMD's Current Position

AMD's Infinity Fabric connects up to eight MI300X or MI355X GPUs in a fully connected mesh. Each MI300X carries seven Infinity Fabric links with 16 lanes per link, delivering approximately 1.075 TB/s of peer-to-peer bandwidth.

The limitation: scaling beyond 8 GPUs requires Ethernet networking. AMD's roadmap includes AFL (Accelerated Fabric Link) working over PCIe Gen7 links, plus UALink adoption for multi-vendor interoperability.
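
The scaling limit is a matter of combinatorics: a fully connected mesh needs a link for every GPU pair, so link count grows quadratically, while a switched fabric such as UALink needs only one switch port per accelerator. The sketch below is a generic topology calculation, not AMD-specific data.

```python
from math import comb

# Links required for a fully connected GPU mesh vs. ports per GPU.
# A switched fabric needs only one switch port per accelerator instead.
for n in (8, 64, 1024):
    print(f"{n:>5} GPUs: {n - 1:>5} links per GPU, {comb(n, 2):>8,} total links")
# 8 GPUs -> 7 links per GPU (the MI300X case); 1,024 GPUs -> 523,776 links,
# which is why larger clusters move to switch-based fabrics.
```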


CXL 4.0: Memory Without Boundaries

The Memory Wall Problem

AI workloads increasingly hit memory bottlenecks before compute limits. Large language models require terabytes of memory for KV caches during inference, while training runs demand even more for activations and optimizer states.

Traditional server architectures attach memory directly to CPUs, creating stranded capacity when workloads vary. CXL decouples memory from compute, enabling dynamic allocation across nodes.

CXL 4.0 Specifications

The CXL Consortium released CXL 4.0 at Supercomputing 2025 on November 18, 2025.

| Specification | CXL 3.0/3.1 | CXL 4.0 |
|---|---|---|
| Signaling Rate | 64 GT/s | 128 GT/s |
| PCIe Generation | PCIe 6.0 | PCIe 7.0 |
| Bandwidth | 256 GB/s (x16) | 512 GB/s (x16) |
| Retimers | 2 | 4 |
| Link Widths | x16, x8, x4, x1 | x16, x8, x4, x2, x1 |
| Topology | Single-rack | Multi-rack |
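
The bandwidth rows follow from the per-lane rate; the sketch below reproduces them, treating the table's values as bidirectional totals and ignoring FLIT and FEC overheads.

```python
# Reproduce the x16 bandwidth rows from the per-lane signaling rate.
# Simplification: ignores FLIT/FEC overhead; table values are treated as
# bidirectional totals.
def x16_bandwidth_gbs(gt_per_s: float, lanes: int = 16) -> tuple[float, float]:
    per_direction = gt_per_s * lanes / 8  # GT/s -> GB/s, one bit per transfer
    return per_direction, per_direction * 2

print(x16_bandwidth_gbs(64))   # CXL 3.x: (128.0, 256.0) GB/s
print(x16_bandwidth_gbs(128))  # CXL 4.0: (256.0, 512.0) GB/s
```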

Key CXL 4.0 Features

Bundled Ports: CXL 4.0 introduces port aggregation allowing hosts and devices to combine multiple physical ports into a single logical connection. This delivers higher bandwidth while maintaining a simple software model where the system sees one device.

Extended Reach: Four retimers enable multi-rack configurations without sacrificing signal quality. CXL 3.x limited deployments to single-rack topologies; CXL 4.0 extends memory pooling across data center aisles.

Memory Capacity: CXL memory pooling enables 100+ terabytes of memory attached to a single CPU, valuable for organizations mining large datasets or running memory-intensive AI workloads.

Native x2 Links: The new x2 link width option reduces cost for applications requiring moderate bandwidth, improving CXL economics for edge deployments.
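
To make the bundled-ports idea concrete, the sketch below models several physical ports presented to software as one logical device whose bandwidth is the sum of its members. The port widths and class names are assumptions for illustration, not drawn from the CXL 4.0 specification.

```python
# Illustrative model of CXL 4.0 port bundling: multiple physical ports
# presented to software as one logical device whose bandwidth is the sum
# of its members. Class names and port widths are assumptions.
from dataclasses import dataclass

@dataclass
class PhysicalPort:
    lanes: int
    gt_per_s: float

    @property
    def gbs_per_direction(self) -> float:
        return self.lanes * self.gt_per_s / 8

@dataclass
class BundledPort:
    members: list[PhysicalPort]

    @property
    def gbs_per_direction(self) -> float:
        return sum(p.gbs_per_direction for p in self.members)

# Two x8 ports at 128 GT/s bundled into a single logical connection.
bundle = BundledPort([PhysicalPort(lanes=8, gt_per_s=128), PhysicalPort(lanes=8, gt_per_s=128)])
print(bundle.gbs_per_direction)  # 256.0 GB/s per direction, seen as one device
```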

CXL Memory Pooling Performance

Demonstrations at CXL DevCon 2025 showed two servers with NVIDIA H100 GPUs running the OPT-6.7B model:

| Configuration | Performance |
|---|---|
| CXL Memory Pool | Baseline |
| 200G RDMA | 3.8x slower |
| 100G RDMA | 6.5x slower |

CXL provides memory-semantic access with latency in the 200-500 ns range, compared to ~100 µs for NVMe and >10 ms for storage-based memory sharing.
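
To see why that latency gap matters, the sketch below estimates average access time when a fraction of memory traffic spills out of local DRAM; the 10 percent spill ratio is an arbitrary illustration, and the tier latencies are the figures quoted above.

```python
# Effective access latency when part of the working set spills out of
# local DRAM. Latencies are the figures cited above; the 10% spill
# fraction is an arbitrary illustration, not a measured workload.
LOCAL_DRAM_NS = 100
REMOTE_TIERS_NS = {"CXL pool (~350 ns)": 350, "NVMe (~100 us)": 100_000}

def effective_latency_ns(spill: float, remote_ns: int) -> float:
    return (1 - spill) * LOCAL_DRAM_NS + spill * remote_ns

for tier, ns in REMOTE_TIERS_NS.items():
    print(f"{tier}: {effective_latency_ns(0.10, ns):,.0f} ns average access")
# CXL pool: ~125 ns average; NVMe: ~10,090 ns average for the same spill.
```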

Power and Efficiency Gains

Research shows CXL can cut memory power consumption by 20-30%. Instead of provisioning every node for worst-case memory scenarios, CXL allows dynamic sharing and powers memory only when in use.
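
A toy model of where those savings come from: provisioning every node for its worst case versus sizing a shared pool to aggregate demand plus headroom. The node counts, capacities, and watts-per-GB density are illustrative assumptions, not measurements.

```python
# Toy comparison: worst-case per-node DRAM provisioning vs. a shared pool
# sized to aggregate typical demand plus headroom. All figures below are
# illustrative assumptions, not measurements.
NODES = 16
PEAK_PER_NODE_GB = 2048      # capacity any single node might need at peak
TYPICAL_PER_NODE_GB = 1152   # what nodes actually use most of the time
HEADROOM = 1.3               # pool sized 30% above aggregate typical use
WATTS_PER_GB = 0.4           # rough DRAM power density assumption

per_node = NODES * PEAK_PER_NODE_GB
pooled = NODES * TYPICAL_PER_NODE_GB * HEADROOM

print(f"Per-node provisioning: {per_node:,} GB (~{per_node * WATTS_PER_GB:,.0f} W)")
print(f"Pooled provisioning:   {pooled:,.0f} GB (~{pooled * WATTS_PER_GB:,.0f} W)")
print(f"Memory power reduction: {1 - pooled / per_node:.0%}")  # ~27%, in line with 20-30%
```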

Additional benchmarks indicate CXL architecture can increase memory bandwidth by 39% and improve AI training performance by 24%.


How UALink and CXL Fit Together

UALink and CXL address different layers of the interconnect stack and complement rather than compete with each other.

Protocol Comparison

| Aspect | UALink | CXL |
|---|---|---|
| Primary Function | GPU-to-GPU communication | CPU-to-memory/device |
| Coherency Model | Load-store semantics | Cache-coherent |
| Target Workload | AI accelerator scaling | Memory expansion/pooling |
| Typical Topology | Switch fabric | Point-to-point or switched |

Unified Architecture Vision

Panmnesia's architecture demonstrates how CXL and UALink/NVLink can work together in AI superclusters. The design combines GPU node memory sharing (via CXL) with fast inter-GPU networking (via UALink or NVLink).

Emerging proposals like Huawei's UB-Mesh (Hot Chips 2025) aim to unify all interconnects into one massive mesh fabric supporting up to 10 Tbps per chip, though these remain nascent.


Vendor Ecosystem and Hardware Timeline

| Vendor | Component | Expected Availability |
|---|---|---|
| AMD | MI-series with UALink | 2026/2027 |
| Intel | Gaudi accelerators | 2026/2027 |
| Astera Labs | UALink switches | 2026/2027 |
| Broadcom | UALink switches | 2026/2027 |

The consortium published the final 1.0 specification in April 2025, enabling chip tape-outs. Silicon validation cycles and system integration mean production hardware arrives 12-18 months later.

CXL 4.0 Hardware Roadmap

| Milestone | Timeline |
|---|---|
| Spec publication | November 2025 |
| PCIe 7.0 silicon | 2026 |
| CXL 4.0 controllers | Late 2026 |
| Multi-rack deployments | 2027 |

CXL Adoption Today

CXL 3.x systems already ship from multiple vendors:

| Vendor | Product | CXL Capability |
|---|---|---|
| GIGABYTE | R284-S91, R283-Z98, R263-Z39 | Terabyte-scale memory expansion |
| XConn Technologies | CXL switches | Dynamic memory allocation |
| Compal Electronics | Data center platforms | AI-optimized CXL |

Infrastructure Planning Implications

When to Evaluate Open Interconnects

Organizations should consider UALink and CXL when:

  1. Multi-vendor strategy: Deploying AMD MI-series alongside Intel Gaudi or future accelerators
  2. Scale requirements: Clusters exceeding NVLink's 576-GPU limit
  3. Memory-bound workloads: LLM inference with large KV caches, in-memory databases
  4. Cost optimization: Reducing stranded memory through pooling

NVLink continues to dominate for:

  1. Blackwell deployments: GB200 NVL72 and DGX systems require NVLink
  2. Maximum per-GPU bandwidth: 1.8 TB/s exceeds UALink 1.0's 800 GB/s
  3. Production today: UALink hardware arrives 12+ months from now

Deployment Considerations for Infrastructure Teams

Introl's network of 550 field engineers deploys GPU clusters across 257 global locations. When planning for open interconnect adoption, infrastructure teams should assess:

| Factor | Consideration |
|---|---|
| Rack Design | UALink requires switch infrastructure; plan for additional rack units |
| Cabling | <4 meter cable lengths for UALink; multi-rack for CXL 4.0 |
| Power | CXL memory pooling reduces per-node power; plan for aggregate savings |
| Cooling | Switch infrastructure adds thermal load |
| Timeline | Align refresh cycles with 2026/2027 hardware availability |
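
A hypothetical pre-deployment checklist that encodes a few of these constraints (UALink's sub-4-meter optimized reach, spare rack units for switches, cooling headroom); the field names and thresholds are invented for illustration, not taken from any vendor tool.

```python
# Hypothetical pre-deployment checks mirroring the table above.
# Field names and thresholds are illustrative, not from any vendor tool.
from dataclasses import dataclass

@dataclass
class RackPlan:
    longest_ualink_cable_m: float   # longest accelerator-to-switch run
    spare_rack_units: int           # RUs left for UALink switches
    ualink_switch_rus: int          # RUs the switch design calls for
    cooling_headroom_kw: float      # thermal budget remaining in the rack
    switch_heat_kw: float           # added load from switch infrastructure

def check_plan(plan: RackPlan) -> list[str]:
    issues = []
    if plan.longest_ualink_cable_m >= 4.0:
        issues.append("UALink cable run exceeds the <4 m optimized reach")
    if plan.ualink_switch_rus > plan.spare_rack_units:
        issues.append("not enough spare rack units for UALink switches")
    if plan.switch_heat_kw > plan.cooling_headroom_kw:
        issues.append("switch thermal load exceeds cooling headroom")
    return issues or ["no blocking issues found"]

print(check_plan(RackPlan(3.5, 4, 2, 6.0, 3.5)))
```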

Key Takeaways

For Infrastructure Planners

  • UALink 1.0 hardware arrives late 2026, enabling 1,024-accelerator clusters across AMD, Intel, and other vendors
  • CXL 4.0 multi-rack deployments target 2027, doubling bandwidth to 128 GT/s
  • Plan rack layouts now to accommodate UALink switches and CXL memory pools

For Operations Teams

  • Current CXL 3.x deployments provide memory pooling benefits today
  • Monitor AMD and Intel accelerator roadmaps for UALink-compatible hardware
  • Evaluate CXL for memory-intensive inference workloads with large KV caches

For Strategic Decision-Makers

  • Open interconnects reduce vendor lock-in but trail NVLink in per-connection bandwidth
  • Hybrid architectures combining NVLink (Nvidia) and UALink (multi-vendor) may emerge
  • The 1,024-GPU scale ceiling positions UALink for next-generation training clusters

References

  1. UALink Consortium. "UALink 200G 1.0 Specification White Paper." April 2025. https://ualinkconsortium.org/wp-content/uploads/2025/04/UALink-1.0-White_Paper_FINAL.pdf

  2. CXL Consortium. "CXL Consortium Releases the Compute Express Link 4.0 Specification." November 18, 2025. https://www.businesswire.com/news/home/20251118275848/en/CXL-Consortium-Releases-the-Compute-Express-Link-4.0-Specification-Increasing-Speed-and-Bandwidth

  3. Network World. "UALink releases inaugural GPU interconnect specification." April 2025. https://www.networkworld.com/article/3957541/ualink-releases-inaugural-gpu-interconnect-specification.html

  4. Next Platform. "UALink Fires First GPU Interconnect Salvo At Nvidia NVSwitch." April 2025. https://www.nextplatform.com/2025/04/08/ualink-fires-first-gpu-interconnect-salvo-at-nvidia-nvswitch/

  5. Blocks and Files. "CXL 4.0 doubles bandwidth and stretches memory pooling to multi-rack setups." November 2025. https://blocksandfiles.com/2025/11/24/cxl-4/

  6. Tom's Hardware. "UALink has Nvidia's NVLink in the crosshairs." April 2025. https://www.tomshardware.com/tech-industry/ualink-has-nvidias-nvlink-in-the-crosshairs-final-specs-support-up-to-1-024-gpus-with-200-gt-s-bandwidth

  7. NVIDIA. "NVLink & NVSwitch: Fastest HPC Data Center Platform." 2025. https://www.nvidia.com/en-us/data-center/nvlink/

  8. NVIDIA. "GB200 NVL72." 2025. https://www.nvidia.com/en-us/data-center/gb200-nvl72/

  9. NVIDIA Developer Blog. "Inside NVIDIA Blackwell Ultra: The Chip Powering the AI Factory Era." 2025. https://developer.nvidia.com/blog/inside-nvidia-blackwell-ultra-the-chip-powering-the-ai-factory-era/

  10. Blocks and Files. "The Ultra Accelerator Link Consortium has released its first spec." April 2025. https://blocksandfiles.com/2025/04/09/the-ultra-accelerator-link-consortium-has-released-its-first-spec/

  11. Data Center Dynamics. "UALink Consortium releases 200G 1.0 specification." April 2025. https://www.datacenterdynamics.com/en/news/ualink-consortium-releases-200g-10-specification-for-ai-accelerator-interconnects/

  12. Storage Review. "UALink Consortium Finalizes 1.0 Specification." April 2025. https://www.storagereview.com/news/ualink-consortium-finalizes-1-0-specification-for-ai-accelerator-interconnects

  13. Tom's Hardware. "Ultra Accelerator Link is an open-standard interconnect." 2024. https://www.tomshardware.com/tech-industry/artificial-intelligence/amd-broadcom-intel-google-microsoft-and-others-team-up-for-ultra-accelerator-link-an-open-standard-interconnect-for-ai-accelerators

  14. SDxCentral. "UALink Consortium releases 200G 1.0 specification." April 2025. https://www.sdxcentral.com/news/ualink-consortium-releases-200g-10-specification-for-ai-accelerator-interconnects/

  15. Phoronix. "UALink 200G 1.0 Specification Published." April 2025. https://www.phoronix.com/news/UALink-200G-1.0-Released

  16. VideoCardz. "CXL 4.0 spec moves to PCIe 7.0." November 2025. https://videocardz.com/newz/cxl-4-0-spec-moves-to-pcie-7-0-doubles-bandwidth-over-cxl-3-0

  17. Storage Newsletter. "SC25: CXL Consortium Unveils Compute Express Link 4.0 Specs." November 2025. https://www.storagenewsletter.com/2025/11/19/sc25-cxl-consortium-unveils-compute-express-link-4-0-specs-increasing-speed-and-bandwidth/

  18. IndexBox. "CXL 4.0 Doubles Bandwidth, Adds Port Bundling for AI Workloads." 2025. https://www.indexbox.io/blog/cxl-40-specification-released-with-port-bundling-for-ai-and-hpc/

  19. Synopsys. "CXL 4.0, Bandwidth First: What Designers Are Solving for Next." 2025. https://www.synopsys.com/blogs/chip-design/cxl-4-bandwidth-first-what-designers-are-solving-next.html

  20. LoveChip. "UALink vs NVLink: What Is the Difference?" 2025. https://www.lovechip.com/blog/ualink-vs-nvlink-what-is-the-difference-

  21. Network World. "Arm backs both sides in UALink vs NVLink battle." November 2025. https://www.networkworld.com/article/4091468/arm-jumps-on-the-nvidia-nvlink-fusion-bandwagon-at-sc25.html

  22. Learn Grow Thrive. "Nvidia's NVLink Vs. UALink." 2025. https://www.learngrowthrive.net/p/nvidias-nvlink-vs-ualink

  23. BITSILICA. "UALink and the Battle for Rack-Scale GPU Interconnect." 2025. https://bitsilica.com/ualink-and-the-battle-for-rack-scale-gpu-interconnect/

  24. DigitalDefynd. "What is NVLink and NVSwitch?" 2025. https://digitaldefynd.com/IQ/nvlink-and-nvswitch-pros-cons/

  25. Massed Compute. "How does NVLink compare to AMD's Infinity Fabric?" 2025. https://massedcompute.com/faq-answers/?question=How+does+NVLink+compare+to+AMD's+Infinity+Fabric+in+terms+of+performance?

  26. Emergent Mind. "AMD Instinct MI300X GPU Architecture." 2025. https://www.emergentmind.com/topics/amd-instinct-mi300x-gpus

  27. Emergent Mind. "Infinity Fabric Interconnect Overview." 2025. https://www.emergentmind.com/topics/infinity-fabric-interconnect

  28. ServeTheHome. "AMD Infinity Fabric AFL Scale Up Competitor to NVIDIA NVLink." 2025. https://www.servethehome.com/amd-infinity-fabric-afl-scale-up-competitor-to-nvidia-nvlink-coming-to-broadcom-switches-in-pcie-gen7/

  29. GIGABYTE. "Revolutionizing the AI Factory: The Rise of CXL Memory Pooling." 2025. https://www.gigabyte.com/Article/revolutionizing-the-ai-factory-the-rise-of-cxl-memory-pooling

  30. CXL Consortium. "Overcoming the AI Memory Wall: How CXL Memory Pooling Powers the Next Leap." 2025. https://computeexpresslink.org/blog/overcoming-the-ai-memory-wall-how-cxl-memory-pooling-powers-the-next-leap-in-scalable-ai-computing-4267/

  31. CXL Consortium. "Expanding your memory footprint with CXL at FMS 2025." 2025. https://computeexpresslink.org/blog/expanding-your-memory-footprint-with-cxl-at-fms-2025-4133/

  32. CXL Consortium. "Breaking Boundaries in Memory: Highlights from AI Infra Summit and SDC 2025." 2025. https://computeexpresslink.org/blog/breaking-boundaries-in-memory-highlights-from-ai-infra-summit-and-sdc-2025-4198/

  33. Storage Newsletter. "CXL DevCon 2025: XConn Technologies Demonstrates Dynamic Memory Allocation." April 2025. https://www.storagenewsletter.com/2025/04/30/cxl-devcon-2025-xconn-technologies-demonstrates-dynamic-memory-allocation-using-cxl-switch-and-amd-technologies/

  34. Morningstar. "Compal Redefines AI-Driven Data Centers with CXL and Liquid Cooling." October 2025. https://www.morningstar.com/news/pr-newswire/20251013hk96077/compal-redefines-ai-driven-data-centers-with-cxl-and-liquid-cooling-innovations-at-the-2025-ocp-global-summit

  35. Penguin Solutions. "Why AI Needs Compute Express Link (CXL)." 2025. https://www.penguinsolutions.com/en-us/resources/blog/why-ai-needs-cxl

  36. Blocks and Files. "Panmnesia pushes unified memory and interconnect design for AI superclusters." July 2025. https://blocksandfiles.com/2025/07/18/panmnesia-cxl-over-xlink-ai-supercluster-architecture/

  37. Fluence. "Best GPU for AI: Practical Buying Guide for AI Teams (2025)." 2025. https://www.fluence.network/blog/best-gpu-for-ai-2025/

  38. Clarifai. "MI300X vs B200: AMD vs NVIDIA Next-Gen GPU Performance." 2025. https://www.clarifai.com/blog/mi300x-vs-b200

  39. NexGen Cloud. "NVIDIA Blackwell GPUs: All You Need to Know." 2025. https://www.nexgencloud.com/blog/performance-benchmarks/nvidia-blackwell-gpus-architecture-features-specs

  40. Hardware Nation. "NVIDIA NVLink 5.0: Accelerating Multi-GPU Communication." 2025. https://hardwarenation.com/resources/blog/nvidia-nvlink-5-0-accelerating-multi-gpu-communication/


Published: December 30, 2025
