December 13, 2025
December 2025 Update: The CXL Consortium released CXL 4.0 on November 18, 2025, doubling bandwidth to 128 GT/s via PCIe 7.0 and introducing bundled ports for 1.5 TB/s connections. This guide covers deployment planning for organizations preparing to implement CXL-based memory pooling in their AI infrastructure.
TL;DR
CXL 4.0 enables memory pooling at unprecedented scale, allowing AI inference workloads to access 100+ terabytes of shared memory with cache coherency across multiple racks. The specification's bundled ports aggregate multiple physical connections into single logical attachments delivering 1.5 TB/s bandwidth. For infrastructure planners, the key decisions involve understanding when to adopt CXL (2026-2027 for production), which products to evaluate now (CXL 2.0/3.0 switches shipping), and how CXL complements rather than replaces NVLink and UALink. This guide provides the technical depth and decision frameworks needed to plan CXL deployments.
The Memory Wall Problem
Large language models hit a fundamental constraint: GPU memory capacity. Modern AI inference workloads routinely demand 80-120 GB of memory per GPU or more, and the key-value (KV) cache grows with context length.1 A single inference request with a 128K context window can consume tens of gigabytes just for KV cache storage.
The problem intensifies at scale. Model weights for frontier LLMs consume hundreds of gigabytes. KV cache requirements grow linearly with both batch size and sequence length. GPU VRAM remains fixed at 80GB (H100) or 192GB (B200).2
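The linear growth is easy to quantify. The sketch below sizes a KV cache from model dimensions; the 80-layer, 8-KV-head, 128-head-dim configuration is an assumed Llama-70B-style architecture with grouped-query attention, not a figure from the sources cited here.

```python
# Back-of-envelope KV cache sizing. Model dimensions are illustrative
# assumptions (a 70B-class model with grouped-query attention).

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Bytes needed to hold K and V for every token in every layer."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes  # K + V
    return per_token * seq_len * batch

# 70B-class model, FP16 cache, 128K context, single request
size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                      seq_len=128 * 1024, batch=1)
print(f"{size / 2**30:.1f} GiB per 128K-context request")  # → 40.0 GiB
```

Doubling either batch size or sequence length doubles the result, which is why long-context serving at useful batch sizes outgrows a single GPU's VRAM so quickly.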
Traditional solutions fall short:
| Approach | Limitation |
|---|---|
| Add more GPUs | Linear cost increase, memory still isolated per GPU |
| NVMe offloading | ~100 μs latency, ~1,000x slower than DRAM |
| RDMA-based sharing | Still 10-20 μs latency, complex networking |
| Larger GPU memory | Supply-constrained, expensive |
CXL changes this equation by enabling memory pooling with DRAM-like latency (200-500 ns) across the data center.3
CXL 4.0 Technical Deep Dive
Evolution from CXL 1.0 to 4.0
CXL has matured rapidly since its 2019 introduction. Each generation expanded capabilities:
| Generation | Release | PCIe Base | Speed | Key Advancement |
|---|---|---|---|---|
| CXL 1.0/1.1 | 2019/2020 | PCIe 5.0 | 32 GT/s | Basic coherent memory attach |
| CXL 2.0 | 2022 | PCIe 5.0 | 32 GT/s | Switching, memory pooling, multi-device |
| CXL 3.0/3.1 | 2023/2024 | PCIe 6.0 | 64 GT/s | Fabric support, peer-to-peer, 4,096 nodes |
| CXL 4.0 | Nov 2025 | PCIe 7.0 | 128 GT/s | Bundled ports, multi-rack, enhanced RAS |
CXL 2.0 introduced the foundational concept of memory pooling. Multiple Type 3 memory devices connect to a switch, forming a shared pool from which the switch dynamically allocates resources to different hosts.4 This enables memory utilization improvements from typical 50-60% to 85%+ across a cluster.
CXL 3.0 added fabric capabilities supporting multi-level switching and up to 4,096 nodes with port-based routing (PBR).5 The shift to 256-byte FLITs and PCIe 6.0's 64 GT/s doubled available bandwidth.
CXL 4.0 doubles bandwidth again while introducing features critical for multi-rack AI deployments.
Bundled Ports Architecture
CXL 4.0's most significant feature for high-performance computing: bundled ports aggregate multiple physical CXL device ports into a single logical entity.6
How bundled ports work:
- A host and Type 1/2 device combine multiple physical ports
- System software sees a single device despite multiple physical connections
- Bandwidth aggregates across all bundled ports
- Optimized for 256-byte FLIT mode, eliminating legacy overhead
Bandwidth calculations:
| Configuration | Direction | Bandwidth |
|---|---|---|
| Single x16 port @ 128 GT/s | Unidirectional | 256 GB/s |
| Single x16 port @ 128 GT/s | Bidirectional | 512 GB/s |
| 3 bundled x16 ports @ 128 GT/s | Unidirectional | 768 GB/s |
| 3 bundled x16 ports @ 128 GT/s | Bidirectional | 1,536 GB/s |
For context, HBM3e memory on an H200 delivers 4.8 TB/s bandwidth.7 A bundled CXL 4.0 connection at 1.5 TB/s represents approximately 30% of that bandwidth—sufficient for many memory expansion use cases where capacity matters more than peak bandwidth.
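The table's figures follow directly from lane arithmetic. The sketch below reproduces them, treating each lane's 128 GT/s as 128 Gb/s of payload (PCIe 6.0/7.0-style flit encoding has no 8b/10b-style tax) and ignoring FLIT/CRC overhead, so the results are raw upper bounds:

```python
# Reproducing the bundled-port bandwidth table from lane arithmetic.
# Raw upper bounds: FLIT/CRC overhead is ignored.

def cxl_bandwidth_gbps(gt_per_s, lanes, ports=1, bidirectional=False):
    """Aggregate bandwidth in GB/s for one or more bundled ports."""
    one_way = gt_per_s * lanes / 8          # GT/s ~ Gb/s per lane -> GB/s
    total = one_way * ports
    return total * 2 if bidirectional else total

print(cxl_bandwidth_gbps(128, 16))                              # → 256.0
print(cxl_bandwidth_gbps(128, 16, bidirectional=True))          # → 512.0
print(cxl_bandwidth_gbps(128, 16, ports=3, bidirectional=True)) # → 1536.0
```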
PCIe 7.0 Foundation
CXL 4.0 builds on PCIe 7.0's physical layer improvements:8
- 128 GT/s transfer rate: Double the 64 GT/s of PCIe 6.0
- PAM4 signaling: Same encoding scheme as PCIe 6.0
- Improved FEC: Forward error correction for signal integrity
- Optical support: Enables longer reach connections
The specification retains the 256-byte FLIT format from CXL 3.x while adding a latency-optimized variant for time-sensitive operations.9
Multi-Rack Fabric Capabilities
CXL 4.0 extends reach through two mechanisms:
Four retimers supported: Previous generations allowed two retimers. Four retimers enable longer physical connections spanning multiple racks without signal degradation.10
Native x2 width: Previously a degraded fallback mode, x2 links now operate at full performance. This enables higher fan-out configurations where many lower-bandwidth connections serve more endpoints.11
These features combine to enable "multi-rack memory pooling"—a capability the CXL Consortium explicitly targets for late 2026-2027 production deployment.12
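The fan-out benefit of native x2 widths is easiest to see in numbers. The sketch below assumes a 256-lane switch (the lane count of shipping parts like XConn's XC50256, used purely for illustration) and CXL 4.0's 128 GT/s lanes:

```python
# Fan-out vs per-port bandwidth for a fixed lane budget.
# 256 lanes and 128 GT/s are illustrative assumptions, not spec limits.

SWITCH_LANES = 256
for width in (16, 8, 4, 2):
    ports = SWITCH_LANES // width
    gbps = 128 * width / 8   # unidirectional GB/s per port
    print(f"x{width}: {ports} ports at {gbps:.0f} GB/s each")
```

Dropping from x16 to x2 trades 256 GB/s ports for eight times as many 32 GB/s ports, which is exactly the high-fan-out configuration the specification now supports at full performance.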
CXL Use Cases for AI Infrastructure
KV Cache Offloading for LLM Inference
The highest-impact near-term use case: offloading KV cache from GPU VRAM to CXL-attached memory.
The problem: LLM inference with long contexts generates massive KV caches. A 70B parameter model with 128K context and batch size 32 can require 150+ GB just for KV cache.13 This exceeds H100 VRAM, forcing expensive batch size reductions or multiple GPUs.
The CXL solution: Store KV cache in pooled CXL memory while keeping hot layers in GPU VRAM. XConn and MemVerge demonstrated this at SC25 and OCP 2025:14
- Two H100 GPUs (80GB each) running OPT-6.7B
- KV cache offloaded to shared CXL memory pool
- 3.8x speedup vs 200G RDMA
- 6.5x speedup vs 100G RDMA
- >5x improvement vs SSD-based KV cache
Research from academia confirms the opportunity. PNM-KV (Processing-Near-Memory for KV cache) achieves up to 21.9x throughput improvement by offloading token page selection to accelerators within CXL memory.15
Memory Expansion for Training
Training workloads benefit from expanded memory capacity for:
- Larger batch sizes: More samples per iteration without gradient accumulation
- Activation checkpointing reduction: Store more activations in memory vs recomputation
- Optimizer state: Adam optimizer requires 2x parameters for momentum/variance
CXL memory expansion enables training configurations previously requiring multi-node distribution to run on single nodes, reducing communication overhead.
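The optimizer-state bullet is worth making concrete. The 16-bytes-per-parameter breakdown below is the common mixed-precision Adam accounting (FP16 weights and gradients, FP32 master weights plus Adam's m and v), an assumption rather than a figure from the CXL sources:

```python
# Why optimizer state dwarfs the weights under mixed-precision Adam.

def train_memory_gb(params_billion):
    # fp16 weights + fp16 grads + fp32 master + fp32 m + fp32 v
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    return params_billion * 1e9 * bytes_per_param / 1e9

print(f"70B model: {train_memory_gb(70):.0f} GB before activations")  # → 1120 GB
```

Over a terabyte of state before a single activation is stored is precisely the footprint that pushes training onto multiple nodes today, and that CXL capacity expansion can pull back onto one.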
Scientific and HPC Workloads
PNNL's Crete project uses CXL pools for high-throughput memory sharing across compute nodes in scientific simulations.16 Use cases include:
- Molecular dynamics with large neighbor lists
- Graph analytics on trillion-edge datasets
- In-memory databases exceeding single-server capacity
The Interconnect Landscape
CXL vs NVLink vs UALink
Understanding where CXL fits requires recognizing that these technologies serve different purposes:
| Standard | Primary Purpose | Best For |
|---|---|---|
| CXL | Memory coherency + pooling | CPU-memory expansion, shared memory pools |
| NVLink | GPU-to-GPU scaling | Within-node GPU communication |
| UALink | Accelerator interconnect | Open standard alternative to NVLink |
| Ultra Ethernet | Scale-out networking | Multi-rack, 10,000+ endpoints |
CXL runs on PCIe SerDes: lower error rate, lower latency, but lower bandwidth than NVLink/UALink's Ethernet-style SerDes.17 NVLink 5 delivers 1.8 TB/s per GPU—far exceeding CXL 4.0's 512 GB/s per x16 port.18
The technologies complement rather than compete:
- Within a GPU node: NVLink connects GPUs
- Between nodes: UALink or InfiniBand/Ethernet
- Memory expansion: CXL adds capacity to CPUs and accelerators
- Fabric-wide memory pools: CXL switches enable sharing across hosts
Panmnesia proposes "CXL-over-XLink" architectures integrating all three, reporting 5.3x faster AI training and 6x inference latency reduction vs PCIe/RDMA baselines.19
Decision Framework: When to Use What
| Scenario | Recommended Interconnect | Rationale |
|---|---|---|
| Multi-GPU training within server | NVLink | Highest bandwidth, lowest latency |
| Multi-GPU inference pod (non-NVIDIA) | UALink | Open standard, high bandwidth |
| Expand memory beyond VRAM | CXL | Cache coherency, DRAM-like latency |
| Multi-rack GPU cluster | InfiniBand or Ultra Ethernet | Designed for scale-out |
| Shared memory pool across servers | CXL switches | Memory pooling with coherency |
| China/restricted markets | Consider UB-Mesh | Avoids Western IP dependencies |
CXL Ecosystem: Vendors and Products
Memory Expanders
The three major DRAM manufacturers all ship CXL memory expanders:
| Vendor | Product | Capacity | Interface | Status |
|---|---|---|---|---|
| Samsung | CMM-D | 256 GB | CXL 2.0 | Mass production 202520 |
| SK Hynix | CMM-DDR5 | 128 GB | CXL 2.0 | Mass production late 202421 |
| Micron | CZ120 | 256 GB | CXL 2.0 | Sampling22 |
| SK Hynix | CMS | 512 GB | CXL (compute-enabled) | Announced23 |
SK Hynix's CMS (Computational Memory Solution) adds compute capabilities directly in the memory module—an early implementation of processing-near-memory for CXL.
Switch Vendors
CXL switches enable memory pooling across multiple hosts:
| Vendor | Product | Generation | Status | Key Feature |
|---|---|---|---|---|
| XConn | XC50256 | CXL 2.0 | Shipping | 256-lane switch, first to market24 |
| XConn | Apollo | CXL 2.0 | Shipping | Memory pooling demonstrations at SC2525 |
| Panmnesia | Fabric Switch | CXL 3.2 | Sampling Nov 2025 | First PBR implementation26 |
| Astera Labs | Leo | CXL 2.0 | Shipping | Smart memory controller27 |
| Microchip | SMC 2000 | CXL 2.0 | Shipping | Memory expansion controller28 |
Panmnesia's CXL 3.2 Fabric Switch represents a generation leap: first silicon implementing port-based routing for true fabric architectures with up to 4,096 nodes.29
Controller Vendors
CXL memory controllers translate between CXL protocol and DRAM:
| Vendor | Role | Key Products |
|---|---|---|
| Marvell | Controller | Structera CXL controllers30 |
| Montage | Controller | CXL memory buffer chips |
| Astera Labs | Controller | Leo smart memory controller |
| Microchip | Controller | SMC 2000 series |
Marvell's Structera completed interoperability testing with all three major memory suppliers (Samsung, Micron, SK Hynix) on both Intel and AMD platforms.31
Deployment Planning Guide
Timeline
| Period | CXL Generation | Expected Capability | Recommendation |
|---|---|---|---|
| Now-Q2 2026 | CXL 2.0 | Memory expansion, basic pooling | Production evaluation |
| Q3 2026-Q4 2026 | CXL 3.0/3.1 | Fabric, peer-to-peer, 4K nodes | Early adoption for AI |
| 2027+ | CXL 4.0 | Multi-rack pooling, 1.5 TB/s | Planning begins now |
ABI Research expects CXL 3.0/3.1 solutions with sufficient software support for commercial adoption by 2027.32
What to Evaluate Now
Immediate (2025):
1. Test CXL 2.0 memory expanders on existing Intel Sapphire Rapids or AMD EPYC Genoa servers
2. Evaluate XConn or Astera Labs switches for memory pooling proofs-of-concept
3. Benchmark KV cache offloading with MemVerge GISMO technology33
2026 planning:
1. Assess Panmnesia CXL 3.2 switch samples when available
2. Plan for CXL-enabled GPU servers (NVIDIA Blackwell supports CXL)34
3. Develop software stack for memory tiering between GPU VRAM and CXL memory
Infrastructure Readiness Checklist
Before deploying CXL:
- [ ] CPU platform supports CXL (Intel 4th Gen+ or AMD EPYC 4th Gen+)
- [ ] PCIe slots available for CXL devices
- [ ] Operating system with CXL support (Linux 6.0+ for basic, 6.8+ for hotplug)35
- [ ] Application software can tier memory (NUMA-aware or explicit APIs)
- [ ] Monitoring for CXL RAS events and memory errors
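A first pass over the kernel and driver items in the checklist can be scripted. The probe below is a sketch: version thresholds reflect the guidance above (Linux 6.0+ basic, 6.8+ hotplug), and an absent `/sys/bus/cxl` only means no CXL driver is loaded, not necessarily missing hardware.

```shell
#!/bin/sh
# Quick CXL readiness probe (sketch; adjust thresholds for your distro).

major=$(uname -r | cut -d. -f1)
minor=$(uname -r | cut -d. -f2)

if [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 8 ]; }; then
  echo "kernel $(uname -r): CXL hotplug-era support expected"
elif [ "$major" -eq 6 ]; then
  echo "kernel $(uname -r): basic CXL support expected"
else
  echo "kernel $(uname -r): upgrade to 6.0+ for CXL"
fi

if [ -d /sys/bus/cxl/devices ]; then
  echo "CXL bus present: $(ls /sys/bus/cxl/devices | wc -l) device(s)"
else
  echo "no CXL bus registered"
fi
```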
Software Stack Considerations
CXL memory appears as additional NUMA nodes to the operating system. Applications must be CXL-aware or NUMA-aware to benefit:
| Software Layer | Requirement | Solutions |
|---|---|---|
| OS kernel | CXL driver support | Linux 6.0+, Windows Server 2025 |
| Memory allocator | NUMA-aware allocation | numactl, hwloc, libmemkind |
| Application | Memory tiering | Explicit placement or transparent tiering |
| AI framework | KV cache offloading | vLLM CXL support (in development)36 |
MemVerge's Memory Machine provides transparent memory tiering for applications without CXL-specific code paths.37
Cost Considerations
CXL memory costs less per GB than GPU VRAM:
| Memory Type | Approximate Cost/GB | Latency |
|---|---|---|
| HBM3e (in GPU) | $15-25 | ~100 ns |
| DDR5 DIMM | $3-5 | ~80 ns |
| CXL-attached DDR5 | $4-7 | 200-500 ns |
| NVMe SSD | $0.10-0.20 | ~100 μs |
For KV cache that tolerates 200-500 ns latency, CXL offers 4-5x cost reduction vs keeping data in GPU VRAM while delivering 200-500x lower latency than SSD offloading.38
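These ratios fall straight out of the table. The sketch below replays the arithmetic using the table's approximate ranges, so the outputs are bounds rather than point estimates:

```python
# Sanity-checking cost and latency ratios from the table's (low, high) ranges.
cost_per_gb = {"hbm3e": (15, 25), "cxl_ddr5": (4, 7)}
latency_ns = {"cxl": (200, 500), "nvme_ssd": (100_000, 100_000)}

# Cost advantage of CXL-attached DDR5 over in-package HBM
lo = cost_per_gb["hbm3e"][0] / cost_per_gb["cxl_ddr5"][1]   # pessimistic
hi = cost_per_gb["hbm3e"][1] / cost_per_gb["cxl_ddr5"][0]   # optimistic
print(f"CXL vs HBM cost: {lo:.1f}x to {hi:.1f}x cheaper per GB")

# Latency advantage of CXL over NVMe offload
fast = latency_ns["nvme_ssd"][0] // latency_ns["cxl"][1]
slow = latency_ns["nvme_ssd"][0] // latency_ns["cxl"][0]
print(f"CXL vs SSD latency: {fast}x to {slow}x lower")  # → 200x to 500x lower
```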
Migration Strategy
Phase 1: Memory expansion (Now)
- Deploy CXL memory expanders for capacity
- No application changes required
- OS treats CXL memory as slow NUMA node

Phase 2: Memory tiering (2025-2026)
- Implement tiering between fast local memory and CXL
- Hot data in local DRAM, cold data in CXL memory
- Requires memory management software

Phase 3: Memory pooling (2026-2027)
- Deploy CXL switches for shared memory pools
- Multiple hosts access common memory resources
- Enables disaggregated memory architecture
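The hot/cold split at the heart of Phase 2 can be sketched as a minimal policy: recently touched pages stay in local DRAM, pages idle past a threshold are demoted to the CXL tier, and demoted pages are promoted back on access. This is purely illustrative; real deployments rely on kernel NUMA tiering or commercial software like MemVerge's, not application-level dictionaries.

```python
# Minimal hot/cold tiering policy sketch (illustrative only).
import time

class TieredCache:
    def __init__(self, idle_threshold_s=60.0):
        self.dram, self.cxl = {}, {}      # stand-ins for the two memory tiers
        self.last_touch = {}
        self.idle_threshold_s = idle_threshold_s

    def put(self, key, page):
        self.dram[key] = page             # new data starts hot
        self.last_touch[key] = time.monotonic()

    def get(self, key):
        if key in self.cxl:               # promote on access
            self.dram[key] = self.cxl.pop(key)
        self.last_touch[key] = time.monotonic()
        return self.dram[key]

    def demote_cold(self, now=None):
        now = time.monotonic() if now is None else now
        for key in list(self.dram):
            if now - self.last_touch[key] > self.idle_threshold_s:
                self.cxl[key] = self.dram.pop(key)

cache = TieredCache(idle_threshold_s=1.0)
cache.put("kv_page_0", b"...")
cache.demote_cold(now=time.monotonic() + 5)   # simulate 5 idle seconds
print("in CXL tier:", "kv_page_0" in cache.cxl)  # → in CXL tier: True
```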
Integration with Existing Infrastructure
NVIDIA GPU Systems
NVIDIA supports CXL on Blackwell architecture. The Grace CPU in Grace Hopper systems includes CXL support, enabling memory expansion beyond the 480GB unified memory.39
For H100 systems, CXL memory expanders attach to the host CPU, not directly to GPUs. KV cache offloading requires CPU involvement to copy data between GPU VRAM and CXL memory.
AMD GPU Systems
AMD MI300X includes CXL support through its CPU chiplet. The 192GB unified memory can be supplemented with CXL-attached capacity.40
Intel GPU Systems
Intel Data Center GPU Max (Ponte Vecchio) supports CXL for memory expansion on systems with Sapphire Rapids or later CPUs.41
Risks and Considerations
Standards Fragmentation
Four competing interconnect ecosystems (CXL/PCIe, UALink, Ultra Ethernet, NVLink) force infrastructure planners to make bets. Equipment purchased today may face interoperability challenges in 2027.
Mitigation: CXL's PCIe foundation ensures backward compatibility. CXL 4.0 devices will work with CXL 3.x, 2.0, 1.1, and 1.0 systems at reduced capability.42
Software Maturity
CXL software support remains early-stage. Linux kernel support exists but many applications lack CXL-specific optimizations.
Mitigation: Use memory tiering software like MemVerge Memory Machine that provides transparent CXL integration.
Supply Chain
CXL 4.0 products won't reach volume production until 2027. CXL 3.0/3.1 availability depends on PCIe 6.0 ecosystem maturity.
Mitigation: Begin with CXL 2.0 products available today. Build operational experience before next-generation availability.
Key Takeaways
For infrastructure planners:
- CXL 4.0 enables 100+ TB memory pools with cache coherency across racks
- Bundled ports deliver 1.5 TB/s bandwidth per logical connection
- Production deployment timeline: CXL 2.0 now, CXL 3.x late 2026, CXL 4.0 2027+
- Start evaluating CXL 2.0 memory expanders and switches immediately

For AI platform teams:
- KV cache offloading to CXL memory delivers 3.8-6.5x speedup vs RDMA
- CXL memory costs 4-5x less than GPU VRAM while delivering 200-500x lower latency than SSD
- vLLM and other frameworks developing CXL-aware KV cache management
- Test with XConn/MemVerge demonstrations available now

For strategic planning:
- CXL complements NVLink/UALink; they serve different purposes
- No single interconnect standard will "win"; plan for coexistence
- Equipment decisions in 2025-2026 affect interoperability through 2030
- Chinese alternative (Huawei UB-Mesh) may create parallel ecosystem

For procurement:
- CXL 2.0 products shipping from Samsung, SK Hynix, Micron
- XConn and Astera Labs switches available for evaluation
- Panmnesia CXL 3.2 fabric switch sampling November 2025
- Budget for CXL-enabled servers in 2026-2027 refresh cycles
For AI infrastructure deployment with CXL-enabled memory architecture, contact Introl.
References
1. Compute Express Link. "Overcoming the AI Memory Wall: How CXL Memory Pooling Powers the Next Leap in Scalable AI Computing." 2025. https://computeexpresslink.org/blog/overcoming-the-ai-memory-wall-how-cxl-memory-pooling-powers-the-next-leap-in-scalable-ai-computing-4267/
2. NVIDIA. "H200 and B200 Specifications." 2025.
3. Synopsys. "How CXL and Memory Pooling Reduce HPC Latency." 2025. https://www.synopsys.com/blogs/chip-design/cxl-protocol-memory-pooling.html
4. AMI Next Blog. "CXL Deep Dive: From Principles to Memory Pooling." 2025. https://www.aminext.blog/en/post/cxl-compute-express-link-deep-dive-1
5. CXL Consortium. "CXL 3.0 Specification." 2023.
6. Blocks and Files. "CXL 4.0 doubles bandwidth and stretches memory pooling to multi-rack setups." November 24, 2025. https://blocksandfiles.com/2025/11/24/cxl-4/
7. NVIDIA. "H200 Tensor Core GPU Specifications." 2024.
8. VideoCardz. "CXL 4.0 spec moves to PCIe 7.0, doubles bandwidth over CXL 3.0." November 2025. https://videocardz.com/newz/cxl-4-0-spec-moves-to-pcie-7-0-doubles-bandwidth-over-cxl-3-0
9. Synopsys. "CXL 4.0, Bandwidth First: What Designers Are Solving for Next." December 2025. https://www.synopsys.com/blogs/chip-design/cxl-4-bandwidth-first-what-designers-are-solving-next.html
10. Datacenter News. "CXL 4.0 doubles bandwidth, introduces bundled ports for data centres." November 2025. https://datacenter.news/story/cxl-4-0-doubles-bandwidth-introduces-bundled-ports-for-data-centres
11. CXL Consortium. "CXL 4.0 Webinar." December 4, 2025. https://computeexpresslink.org/wp-content/uploads/2025/12/CXL_4.0-Webinar_December-2025_FINAL.pdf
12. Business Wire. "CXL Consortium Releases the Compute Express Link 4.0 Specification." November 18, 2025. https://www.businesswire.com/news/home/20251118275848/en/CXL-Consortium-Releases-the-Compute-Express-Link-4.0-Specification-Increasing-Speed-and-Bandwidth
13. arXiv. "Scalable Processing-Near-Memory for 1M-Token LLM Inference." November 2025. https://arxiv.org/abs/2511.00321
14. PRWeb. "XConn Technologies and MemVerge Demonstrate CXL Memory Pool for KV Cache." October 2025. https://www.prweb.com/releases/xconn-technologies-and-memverge-demonstrate-cxl-memory-pool-for-kv-cache-using-nvidia-dynamo-for-breakthrough-ai-workload-performance-at-2025-ocp-global-summit-302581860.html
15. arXiv. "PNM-KV: Scalable Processing-Near-Memory for 1M-Token LLM Inference." 2025. https://arxiv.org/html/2501.09020v1
16. CXL Consortium. "How CXL Transforms Server Memory Infrastructure." October 2025. https://computeexpresslink.org/wp-content/uploads/2025/10/CXL_Q3-2025-Webinar_FINAL.pdf
17. Blocks and Files. "Panmnesia pushes unified memory and interconnect design for AI superclusters." July 2025. https://blocksandfiles.com/2025/07/18/panmnesia-cxl-over-xlink-ai-supercluster-architecture/
18. Next Platform. "UALink Fires First GPU Interconnect Salvo At Nvidia NVSwitch." April 2025. https://www.nextplatform.com/2025/04/08/ualink-fires-first-gpu-interconnect-salvo-at-nvidia-nvswitch/
19. Business News This Week. "Panmnesia Introduces AI Infrastructure with CXL over NVLink and UALink." July 2025. https://businessnewsthisweek.com/technology/panmnesia-introduces-todays-and-tomorrows-ai-infrastructure-including-a-supercluster-architecture-that-integrates-nvlink-ualink-and-hbm-via-cxl/
20. TrendForce. "SK hynix and Samsung Step up Focus on HBM4 and CXL." October 2024. https://www.trendforce.com/news/2024/10/24/news-sk-hynix-and-samsung-reportedly-step-up-focus-on-hbm4-and-cxl-amid-rising-chinese-competition/
21. ServeTheHome. "SK hynix CXL 2.0 Memory Expansion Modules Launched." 2024. https://www.servethehome.com/sk-hynix-cxl-2-0-memory-expansion-modules-launched-with-96gb-of-ddr5/
22. AnandTech. "CXL Gathers Momentum at FMS 2024." 2024. https://www.anandtech.com/show/21533/cxl-gathers-momentum-at-fms-2024
23. Tom's Hardware. "SK Hynix Unveils CXL Memory Module with Compute Capabilities." 2025. https://www.tomshardware.com/news/sk-hynix-unveils-cxl-computational-memory-solution
24. EE Times. "XConn Shows Off First CXL Switch." 2024. https://www.eetimes.com/xconn-shows-off-first-cxl-switch/
25. CXL Consortium. "Supercomputing 2025 Demonstrations." November 2025. https://computeexpresslink.org/event/supercomputing-2025/
26. Business Wire. "Panmnesia Announces Sample Availability of PCIe 6.0/CXL 3.2 Fabric Switch." November 12, 2025. https://www.businesswire.com/news/home/20251112667725/en/Panmnesia-Announces-Sample-Availability-of-PCIe-6.0CXL-3.2-Fabric-Switch
27. Storage Newsletter. "FMS 2025: Astera Labs and SMART Modular CXL Demo." August 2025. https://www.storagenewsletter.com/2025/08/06/fms-2025-h3-platform-debuts-cxl-memory-sharing-and-pooling-solution/
28. AnandTech. "Micron and Microchip CXL 2.0 Memory Module." 2024.
29. TechPowerUp. "Panmnesia Samples Industry's First PCIe 6.0/CXL 3.2 Fabric Switch." November 2025.
30. Yahoo Finance. "Marvell Extends CXL Ecosystem Leadership with Structera." September 2025. https://finance.yahoo.com/news/marvell-extends-cxl-ecosystem-leadership-130000255.html
31. Marvell. "Structera CXL Memory-Expansion Controllers Interoperability Announcement." September 2025.
32. GIGABYTE. "Revolutionizing the AI Factory: The Rise of CXL Memory Pooling." 2025. https://www.gigabyte.com/Article/revolutionizing-the-ai-factory-the-rise-of-cxl-memory-pooling
33. Storage Newsletter. "XConn Technologies and MemVerge CXL Memory Solution for KV Cache." November 2025. https://www.storagenewsletter.com/2025/11/21/sc25-xconn-technologies-and-memverge-to-deliver-breakthrough-scalable-cxl-memory-solution-to-offload-kv-cache-and-prefill-decode-disaggregation-in-ai-inference-workloads/
34. NVIDIA. "Blackwell Architecture Technical Brief." 2024.
35. Wikipedia. "Compute Express Link - Linux Support." https://en.wikipedia.org/wiki/Compute_Express_Link
36. arXiv. "Exploring CXL-based KV Cache Storage for LLM Serving." 2024. https://mlforsystems.org/assets/papers/neurips2024/paper17.pdf
37. MemVerge. "Memory Machine Software for CXL." 2025.
38. Penguin Solutions. "Explaining CXL Memory 101: Expansion, Pooling, & Sharing." 2025. https://www.penguinsolutions.com/en-us/resources/blog/what-is-cxl-memory-expansion
39. NVIDIA. "Grace Hopper Superchip Architecture." 2024.
40. AMD. "Instinct MI300X Specifications." 2024.
41. Intel. "Data Center GPU Max (Ponte Vecchio) Documentation." 2024.
42. CXL Consortium. "CXL 4.0 Backward Compatibility Statement." November 2025.
43. Servers Simply. "Next-Gen HPC & AI Infrastructure 2025: GPUs, CXL, Gen5 NVMe." 2025. https://www.serversimply.com/blog/next-gen-hpc-and-ai-infrastructure-in-2025
44. Semi Engineering. "CXL Thriving As Memory Link." 2025. https://semiengineering.com/cxl-thriving-as-memory-link/
45. HPCwire. "Everyone Except Nvidia Forms Ultra Accelerator Link Consortium." May 2024. https://www.hpcwire.com/2024/05/30/everyone-except-nvidia-forms-ultra-accelerator-link-ualink-consortium/
46. ServeTheHome. "Huawei Presents UB-Mesh Interconnect for Large AI SuperNodes at Hot Chips 2025." August 2025.
47. arXiv. "Amplifying Effective CXL Memory Bandwidth for LLM Inference via Transparent Near-Data Processing." September 2025. https://arxiv.org/abs/2509.03377
48. CXL Consortium. "Advantages of CXL Memory Sharing for Emerging Applications." June 2025. https://computeexpresslink.org/wp-content/uploads/2025/06/CXL_Q2-2025-Webinar_FINAL.pdf