December 13, 2025
December 2025 Update: The CXL Consortium released CXL 4.0 on November 18, 2025, doubling bandwidth to 128 GT/s via PCIe 7.0 and introducing bundled ports for 1.5 TB/s connections. This guide covers deployment planning for organizations preparing to implement CXL-based memory pooling in their AI infrastructure.
TL;DR
CXL 4.0 enables memory pooling at unprecedented scale, allowing AI inference workloads to access 100+ terabytes of shared memory with cache coherency across multiple racks. The specification's bundled ports aggregate multiple physical connections into single logical attachments delivering 1.5 TB/s bandwidth. For infrastructure planners, the key decisions involve understanding when to adopt CXL (2026-2027 for production), which products to evaluate now (CXL 2.0/3.0 switches shipping), and how CXL complements rather than replaces NVLink and UALink. This guide provides the technical depth and decision frameworks needed to plan CXL deployments.
The Memory Wall Problem
Large language models hit a fundamental constraint: GPU memory capacity. Modern AI inference workloads routinely demand 80-120 GB of memory per GPU or more, and the key-value (KV) cache grows with context length.1 A single inference request with a 128K context window can consume tens of gigabytes just for KV cache storage.
The problem intensifies at scale. Model weights for frontier LLMs consume hundreds of gigabytes. KV cache requirements grow linearly with both batch size and sequence length. GPU VRAM remains fixed at 80GB (H100) or 192GB (B200).2
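The linear growth is easy to quantify. The sketch below sizes a KV cache from model dimensions; the 80-layer, 8-KV-head, 128-head-dim configuration is an assumed Llama-70B-style architecture with grouped-query attention, not a figure from the sources cited here.

```python
# Back-of-envelope KV cache sizing. Model dimensions are illustrative
# assumptions (a 70B-class model with grouped-query attention).

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Bytes needed to hold K and V for every token in every layer."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes  # K + V
    return per_token * seq_len * batch

# 70B-class model, FP16 cache, 128K context, single request
size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                      seq_len=128 * 1024, batch=1)
print(f"{size / 2**30:.1f} GiB per 128K-context request")  # → 40.0 GiB
```

Doubling either batch size or sequence length doubles the result, which is why long-context serving at useful batch sizes outgrows a single GPU's VRAM so quickly.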
Traditional solutions fall short:
| Approach | Limitation |
|---|---|
| Add more GPUs | Linear cost increase, memory still isolated per GPU |
| NVMe offloading | ~100 μs latency, ~1,000x slower than DRAM |
| RDMA-based sharing | Still 10-20 μs latency, complex networking |
| Larger GPU memory | Supply-constrained, expensive |
CXL changes this equation by enabling memory pooling with DRAM-like latency (200-500 ns) across the data center.3
CXL 4.0 Technical Deep Dive
Evolution from CXL 1.0 to 4.0
CXL has matured rapidly since its 2019 introduction. Each generation expanded capabilities:
| Generation | Release | PCIe Base | Speed | Key Advancement |
|---|---|---|---|---|
| CXL 1.0/1.1 | 2019/2020 | PCIe 5.0 | 32 GT/s | Basic coherent memory attach |
| CXL 2.0 | 2022 | PCIe 5.0 | 32 GT/s | Switching, memory pooling, multi-device |
| CXL 3.0/3.1 | 2023/2024 | PCIe 6.0 | 64 GT/s | Fabric support, peer-to-peer, 4,096 nodes |
| CXL 4.0 | Nov 2025 | PCIe 7.0 | 128 GT/s | Bundled ports, multi-rack, enhanced RAS |
CXL 2.0 introduced the foundational concept of memory pooling. Multiple Type 3 memory devices connect to a switch, forming a shared pool from which the switch dynamically allocates resources to different hosts.4 This enables memory utilization improvements from typical 50-60% to 85%+ across a cluster.
CXL 3.0 added fabric capabilities supporting multi-level switching and up to 4,096 nodes with port-based routing (PBR).5 The shift to 256-byte FLITs and PCIe 6.0's 64 GT/s doubled available bandwidth.
CXL 4.0 doubles bandwidth again while introducing features critical for multi-rack AI deployments.
Bundled Ports Architecture
CXL 4.0's most significant feature for high-performance computing: bundled ports aggregate multiple physical CXL device ports into a single logical entity.6
How bundled ports work:
- A host and Type 1/2 device combine multiple physical ports
- System software sees a single device despite multiple physical connections
- Bandwidth aggregates across all bundled ports
- Optimized for 256-byte FLIT mode, eliminating legacy overhead
Bandwidth calculations:
| Configuration | Direction | Bandwidth |
|---|---|---|
| Single x16 port @ 128 GT/s | Unidirectional | 256 GB/s |
| Single x16 port @ 128 GT/s | Bidirectional | 512 GB/s |
| 3 bundled x16 ports @ 128 GT/s | Unidirectional | 768 GB/s |
| 3 bundled x16 ports @ 128 GT/s | Bidirectional | 1,536 GB/s |
For context, HBM3e memory on an H200 delivers 4.8 TB/s bandwidth.7 A bundled CXL 4.0 connection at 1.5 TB/s represents approximately 30% of that bandwidth—sufficient for many memory expansion use cases where capacity matters more than peak bandwidth.
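The table's figures follow directly from lane arithmetic. The sketch below reproduces them, treating each lane's 128 GT/s as 128 Gb/s of payload (PCIe 6.0/7.0-style flit encoding has no 8b/10b-style tax) and ignoring FLIT/CRC overhead, so the results are raw upper bounds:

```python
# Reproducing the bundled-port bandwidth table from lane arithmetic.
# Raw upper bounds: FLIT/CRC overhead is ignored.

def cxl_bandwidth_gbps(gt_per_s, lanes, ports=1, bidirectional=False):
    """Aggregate bandwidth in GB/s for one or more bundled ports."""
    one_way = gt_per_s * lanes / 8          # GT/s ~ Gb/s per lane -> GB/s
    total = one_way * ports
    return total * 2 if bidirectional else total

print(cxl_bandwidth_gbps(128, 16))                              # → 256.0
print(cxl_bandwidth_gbps(128, 16, bidirectional=True))          # → 512.0
print(cxl_bandwidth_gbps(128, 16, ports=3, bidirectional=True)) # → 1536.0
```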
PCIe 7.0 Foundation
CXL 4.0 builds on PCIe 7.0's physical layer improvements:8
- 128 GT/s transfer rate: Double the 64 GT/s of PCIe 6.0
- PAM4 signaling: Same encoding scheme as PCIe 6.0
- Improved FEC: Forward error correction for signal integrity
- Optical support: Enables longer reach connections
The specification retains the 256-byte FLIT format from CXL 3.x while adding a latency-optimized variant for time-sensitive operations.9
Multi-Rack Fabric Capabilities
CXL 4.0 extends reach through two mechanisms:
Four retimers supported: Previous generations allowed two retimers. Four retimers enable longer physical connections spanning multiple racks without signal degradation.10
Native x2 width: Previously a degraded fallback mode, x2 links now operate at full performance. This enables higher fan-out configurations where many lower-bandwidth connections serve more endpoints.11
These features combine to enable "multi-rack memory pooling"—a capability the CXL Consortium explicitly targets for late 2026-2027 production deployment.12
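The fan-out benefit of native x2 widths is easiest to see in numbers. The sketch below assumes a 256-lane switch (the lane count of shipping parts like XConn's XC50256, used purely for illustration) and CXL 4.0's 128 GT/s lanes:

```python
# Fan-out vs per-port bandwidth for a fixed lane budget.
# 256 lanes and 128 GT/s are illustrative assumptions, not spec limits.

SWITCH_LANES = 256
for width in (16, 8, 4, 2):
    ports = SWITCH_LANES // width
    gbps = 128 * width / 8   # unidirectional GB/s per port
    print(f"x{width}: {ports} ports at {gbps:.0f} GB/s each")
```

Dropping from x16 to x2 trades 256 GB/s ports for eight times as many 32 GB/s ports, which is exactly the high-fan-out configuration the specification now supports at full performance.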
CXL Use Cases for AI Infrastructure
KV Cache Offloading for LLM Inference
The highest-impact near-term use case: offloading KV cache from GPU VRAM to CXL-attached memory.
The problem: LLM inference with long contexts generates massive KV caches. A 70B parameter model with 128K context and batch size 32 can require 150+ GB just for KV cache.13 This exceeds H100 VRAM, forcing expensive batch size reductions or multiple GPUs.
The CXL solution: Store KV cache in pooled CXL memory while keeping hot layers in GPU VRAM. XConn and MemVerge demonstrated this at SC25 and OCP 2025:14
- Two H100 GPUs (80GB each) running OPT-6.7B
- KV cache offloaded to shared CXL memory pool
- 3.8x speedup vs 200G RDMA
- 6.5x speedup vs 100G RDMA
- >5x improvement vs SSD-based KV cache
Research from academia confirms the opportunity. PNM-KV (Processing-Near-Memory for KV cache) achieves up to 21.9x throughput improvement by offloading token page selection to accelerators within CXL memory.15
Memory Expansion for Training
Training workloads benefit from expanded memory capacity for:
- Larger batch sizes: More samples per iteration without gradient accumulation
- Activation checkpointing reduction: Store more activations in memory vs recomputation
- Optimizer state: Adam optimizer requires 2x parameters for momentum/variance
CXL memory expansion enables training configurations previously requiring multi-node distribution to run on single nodes, reducing communication overhead.
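The optimizer-state bullet is worth making concrete. The 16-bytes-per-parameter breakdown below is the common mixed-precision Adam accounting (FP16 weights and gradients, FP32 master weights plus Adam's m and v), an assumption rather than a figure from the CXL sources:

```python
# Why optimizer state dwarfs the weights under mixed-precision Adam.

def train_memory_gb(params_billion):
    # fp16 weights + fp16 grads + fp32 master + fp32 m + fp32 v
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    return params_billion * 1e9 * bytes_per_param / 1e9

print(f"70B model: {train_memory_gb(70):.0f} GB before activations")  # → 1120 GB
```

Over a terabyte of state before a single activation is stored is precisely the footprint that pushes training onto multiple nodes today, and that CXL capacity expansion can pull back onto one.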
Scientific and HPC Workloads
PNNL's Crete project uses CXL pools for high-throughput memory sharing across compute nodes in scientific simulations.16 Use cases include:
- Molecular dynamics with large neighbor lists
- Graph analytics on trillion-edge datasets
- In-memory databases exceeding single-server capacity
The Interconnect Landscape
CXL vs NVLink vs UALink
Understanding where CXL fits requires recognizing that these technologies serve different purposes:
| Standard | Primary Purpose | Best For |
|---|---|---|
| CXL | Memory coherency + pooling | CPU-memory expansion, shared memory pools |
| NVLink | GPU-to-GPU scaling | Within-node GPU communication |
| UALink | Accelerator interconnect | Open standard alternative to NVLink |
| Ultra Ethernet | Scale-out networking | Multi-rack, 10,000+ endpoints |
CXL runs on PCIe SerDes: lower error rate, lower latency, but lower bandwidth than NVLink/UALink's Ethernet-style SerDes.17 NVLink 5 delivers 1.8 TB/s per GPU—far exceeding CXL 4.0's 512 GB/s per x16 port.18
The technologies complement rather than compete:
- Within a GPU node: NVLink connects GPUs
- Between nodes: UALink or InfiniBand/Ethernet
- Memory expansion: CXL adds capacity to CPUs and accelerators
- Fabric-wide memory pools: CXL switches enable sharing across hosts
Panmnesia proposes "CXL-over-XLink" architectures integrating all three, reporting 5.3x faster AI training and 6x inference latency reduction vs PCIe/RDMA baselines.19
Decision Framework: When to Use What
| Scenario | Recommended Interconnect | Rationale |
|---|---|---|
| Multi-GPU training within server | NVLink | Highest bandwidth, lowest latency |
| Multi-GPU inference pod (non-NVIDIA) | UALink | Open standard, high bandwidth |
| Expand memory beyond VRAM | CXL | Cache coherency, DRAM-like latency |
| Multi-rack GPU cluster | InfiniBand or Ultra Ethernet | Designed for scale-out |
| Shared memory pool across servers | CXL switches | Memory pooling with coherency |
| China/restricted markets | Consider UB-Mesh | Avoids Western IP dependencies |
CXL Ecosystem: Vendors and Products
Memory Expanders
The three major DRAM manufacturers all ship CXL memory expanders:
| Vendor | Product | Capacity | Interface | Status |
|---|---|---|---|---|
| Samsung | CMM-D | 256 GB | CXL 2.0 | Mass production 202520 |
| SK Hynix | CMM-DDR5 | 128 GB | CXL 2.0 | Mass production late 202421 |
| Micron | CZ120 | 256 GB | CXL 2.0 | Sampling22 |
| SK Hynix | CMS | 512 GB | CXL (compute-enabled) | Announced23 |
SK Hynix's CMS (Computational Memory Solution) adds compute capabilities directly in the memory module—an early implementation of processing-near-memory for CXL.
Switch Vendors
CXL switches enable memory pooling across multiple hosts:
| Vendor | Product | Generation | Status | Key Feature |
|---|---|---|---|---|
| XConn | XC50256 | CXL 2.0 | Shipping | 256-lane switch, first to market24 |
| XConn | Apollo | CXL 2.0 | Shipping | Memory pooling demonstrations at SC2525 |
| Panmnesia | Fabric Switch | CXL 3.2 | Sampling Nov 2025 | First PBR implementation26 |
| Astera Labs | Leo | CXL 2.0 | Shipping | Smart memory controller27 |
| Microchip | SMC 2000 | CXL 2.0 | Shipping | Memory expansion controller28 |
Panmnesia's CXL 3.2 Fabric Switch represents a generation leap: first silicon implementing port-based routing for true fabric architectures with up to 4,096 nodes.29
Controller Vendors
CXL memory controllers translate between CXL protocol and DRAM:
| Vendor | Role | Key Products |
|---|---|---|
| Marvell | Controller | Structera CXL controllers30 |
| Montage | Controller | CXL memory buffer chips |
| Astera Labs | Controller | Leo smart memory controller |
| Microchip | Controller | SMC 2000 series |
Marvell's Structera completed interoperability testing with all three major memory suppliers (Samsung, Micron, SK Hynix) on both Intel and AMD platforms.31
Deployment Planning Guide
Timeline
| Period | CXL Generation | Expected Capability | Recommendation |
|---|---|---|---|
| Now-Q2 2026 | CXL 2.0 | Memory expansion, basic pooling | Production evaluation |
| Q3 2026-Q4 2026 | CXL 3.0/3.1 | Fabric, peer-to-peer, 4K nodes | Early adoption for AI |
| 2027+ | CXL 4.0 | Multi-rack pooling, 1.5 TB/s | Planning begins now |
ABI Research expects CXL 3.0/3.1 solutions with sufficient software support for commercial adoption by 2027.32
What to Evaluate Now
Immediate (2025):
1. Test CXL 2.0 memory expanders on existing Intel Sapphire Rapids or AMD EPYC Genoa servers
2. Evaluate XConn or Astera Labs switches for memory pooling proofs-of-concept
3. Benchmark KV cache offloading with MemVerge GISMO technology33
2026 planning:
1. Assess Panmnesia CXL 3.2 switch samples when available
2. Plan for CXL-enabled GPU servers (NVIDIA Blackwell supports CXL)34
3. Develop software stack for memory tiering between GPU VRAM and CXL memory
Infrastructure Readiness Checklist
Before deploying CXL:
- [ ] CPU platform supports CXL (Intel 4th Gen+ or AMD EPYC 4th Gen+)
- [ ] PCIe slots available for CXL devices
- [ ] Operating system with CXL support (Linux 6.0+ for basic, 6.8+ for hotplug)35
- [ ] Application software can tier memory (NUMA-aware or explicit APIs)
- [ ] Monitoring for CXL RAS events and memory errors
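A first pass over the kernel and driver items in the checklist can be scripted. The probe below is a sketch: version thresholds reflect the guidance above (Linux 6.0+ basic, 6.8+ hotplug), and an absent `/sys/bus/cxl` only means no CXL driver is loaded, not necessarily missing hardware.

```shell
#!/bin/sh
# Quick CXL readiness probe (sketch; adjust thresholds for your distro).

major=$(uname -r | cut -d. -f1)
minor=$(uname -r | cut -d. -f2)

if [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 8 ]; }; then
  echo "kernel $(uname -r): CXL hotplug-era support expected"
elif [ "$major" -eq 6 ]; then
  echo "kernel $(uname -r): basic CXL support expected"
else
  echo "kernel $(uname -r): upgrade to 6.0+ for CXL"
fi

if [ -d /sys/bus/cxl/devices ]; then
  echo "CXL bus present: $(ls /sys/bus/cxl/devices | wc -l) device(s)"
else
  echo "no CXL bus registered"
fi
```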
Software Stack Considerations
CXL memory appears as additional NUMA nodes to the operating system. Applications must be CXL-aware or NUMA-aware to benefit:
| Software Layer | Requirement | Solutions |
|---|---|---|
| OS kernel | CXL driver support | Linux 6.0+, Windows Server 2025 |
| Memory allocator | NUMA-aware allocation | numactl, hwloc, libmemkind |
| Application | Memory tiering | Explicit placement or transparent tiering |
| AI framework | KV cache offloading | vLLM CXL support (in development)36 |
MemVerge's Memory Machine provides transparent memory tiering for applications without CXL-specific code paths.37
Cost Considerations
CXL memory costs less per GB than GPU VRAM:
| Memory Type | Approximate Cost/GB | Latency |
|---|---|---|
| HBM3e (in GPU) | $15-25 | ~100 ns |
| DDR5 DIMM | $3-5 | ~80 ns |
| CXL-attached DDR5 | $4-7 | 200-500 ns |
| NVMe SSD | $0.10-0.20 | ~100 μs |
For KV cache that tolerates 200-500 ns latency, CXL offers 4-5x cost reduction vs keeping data in GPU VRAM while delivering 200-500x lower latency than SSD offloading.38
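These ratios fall straight out of the table. The sketch below replays the arithmetic using the table's approximate ranges, so the outputs are bounds rather than point estimates:

```python
# Sanity-checking cost and latency ratios from the table's (low, high) ranges.
cost_per_gb = {"hbm3e": (15, 25), "cxl_ddr5": (4, 7)}
latency_ns = {"cxl": (200, 500), "nvme_ssd": (100_000, 100_000)}

# Cost advantage of CXL-attached DDR5 over in-package HBM
lo = cost_per_gb["hbm3e"][0] / cost_per_gb["cxl_ddr5"][1]   # pessimistic
hi = cost_per_gb["hbm3e"][1] / cost_per_gb["cxl_ddr5"][0]   # optimistic
print(f"CXL vs HBM cost: {lo:.1f}x to {hi:.1f}x cheaper per GB")

# Latency advantage of CXL over NVMe offload
fast = latency_ns["nvme_ssd"][0] // latency_ns["cxl"][1]
slow = latency_ns["nvme_ssd"][0] // latency_ns["cxl"][0]
print(f"CXL vs SSD latency: {fast}x to {slow}x lower")  # → 200x to 500x lower
```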
Migration Strategy
Phase 1: Memory expansion (Now)
- Deploy CXL memory expanders for capacity
- No application changes required
- OS treats CXL memory as slow NUMA node

Phase 2: Memory tiering (2025-2026)
- Implement tiering between fast local memory and CXL
- Hot data in local DRAM, cold data in CXL memory
- Requires memory management software

Phase 3: Memory pooling (2026-2027)
- Deploy CXL switches for shared memory pools
- Multiple hosts access common memory resources
- Enables disaggregated memory architecture
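The hot/cold split at the heart of Phase 2 can be sketched as a minimal policy: recently touched pages stay in local DRAM, pages idle past a threshold are demoted to the CXL tier, and demoted pages are promoted back on access. This is purely illustrative; real deployments rely on kernel NUMA tiering or commercial software like MemVerge's, not application-level dictionaries.

```python
# Minimal hot/cold tiering policy sketch (illustrative only).
import time

class TieredCache:
    def __init__(self, idle_threshold_s=60.0):
        self.dram, self.cxl = {}, {}      # stand-ins for the two memory tiers
        self.last_touch = {}
        self.idle_threshold_s = idle_threshold_s

    def put(self, key, page):
        self.dram[key] = page             # new data starts hot
        self.last_touch[key] = time.monotonic()

    def get(self, key):
        if key in self.cxl:               # promote on access
            self.dram[key] = self.cxl.pop(key)
        self.last_touch[key] = time.monotonic()
        return self.dram[key]

    def demote_cold(self, now=None):
        now = time.monotonic() if now is None else now
        for key in list(self.dram):
            if now - self.last_touch[key] > self.idle_threshold_s:
                self.cxl[key] = self.dram.pop(key)

cache = TieredCache(idle_threshold_s=1.0)
cache.put("kv_page_0", b"...")
cache.demote_cold(now=time.monotonic() + 5)   # simulate 5 idle seconds
print("in CXL tier:", "kv_page_0" in cache.cxl)  # → in CXL tier: True
```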
Integration with Existing Infrastructure
NVIDIA GPU Systems
NVIDIA supports CXL on Blackwell architecture. The Grace CPU in Grace Hopper systems includes CXL support, enabling memory expansion beyond the 480GB unified memory.39
For H100 systems, CXL memory expanders attach to the host CPU, not directly to GPUs. KV cache offloading requires CPU involvement to copy data between GPU VRAM and CXL memory.
AMD GPU Systems
AMD MI300X includes CXL support through its CPU chiplet. The 192GB unified memory can be supplemented with CXL-attached capacity.40
Intel GPU Systems
Intel Data Center GPU Max (Ponte Vecchio) supports CXL for memory expansion on systems with Sapphire Rapids or later CPUs.41
Risks and Considerations
Standards Fragmentation
Four competing interconnect ecosystems (CXL/PCIe, UALink, Ultra Ethernet, NVLink) force infrastructure planners to make bets. Equipment purchased today may face interoperability challenges in 2027.
Mitigation: CXL's PCIe foundation ensures backward compatibility. CXL 4.0 devices will work with CXL 3.x, 2.0, 1.1, and 1.0 systems at reduced capability.42
Software Maturity
CXL software support remains early-stage. Linux kernel support exists but many applications lack CXL-specific optimizations.
Mitigation: Use memory tiering software like MemVerge Memory Machine that provides transparent CXL integration.
Supply Chain
CXL 4.0 products won't reach volume production until 2027. CXL 3.0/3.1 availability depends on PCIe 6.0 ecosystem maturity.
Mitigation: Begin with CXL 2.0 products available today. Build operational experience before next-generation availability.
Key Takeaways
For infrastructure planners:
- CXL 4.0 enables 100+ TB memory pools with cache coherency across racks
- Bundled ports deliver 1.5 TB/s bandwidth per logical connection
- Production deployment timeline: CXL 2.0 now, CXL 3.x late 2026, CXL 4.0 2027+
- Start evaluating CXL 2.0 memory expanders and switches immediately

For AI platform teams:
- KV cache offloading to CXL memory delivers 3.8-6.5x speedup vs RDMA
- CXL memory costs 4-5x less than GPU VRAM while delivering 200-500x lower latency than SSD
- vLLM and other frameworks developing CXL-aware KV cache management
- Test with XConn/MemVerge demonstrations available now

For strategic planning:
- CXL complements NVLink/UALink; they serve different purposes
- No single interconnect standard will "win"; plan for coexistence
- Equipment decisions in 2025-2026 affect interoperability through 2030
- Chinese alternative (Huawei UB-Mesh) may create parallel ecosystem

For procurement:
- CXL 2.0 products shipping from Samsung, SK Hynix, Micron
- XConn and Astera Labs switches available for evaluation
- Panmnesia CXL 3.2 fabric switch sampling November 2025
- Budget for CXL-enabled servers in 2026-2027 refresh cycles
For AI infrastructure deployment with CXL-enabled memory architecture, contact Introl.
References
1. Compute Express Link. "Overcoming the AI Memory Wall: How CXL Memory Pooling Powers the Next Leap in Scalable AI Computing." 2025. https://computeexpresslink.org/blog/overcoming-the-ai-memory-wall-how-cxl-memory-pooling-powers-the-next-leap-in-scalable-ai-computing-4267/
2. NVIDIA. "H200 and B200 Specifications." 2025.
3. Synopsys. "How CXL and Memory Pooling Reduce HPC Latency." 2025. https://www.synopsys.com/blogs/chip-design/cxl-protocol-memory-pooling.html
4. AMI Next Blog. "CXL Deep Dive: From Principles to Memory Pooling." 2025. https://www.aminext.blog/en/post/cxl-compute-express-link-deep-dive-1
5. CXL Consortium. "CXL 3.0 Specification." 2023.
6. Blocks and Files. "CXL 4.0 doubles bandwidth and stretches memory pooling to multi-rack setups." November 24, 2025. https://blocksandfiles.com/2025/11/24/cxl-4/
7. NVIDIA. "H200 Tensor Core GPU Specifications." 2024.
8. VideoCardz. "CXL 4.0 spec moves to PCIe 7.0, doubles bandwidth over CXL 3.0." November 2025. https://videocardz.com/newz/cxl-4-0-spec-moves-to-pcie-7-0-doubles-bandwidth-over-cxl-3-0
9. Synopsys. "CXL 4.0, Bandwidth First: What Designers Are Solving for Next." December 2025. https://www.synopsys.com/blogs/chip-design/cxl-4-bandwidth-first-what-designers-are-solving-next.html
10. Datacenter News. "CXL 4.0 doubles bandwidth, introduces bundled ports for data centres." November 2025. https://datacenter.news/story/cxl-4-0-doubles-bandwidth-introduces-bundled-ports-for-data-centres
11. CXL Consortium. "CXL 4.0 Webinar." December 4, 2025. https://computeexpresslink.org/wp-content/uploads/2025/12/CXL_4.0-Webinar_December-2025_FINAL.pdf
12. Business Wire. "CXL Consortium Releases the Compute Express Link 4.0 Specification." November 18, 2025. https://www.businesswire.com/news/home/20251118275848/en/CXL-Consortium-Releases-the-Compute-Express-Link-4.0-Specification-Increasing-Speed-and-Bandwidth
13. arXiv. "Scalable Processing-Near-Memory for 1M-Token LLM Inference." November 2025. https://arxiv.org/abs/2511.00321
14. PRWeb. "XConn Technologies and MemVerge Demonstrate CXL Memory Pool for KV Cache." October 2025. https://www.prweb.com/releases/xconn-technologies-and-memverge-demonstrate-cxl-memory-pool-for-kv-cache-using-nvidia-dynamo-for-breakthrough-ai-workload-performance-at-2025-ocp-global-summit-302581860.html
15. arXiv. "PNM-KV: Scalable Processing-Near-Memory for 1M-Token LLM Inference." 2025. https://arxiv.org/html/2501.09020v1
16. CXL Consortium. "How CXL Transforms Server Memory Infrastructure." October 2025. https://computeexpresslink.org/wp-content/uploads/2025/10/CXL_Q3-2025-Webinar_FINAL.pdf
17. Blocks and Files. "Panmnesia pushes unified memory and interconnect design for AI superclusters." July 2025. https://blocksandfiles.com/2025/07/18/panmnesia-cxl-over-xlink-ai-supercluster-architecture/
18. Next Platform. "UALink Fires First GPU Interconnect Salvo At Nvidia NVSwitch." April 2025. https://www.nextplatform.com/2025/04/08/ualink-fires-first-gpu-interconnect-salvo-at-nvidia-nvswitch/
19. Business News This Week. "Panmnesia Introduces AI Infrastructure with CXL over NVLink and UALink." July 2025. https://businessnewsthisweek.com/technology/panmnesia-introduces-todays-and-tomorrows-ai-infrastructure-including-a-supercluster-architecture-that-integrates-nvlink-ualink-and-hbm-via-cxl/
20. TrendForce. "SK hynix and Samsung Step up Focus on HBM4 and CXL." October 2024. https://www.trendforce.com/news/2024/10/24/news-sk-hynix-and-samsung-reportedly-step-up-focus-on-hbm4-and-cxl-amid-rising-chinese-competition/
21. ServeTheHome. "SK hynix CXL 2.0 Memory Expansion Modules Launched." 2024. https://www.servethehome.com/sk-hynix-cxl-2-0-memory-expansion-modules-launched-with-96gb-of-ddr5/
22. AnandTech. "CXL Gathers Momentum at FMS 2024." 2024. https://www.anandtech.com/show/21533/cxl-gathers-momentum-at-fms-2024
23. Tom's Hardware. "SK Hynix Unveils CXL Memory Module with Compute Capabilities." 2025. https://www.tomshardware.com/news/sk-hynix-unveils-cxl-computational-memory-solution
24. EE Times. "XConn Shows Off First CXL Switch." 2024. https://www.eetimes.com/xconn-shows-off-first-cxl-switch/
25. CXL Consortium. "Supercomputing 2025 Demonstrations." November 2025. https://computeexpresslink.org/event/supercomputing-2025/
26. Business Wire. "Panmnesia Announces Sample Availability of PCIe 6.0/CXL 3.2 Fabric Switch." November 12, 2025. https://www.businesswire.com/news/home/20251112667725/en/Panmnesia-Announces-Sample-Availability-of-PCIe-6.0CXL-3.2-Fabric-Switch
27. Storage Newsletter. "FMS 2025: Astera Labs and SMART Modular CXL Demo." August 2025. https://www.storagenewsletter.com/2025/08/06/fms-2025-h3-platform-debuts-cxl-memory-sharing-and-pooling-solution/
28. AnandTech. "Micron and Microchip CXL 2.0 Memory Module." 2024.
29. TechPowerUp. "Panmnesia Samples Industry's First PCIe 6.0/CXL 3.2 Fabric Switch." November 2025.
30. Yahoo Finance. "Marvell Extends CXL Ecosystem Leadership with Structera." September 2025. https://finance.yahoo.com/news/marvell-extends-cxl-ecosystem-leadership-130000255.html
31. Marvell. "Structera CXL Memory-Expansion Controllers Interoperability Announcement." September 2025.
32. GIGABYTE. "Revolutionizing the AI Factory: The Rise of CXL Memory Pooling." 2025. https://www.gigabyte.com/Article/revolutionizing-the-ai-factory-the-rise-of-cxl-memory-pooling
33. Storage Newsletter. "XConn Technologies and MemVerge CXL Memory Solution for KV Cache." November 2025. https://www.storagenewsletter.com/2025/11/21/sc25-xconn-technologies-and-memverge-to-deliver-breakthrough-scalable-cxl-memory-solution-to-offload-kv-cache-and-prefill-decode-disaggregation-in-ai-inference-workloads/
34. NVIDIA. "Blackwell Architecture Technical Brief." 2024.
35. Wikipedia. "Compute Express Link - Linux Support." https://en.wikipedia.org/wiki/Compute_Express_Link
36. arXiv. "Exploring CXL-based KV Cache Storage for LLM Serving." 2024. https://mlforsystems.org/assets/papers/neurips2024/paper17.pdf
37. MemVerge. "Memory Machine Software for CXL." 2025.
38. Penguin Solutions. "Explaining CXL Memory 101: Expansion, Pooling, & Sharing." 2025. https://www.penguinsolutions.com/en-us/resources/blog/what-is-cxl-memory-expansion
39. NVIDIA. "Grace Hopper Superchip Architecture." 2024.
40. AMD. "Instinct MI300X Specifications." 2024.
41. Intel. "Data Center GPU Max (Ponte Vecchio) Documentation." 2024.
42. CXL Consortium. "CXL 4.0 Backward Compatibility Statement." November 2025.
43. Servers Simply. "Next-Gen HPC & AI Infrastructure 2025: GPUs, CXL, Gen5 NVMe." 2025. https://www.serversimply.com/blog/next-gen-hpc-and-ai-infrastructure-in-2025
44. Semi Engineering. "CXL Thriving As Memory Link." 2025. https://semiengineering.com/cxl-thriving-as-memory-link/
45. HPCwire. "Everyone Except Nvidia Forms Ultra Accelerator Link Consortium." May 2024. https://www.hpcwire.com/2024/05/30/everyone-except-nvidia-forms-ultra-accelerator-link-ualink-consortium/
46. ServeTheHome. "Huawei Presents UB-Mesh Interconnect for Large AI SuperNodes at Hot Chips 2025." August 2025.
47. arXiv. "Amplifying Effective CXL Memory Bandwidth for LLM Inference via Transparent Near-Data Processing." September 2025. https://arxiv.org/abs/2509.03377
48. CXL Consortium. "Advantages of CXL Memory Sharing for Emerging Applications." June 2025. https://computeexpresslink.org/wp-content/uploads/2025/06/CXL_Q2-2025-Webinar_FINAL.pdf