Disaggregated Computing for AI: Composable Infrastructure Architecture


Updated December 11, 2025

December 2025 Update: CXL memory pooling achieves 3.8x speedup vs 200G RDMA and 6.5x vs 100G RDMA for LLM inference. Jensen Huang: "When you can put accelerators anywhere in a datacenter and compose and reconfigure for specific workloads—that's a revolution." Composable infrastructure breaks fixed server ratios to dynamically match exact AI workload requirements.

CXL memory pooling achieves 3.8x speedup compared to 200G RDMA and 6.5x speedup compared to 100G RDMA when sharing memory across GPU servers running large language model inference.1 The demonstration used two servers with NVIDIA H100 GPUs running the OPT-6.7B model, showing how shared CXL memory accelerates AI workloads beyond what traditional networking enables. As NVIDIA's Jensen Huang noted: "When you're able to disaggregate the converged server, when you can put accelerators anywhere in a datacenter and then can compose and reconfigure that datacenter for this specific workload—that's a revolution."2

Composable infrastructure represents an architectural approach where compute, storage, and networking resources exist as abstracted pools managed independently through software-defined control planes.3 Unlike traditional architectures coupling CPU, memory, storage, and networking to specific servers, composable infrastructure treats hardware resources as flexible pools dynamically allocated across workloads. The approach promises dramatic improvements in resource utilization and deployment flexibility for AI infrastructure.

Breaking the server boundary

Traditional servers package fixed ratios of CPU, memory, GPU, and storage. AI workloads rarely match these fixed ratios. Training jobs demand maximum GPU density with relatively modest CPU requirements. Inference workloads may need more memory per GPU than standard configurations provide. Preprocessing pipelines require CPU and storage capacity without GPUs.

Composable infrastructure breaks the server boundary, allowing organizations to assemble virtual systems matching exact workload requirements.4 A training workload receives a composition of 8 GPUs, minimal CPU, and high-bandwidth storage. An inference workload receives 2 GPUs with expanded memory. The same physical resources serve both workloads at different times without hardware reconfiguration.
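As a rough illustration of what such a composition request might look like, the Python sketch below carves logical systems out of a shared pool. The ResourcePool class and compose method are hypothetical names used for illustration, not any vendor's API.

```python
from dataclasses import dataclass, field

@dataclass
class ResourcePool:
    """Hypothetical view of a disaggregated resource pool (not a vendor API)."""
    gpus: int = 16
    memory_gb: int = 8192
    allocations: list = field(default_factory=list)

    def compose(self, name: str, gpus: int, memory_gb: int) -> dict:
        # Carve a logical system out of the shared pool in software,
        # with no physical recabling of servers.
        if gpus > self.gpus or memory_gb > self.memory_gb:
            raise RuntimeError(f"pool cannot satisfy composition '{name}'")
        self.gpus -= gpus
        self.memory_gb -= memory_gb
        system = {"name": name, "gpus": gpus, "memory_gb": memory_gb}
        self.allocations.append(system)
        return system

pool = ResourcePool()
training = pool.compose("training", gpus=8, memory_gb=2048)    # GPU-dense, modest memory
inference = pool.compose("inference", gpus=2, memory_gb=4096)  # fewer GPUs, expanded memory
print(training, inference, f"remaining GPUs: {pool.gpus}")
```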

The disaggregation model

Disaggregated architectures separate physical nodes into dedicated resource types: compute nodes, memory nodes, GPU nodes, and storage nodes.5 High-speed fabrics connect the nodes, enabling software to compose logical systems from distributed physical resources. The composition happens in software without physical recabling.

Resources no longer sit idle waiting for specific workloads. A GPU node serves training jobs during peak hours and inference jobs overnight. Memory nodes expand capacity for memory-intensive workloads without over-provisioning every server. The flexibility improves utilization while reducing total hardware requirements.

CXL enables memory pooling

Compute Express Link (CXL) provides the cache-coherent interconnect enabling practical memory disaggregation.6 CXL offers memory-semantic access with latency in the 200-500 nanosecond range, compared to approximately 100 microseconds for NVMe and over 10 milliseconds for storage-based memory sharing.7 The latency improvement enables truly dynamic, fine-grained memory sharing across compute nodes.
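A back-of-the-envelope comparison of the latencies cited above makes the gap concrete; the CXL figure below uses the midpoint of the 200-500 nanosecond range.

```python
# Rough latency comparison using the figures cited above (illustrative, not a benchmark).
NS = 1
US = 1_000 * NS
MS = 1_000 * US

cxl_ns     = 350 * NS   # midpoint of the 200-500 ns CXL range
nvme_ns    = 100 * US   # ~100 microseconds for NVMe
storage_ns = 10 * MS    # >10 milliseconds for storage-based memory sharing

print(f"NVMe vs CXL:    ~{nvme_ns / cxl_ns:,.0f}x slower")
print(f"Storage vs CXL: ~{storage_ns / cxl_ns:,.0f}x slower")
```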

How CXL memory pooling works

CXL memory pools create a new tier of high-speed, disaggregated memory reshaping how organizations build AI infrastructure.8 CPU nodes access pooled memory as if locally attached, with the CXL fabric handling coherency and data movement transparently. Applications see expanded memory capacity without modification.

The CXL Memory Box enables memory pooling across multiple GPU servers, allowing access to larger memory pools than individual servers provide.9 AI workloads processing datasets exceeding local memory capacity benefit from pooled memory without performance penalties of traditional remote memory access. The approach enables larger batch sizes and longer context windows without upgrading individual servers.
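A rough sizing sketch shows why pooled capacity matters for batch size and context length: the key-value cache for transformer inference grows linearly with both. The model dimensions below (32 layers, 4096 hidden size, fp16 cache entries) are assumptions for an OPT-6.7B-class model, not figures from the demonstration.

```python
# Rough KV-cache sizing for transformer inference, to show how quickly
# longer contexts and larger batches outgrow local GPU memory.
# Dimensions are assumptions for an OPT-6.7B-class model; adjust for yours.
def kv_cache_gb(layers=32, hidden=4096, seq_len=32_768, batch=8, bytes_per_elem=2):
    # 2 tensors (K and V) per layer, one entry per token per batch element
    total_bytes = 2 * layers * hidden * seq_len * batch * bytes_per_elem
    return total_bytes / 1e9

for seq in (4_096, 32_768, 131_072):
    print(f"seq_len={seq:>7}: ~{kv_cache_gb(seq_len=seq):.1f} GB of KV cache")
```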

Beyond memory: full resource pooling

CXL enables more than memory pooling. The standard supports composable connections between CPUs, memory buffers, and accelerators.10 GPUs, FPGAs, DPUs, and other accelerators connect through CXL fabric for dynamic allocation across workloads.

The vision extends to complete resource disaggregation where no resource binds permanently to any other. Organizations build resource pools sized for aggregate demand rather than peak per-workload demand. Software orchestration composes appropriate resources for each workload in real time.
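The difference between sizing for per-workload peaks and sizing for aggregate demand can be seen in a short calculation; the demand figures below are illustrative assumptions, not measurements.

```python
# Illustrative comparison of fixed-ratio provisioning vs pooled provisioning.
# GPU demand per time-of-day is a made-up assumption, not a measurement.
demand = {
    "training":  {"day": 8, "night": 0},   # GPU-heavy training during the day
    "inference": {"day": 2, "night": 6},   # inference load shifts overnight
}

# Fixed servers: each workload owns hardware sized for its own peak.
fixed_gpus = sum(max(slots.values()) for slots in demand.values())

# Pooled: size the shared pool for the peak of the combined demand.
pooled_gpus = max(
    sum(demand[w][slot] for w in demand) for slot in ("day", "night")
)

print(f"fixed-ratio GPUs needed: {fixed_gpus}")   # 8 + 6 = 14
print(f"pooled GPUs needed:      {pooled_gpus}")  # max(10, 6) = 10
```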

Industry solutions

Several vendors offer composable infrastructure solutions addressing AI workload requirements.

Liqid composable platform

Liqid released composable GPU servers with CXL 2.0 memory pooling supporting up to 100 TB of disaggregated composable memory.11 The platform includes the EX-5410P 10-slot GPU box supporting 600W GPUs including NVIDIA H200, RTX Pro 6000, and Intel Gaudi 3 accelerators. Matrix software orchestrates resource composition across the hardware platform.

The Liqid approach packages composability into integrated solutions rather than requiring customers to architect disaggregated systems from components. Organizations gain composability benefits without building expertise in fabric design and orchestration software development.

IBM Research composable systems

IBM Research explores CXL standards for building fully composable systems via high-speed, low-latency fabric.12 In their architecture, resources exist as part of large pools connected through network fabric rather than statically grouped in servers. Composable resources group together to recreate server abstractions matching specific workload requirements.

The research program addresses challenges including fabric topology design, latency optimization, and software orchestration for composable AI infrastructure. The work advances understanding of how production-scale composable systems should operate.

GigaIO and Microchip collaboration

GigaIO and Microchip developed cloud-class composable disaggregated infrastructure combining PCIe and CXL technologies.13 The approach targets data centers requiring the flexibility of composable resources with the performance characteristics of direct-attached hardware.

Architectural considerations

Implementing composable infrastructure requires architectural decisions spanning fabric design, orchestration software, and workload management.

Fabric topology

The interconnect fabric determines achievable latency and bandwidth between disaggregated resources. CXL fabrics must provide sufficient bandwidth for memory-speed access patterns while maintaining latency within acceptable bounds. Fabric topology affects both performance and cost.

Switch-based topologies offer flexibility but add latency compared to direct connections. The tradeoff between topology complexity and latency budget depends on specific workload requirements. Memory-intensive workloads demand lower latency than storage-intensive workloads.
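A toy latency-budget check illustrates the tradeoff; only the 200-500 nanosecond range comes from the figures cited earlier, and the per-hop switch cost is a placeholder assumption.

```python
# Toy latency-budget check for a switched fabric. The per-hop figure is a
# placeholder assumption; only the 200-500 ns range comes from the text above.
BASE_ACCESS_NS = 200   # direct-attached CXL access (low end of cited range)
PER_HOP_NS = 100       # assumed switch hop cost (placeholder)
BUDGET_NS = 500        # upper end of the cited CXL latency range

def within_budget(hops: int) -> bool:
    return BASE_ACCESS_NS + hops * PER_HOP_NS <= BUDGET_NS

for hops in range(5):
    print(f"{hops} switch hop(s): {'OK' if within_budget(hops) else 'over budget'}")
```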

Orchestration requirements

Software orchestration manages resource composition, handling allocation requests, tracking resource state, and maintaining isolation between compositions. The orchestration layer must respond quickly enough to support dynamic workload changes without becoming a bottleneck.

Kubernetes integration enables composable resources to serve containerized AI workloads using familiar orchestration primitives. The GPU Operator and similar extensions manage accelerator resources, with composability extensions enabling dynamic GPU pool allocation.
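As a minimal sketch, a containerized inference workload might request composed GPUs through the standard device-plugin resource name; the image name is a placeholder, and any composability-specific scheduling hints would be vendor extensions not shown here.

```python
import json

# Minimal sketch of a pod spec requesting GPUs via the standard
# device-plugin resource name (nvidia.com/gpu). Composability-specific
# scheduling hints would be vendor extensions and are not shown.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "inference-worker"},
    "spec": {
        "containers": [{
            "name": "server",
            "image": "example.com/llm-inference:latest",  # placeholder image
            "resources": {
                "limits": {"nvidia.com/gpu": 2, "memory": "512Gi"},
            },
        }],
    },
}
print(json.dumps(pod, indent=2))
```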

Failure domain considerations

Disaggregation changes failure domain characteristics. A failed memory node affects all compositions using that memory rather than a single server. The blast radius of component failures expands compared to converged server architectures.

Redundancy strategies must account for disaggregated failure modes. Memory pools require redundancy across physical nodes. Composition policies should avoid concentrating critical workloads on shared resources. Monitoring must track health across the fabric rather than individual servers.
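One way to express the "avoid concentrating critical workloads" policy is a simple spread-placement check; the node and composition structures below are hypothetical, for illustration only.

```python
from collections import Counter

# Hypothetical spread-placement check: refuse to place a critical
# composition on a memory node that already backs another critical one.
def pick_memory_node(nodes, placements, critical):
    load = Counter(p["node"] for p in placements if p["critical"])
    for node in sorted(nodes, key=lambda n: load[n]):
        if not critical or load[node] == 0:
            return node
    raise RuntimeError("no memory node satisfies the spread policy")

nodes = ["mem-node-a", "mem-node-b", "mem-node-c"]
placements = [{"node": "mem-node-a", "critical": True}]
print(pick_memory_node(nodes, placements, critical=True))  # -> mem-node-b
```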

Infrastructure deployment expertise

Composable infrastructure complexity exceeds traditional server deployment. Fabric installation, performance validation, and orchestration configuration require specialized expertise that most organizations lack internally.

Introl's network of 550 field engineers supports organizations implementing advanced infrastructure architectures, including composable and disaggregated systems.14 The company ranked #14 on the 2025 Inc. 5000 with 9,594% three-year growth, reflecting demand for professional infrastructure services.15 Composable deployments benefit from experience with high-speed fabric installation and validation.

Deploying infrastructure across 257 global locations requires consistent practices regardless of geography.16 Introl manages deployments reaching 100,000 GPUs with over 40,000 miles of fiber optic network infrastructure, providing operational scale for organizations building composable AI infrastructure.17

The composable future

Disaggregated, resource-sharing architectures will enable infrastructure for processing the petabytes of data necessary for AI, machine learning, and other data-intensive technologies.18 CXL adoption will accelerate as the standard matures and vendor solutions proliferate.

Organizations planning AI infrastructure investments should evaluate composable architectures for deployments where workload variability makes fixed-ratio servers inefficient. The flexibility benefits compound with scale: larger deployments achieve better utilization improvements from resource pooling.

The transition from converged to composable infrastructure represents a fundamental shift in data center architecture. Organizations that master composable deployment gain flexibility advantages that translate to cost efficiency and deployment agility. The revolution Jensen Huang described begins with understanding how disaggregation changes infrastructure economics.

Key takeaways

For infrastructure architects:
- CXL memory pooling achieves 3.8x speedup vs 200G RDMA and 6.5x vs 100G RDMA for LLM inference workloads
- CXL latency: 200-500ns memory-semantic access vs ~100μs NVMe vs >10ms storage-based sharing
- Disaggregation enables an 8-GPU composition for training and a 2-GPU, expanded-memory composition for inference from the same hardware pool

For procurement teams:
- Liqid EX-5410P: 10-slot GPU box supporting 600W GPUs (H200, RTX Pro 6000, Gaudi 3) with 100TB CXL memory pooling
- Traditional fixed-ratio servers waste resources: training needs maximum GPU density with modest CPU; inference needs more memory per GPU
- Composable infrastructure reduces total hardware by pooling resources across workloads; GPU nodes serve training by day, inference by night

For platform engineers:
- IBM Research exploring CXL for fully composable systems via high-speed, low-latency fabric
- GigaIO/Microchip collaboration: cloud-class composable infrastructure combining PCIe and CXL technologies
- Kubernetes integration through GPU Operator extensions enables composable resources with familiar orchestration

For operations teams:
- Failure domains change: a failed memory node affects all compositions using it vs a single server in converged architecture
- Redundancy strategies must account for disaggregated failure modes; avoid concentrating workloads on shared resources
- Fabric health monitoring replaces individual server monitoring; composition policies prevent critical workload concentration

For strategic planning:
- Jensen Huang: "When you're able to disaggregate the converged server...and reconfigure that datacenter for this specific workload—that's a revolution"
- CXL Memory Box enables memory sharing across GPU servers for datasets exceeding local capacity
- Vision: complete resource disaggregation where no resource binds permanently; software orchestration composes in real time

References




  1. H3 Platform. "Exploring the Future of AI with CXL Memory Sharing." 2024. https://www.h3platform.com/blog-detail/68 

  2. GigaIO. "The Future of Composability with CXL." 2024. https://gigaio.com/project/the-future-of-composability-with-cxl/ 

  3. Medium. "Composable Infrastructure: Disaggregating GPUs, SSDs, and CPUs." ServerWala. 2024. https://medium.com/@serverwalainfra/composable-infrastructure-disaggregating-gpus-ssds-and-cpus-2b68de1b0a56 

  4. Medium. "Composable Infrastructure." 2024. 

  5. Medium. "Composable Infrastructure." 2024. 

  6. Compute Express Link. "Overcoming the AI Memory Wall: How CXL Memory Pooling Powers the Next Leap in Scalable AI Computing." CXL Blog. 2024. https://computeexpresslink.org/blog/overcoming-the-ai-memory-wall-how-cxl-memory-pooling-powers-the-next-leap-in-scalable-ai-computing-4267/ 

  7. Compute Express Link. "Overcoming the AI Memory Wall." 2024. 

  8. Compute Express Link. "Overcoming the AI Memory Wall." 2024. 

  9. H3 Platform. "Exploring the Future of AI with CXL Memory Sharing." 2024. 

  10. Keysight. "CXL 3.0 and the Future of AI Data Centers." 2024. https://www.keysight.com/blogs/en/inds/ai/cxl-3-0-and-the-future-of-ai-data-centers 

  11. Blocks and Files. "Liqid unveils composable GPU servers with CXL 2.0 memory pooling." July 17, 2025. https://blocksandfiles.com/2025/07/17/liqid-pcie-gen-5-cxl-composability/ 

  12. IBM Research. "Composable Disaggregated Infrastructure." IBM Research Projects. 2024. https://research.ibm.com/projects/composable-disaggregated-infrastructure 

  13. GigaIO. "Microchip and GigaIO on Cloud-Class Composable Infrastructure." 2024. https://gigaio.com/project/microchip-and-gigaio-on-cloud-class-composable-infrastructure/ 

  14. Introl. "Company Overview." Introl. 2025. https://introl.com 

  15. Inc. "Inc. 5000 2025." Inc. Magazine. 2025. 

  16. Introl. "Coverage Area." Introl. 2025. https://introl.com/coverage-area 

  17. Introl. "Company Overview." 2025. 

  18. Keysight. "CXL 3.0 and the Future of AI Data Centers." 2024. 

  19. ACM. "Proceedings of the 4th Workshop on Heterogeneous Composable and Disaggregated Systems." 2024. https://dl.acm.org/doi/proceedings/10.1145/3723851 

  20. Liqid. "Composable Infrastructure Software Platform." 2025. https://www.liqid.com/ 
