Ethernet Switches for AI: The 51.2Tbps Platforms Connecting GPU Clusters
Updated December 11, 2025
December 2025 Update: Ethernet now leads AI back-end deployments per Dell'Oro Group. xAI Colossus (100,000 H100s) achieves 95% throughput with Spectrum-X vs. roughly 60% on traditional Ethernet. Broadcom Tomahawk 5 delivers 51.2Tbps in a single monolithic chip (64x 800GbE). The Ultra Ethernet Consortium's 560-page spec formalizes AI-optimized standards. NVIDIA Spectrum-X800 provides 1.6x AI performance over traditional Ethernet.
Ethernet now leads AI back-end network deployments. Dell'Oro Group reports that compelling cost advantages, multi-vendor ecosystems, and operational familiarity drive adoption over InfiniBand in 2025.¹ The shift gains momentum as xAI's Colossus supercomputer demonstrates Ethernet performance at massive scale, connecting 100,000 NVIDIA Hopper GPUs using Spectrum-X networking and achieving 95% data throughput with advanced congestion control.² Traditional Ethernet at similar scale suffers from thousands of flow collisions, limiting throughput to roughly 60%.³
Switch silicon has doubled bandwidth to meet AI demands. Broadcom's Tomahawk 5 delivers 51.2 terabits per second in a single monolithic chip, powering switches with 64 ports of 800GbE or 128 ports of 400GbE.⁴ NVIDIA's Spectrum-X800 platform matches this capacity while adding AI-specific optimizations through software integration with BlueField SuperNICs. The June 2025 Ultra Ethernet Consortium specification formalizes standards for AI-optimized Ethernet, establishing a 560-page framework for congestion control, RDMA transport, and multi-vendor interoperability.⁵
Broadcom Tomahawk 5 sets the bandwidth benchmark
The StrataXGS Tomahawk 5 switch series delivers 51.2 terabits per second of Ethernet switching capacity in a single monolithic device, doubling the bandwidth of the previous generation.⁶ The chip extends Broadcom's dominance in merchant switch silicon, maintaining the bandwidth-doubling cadence established with the original Tomahawk in 2014.
Architecture decisions differentiate Tomahawk 5 from competitors. While competing 51.2Tbps designs wrap multiple SerDes chiplets around a monolithic packet-processing engine, Tomahawk 5 achieves full bandwidth in a single piece of silicon built on a 5nm process.⁷ Its shared-buffer architecture delivers the highest performance and lowest tail latency for RoCEv2 and other RDMA protocols critical to AI workloads.⁸
Port configurations support diverse deployment scenarios: 64 ports at 800Gbps for spine deployments requiring maximum per-port bandwidth, 128 ports at 400Gbps for balanced leaf switches, and 256 ports at 200Gbps for environments requiring extensive server connectivity.⁹ The chip supports both traditional Clos topologies and non-Clos architectures including torus, Dragonfly, Dragonfly+, and Megafly configurations optimized for AI cluster communications.¹⁰
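To make the topology arithmetic concrete, the sketch below sizes a non-blocking two-tier leaf/spine Clos built from 64-port 800GbE switches. The port count comes from the paragraph above; the sizing math is generic Clos arithmetic, not any vendor's reference design.

```python
# Back-of-the-envelope sizing for a non-blocking two-tier (leaf/spine) Clos
# fabric built from 51.2Tbps switches. The 64x800G radix comes from the text
# above; the topology math is standard Clos arithmetic.

def two_tier_clos(radix: int) -> dict:
    """Largest non-blocking two-tier fabric for a given switch radix."""
    down = radix // 2          # half of each leaf's ports face hosts
    leaves = radix             # each spine has `radix` ports, one per leaf
    spines = down              # one spine per leaf uplink
    return {
        "leaves": leaves,
        "spines": spines,
        "switches": leaves + spines,
        "host_ports": leaves * down,
    }

print(two_tier_clos(64))
# {'leaves': 64, 'spines': 32, 'switches': 96, 'host_ports': 2048}
# 2,048 non-blocking 800GbE ports from 96 chips; clusters at the
# 100,000-GPU scale add a third tier or use the non-Clos topologies above.
```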
Advanced features target AI/ML workload requirements directly. Cognitive Routing provides intelligent traffic distribution. Dynamic load balancing spreads flows across available paths. End-to-end congestion control prevents the network saturation that degrades GPU utilization.¹¹ Broadcom makes a similar claim for its Jericho3-AI fabric chip, citing job completion times more than 10% shorter than competing silicon thanks to these optimizations.¹²
Power efficiency gains prove substantial. A single Tomahawk 5 replaces forty-eight Tomahawk 1 switches at equivalent bandwidth, cutting power requirements by more than 95%.¹³ For AI data centers already struggling with per-rack power density, networking efficiency improvements compound with compute and cooling optimization.
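The forty-eight-switch figure falls out of Clos arithmetic. Here is a hedged reconstruction, assuming Tomahawk 1's published 3.2Tbps (32x 100GbE) radix; Broadcom has not published its exact workings, so treat this as a plausible reading of the comparison rather than the official calculation.

```python
# Reconstructing the "one Tomahawk 5 replaces 48 Tomahawk 1s" comparison.
# Tomahawk 1 is a 3.2Tbps chip (32x100G); the leaf/spine arithmetic below
# is our reading of the claim, not Broadcom's published workings.

TH1_RADIX = 32            # 100G ports per Tomahawk 1
TARGET_HOST_PORTS = 512   # 512 x 100G = 51.2Tbps, one Tomahawk 5's capacity

down_per_leaf = TH1_RADIX // 2                 # non-blocking: 16 down, 16 up
leaves = TARGET_HOST_PORTS // down_per_leaf    # 32 leaf switches
spines = down_per_leaf                         # 16 spines, each reaching all leaves
total = leaves + spines                        # 48 chips vs one Tomahawk 5

print(f"{leaves} leaves + {spines} spines = {total} Tomahawk 1 switches")
# Collapsing 48 chips (plus their inter-switch links and optics) into one
# device is where the >95% power-per-bit reduction comes from.
```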
Commercial switch products from multiple vendors leverage Tomahawk 5 silicon. FS.com's N9600-64OD delivers 64x 800GbE ports with sub-microsecond latency.¹⁴ NADDOD's N9500 series offers both 400G and 800G configurations optimized for AI data center deployments.¹⁵ Arista's 7060X6 AI Leaf family employs Tomahawk 5 for 51.2Tbps capacity in 2RU form factors.¹⁶
NVIDIA Spectrum-X builds AI-native Ethernet
NVIDIA designed Spectrum-X as the first Ethernet networking platform purpose-built for AI workloads. The platform combines Spectrum SN5600 switches with BlueField-3 SuperNICs, accelerating generative AI performance by 1.6x over traditional Ethernet implementations.¹⁷
The Spectrum-X800 SN5600 switch provides 64 ports of 800GbE in OSFP form factors and 51.2Tbps of total switching capacity.¹⁸ The underlying Spectrum-4 ASIC quadruples the 12.8Tbps capacity of the previous Spectrum generation while raising port density. Integration with BlueField SuperNICs enables coordinated congestion control, adaptive routing, and telemetry collection spanning the entire network fabric.
Real-world deployments validate the architecture. xAI's Colossus cluster uses Spectrum-X Ethernet to train the Grok family of large language models across 100,000 GPUs.¹⁹ The system achieves 95% data throughput through congestion control technology specifically optimized for the bursty, synchronized communication patterns of distributed AI training.²⁰
2025 product announcements extend Spectrum-X capabilities significantly. Spectrum-X Photonics switches unveiled in March 2025 fuse electronic circuits with optical communications at massive scale.²¹ Configurations include 128 ports of 800Gbps (100Tbps total) and 512 ports of 800Gbps (400Tbps total), enabling AI factories connecting millions of GPUs while reducing energy consumption.²²
Spectrum-XGS Ethernet announced in August 2025 introduces scale-across technology combining distributed data centers into unified giga-scale AI super-factories.²³ The technology represents a third pillar of AI computing beyond traditional scale-up (NVLink) and scale-out (standard networking), enabling organizations to aggregate distributed infrastructure into coherent training environments.
Major cloud providers standardize on Spectrum-X. Meta and Oracle announced in October 2025 that they will deploy Spectrum-X Ethernet switches as an open, accelerated networking architecture to improve AI training efficiency.²⁴ The multi-vendor ecosystem positions Spectrum-X as both an NVIDIA solution and an industry platform.
Ultra Ethernet Consortium establishes AI-ready standards
The Ultra Ethernet Consortium released Specification 1.0 on June 11, 2025, establishing a comprehensive 560-page framework for AI and HPC networking.²⁵ The consortium, launched in 2023 under the Linux Foundation, unites over 50 technology companies including AMD, Intel, Broadcom, Cisco, Arista, Meta, Microsoft, Dell, Samsung, and Huawei.²⁶
Technical innovations address fundamental limitations in traditional Ethernet for AI workloads. The specification defines enhanced RDMA implementations, transport protocols, and congestion control mechanisms designed for the synchronized, bursty communication patterns of distributed training.²⁷
Congestion control approaches differ fundamentally from traditional RoCE implementations. The UEC approach does not require lossless networks as RoCE traditionally has, introducing a receiver-driven mode in which receiving endpoints actively limit sender transmissions rather than remaining passive.²⁸ The shift enables larger networks with better efficiency for AI workloads.
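As a conceptual illustration only (not UEC protocol code; the real transport also covers packet spraying, selective retransmission, and telemetry), a toy receiver-driven scheme looks like this: the receiver meters out credit grants, and each sender transmits only what it is granted.

```python
# Toy model of the receiver-driven idea: the receiver paces senders with
# credit grants instead of senders transmitting blindly and reacting to
# loss or ECN marks afterward.

class Receiver:
    def __init__(self, drain_rate_bytes: int):
        # How many bytes the receiving NIC can absorb per scheduling tick.
        self.drain_rate = drain_rate_bytes

    def grant(self, requests: dict[str, int]) -> dict[str, int]:
        """Split the available credit evenly across requesting senders."""
        share = self.drain_rate // max(len(requests), 1)
        return {sender: min(wanted, share) for sender, wanted in requests.items()}

# Three GPUs each want to push 4 KB this tick, but the receiver drains 6 KB:
rx = Receiver(drain_rate_bytes=6144)
print(rx.grant({"gpu0": 4096, "gpu1": 4096, "gpu2": 4096}))
# {'gpu0': 2048, 'gpu1': 2048, 'gpu2': 2048}
# Senders transmit only granted bytes, so the incast pattern that cripples
# lossy Ethernet (many senders, one receiver) never oversubscribes the link.
```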
Performance targets span cluster-scale deployments. The specification aims for round-trip times between 1 and 20 microseconds across clusters, optimizing specifically for data center environments running AI training, inference, and HPC workloads.²⁹
Interoperability guarantees prevent vendor lock-in. UEC Specification 1.0 delivers high-performance solutions across NICs, switches, optics, and cables, enabling seamless multi-vendor integration.³⁰ The open standard allows organizations to source components from multiple suppliers while maintaining performance consistency.
Product availability follows the specification release. Arista confirmed support for UEC 1.0 switching enhancements across its Etherlink product portfolio, starting with the 7060X and 7800R platforms.³¹ Full-stack hardware support from multiple vendors ships by late 2025 or early 2026.³²
Arista and Cisco compete in modular AI platforms
Traditional networking vendors adapt data center platforms for AI workload requirements, competing against NVIDIA's purpose-built approach.
Arista's 7800R4 Series launched on October 29, 2025, as the fourth generation of modular spine systems designed for AI deployments.³³ The platform delivers 460Tbps (920Tbps full duplex) of system throughput across configurations spanning four to sixteen line card modules.³⁴ Port counts scale to 576x 800GbE or 1152x 400GbE for massive cluster connectivity.³⁵
The 7800R4 implements Broadcom Jericho3-AI processors with an AI-optimized packet pipeline.³⁶ HyperPort technology combines four 800Gbps ports into 3.2Tbps aggregate connections, enabling 44% shorter job completion times for AI bandwidth flows compared to traditional load balancing across separate ports.³⁷ Modular chassis and 7280R4 fixed-form switches ship now, with 7020R4 variants and HyperPort linecards arriving Q1 2026.³⁸
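A rough simulation shows why aggregation helps. Assuming flow-level ECMP hashing across four independent 800G links versus one wide 3.2T pipe (an illustrative model, not Arista's test methodology), hash collisions between elephant flows routinely overload a member link:

```python
# ECMP pins each flow to one member link by hashing, so large "elephant"
# flows collide; a single wide port has no hash step to lose. Illustrative
# only; Arista's 44% job-completion-time figure comes from its own testing.

import random
random.seed(7)

def worst_link_load(num_flows: int, links: int, flow_gbps: float) -> float:
    """Hash flows uniformly across links; return the busiest link's load."""
    loads = [0.0] * links
    for _ in range(num_flows):
        loads[random.randrange(links)] += flow_gbps   # flow-level ECMP hash
    return max(loads)

# Sixteen 200Gbps elephant flows (3.2Tbps offered) across the two designs:
trials = [worst_link_load(16, links=4, flow_gbps=200) for _ in range(1000)]
overloaded = sum(w > 800 for w in trials) / len(trials)
print(f"4x800G ECMP: busiest link exceeds 800G in {overloaded:.0%} of trials")
print(worst_link_load(16, links=1, flow_gbps=200), "Gbps on one 3.2T pipe: fits")
# Random hashing overloads some 800G member in nearly every trial, stalling
# the flows hashed onto it; the single wide pipe carries all 3.2T cleanly.
```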
Cisco Silicon One unifies routing and switching in a single architecture, reaching 51.2Tbps with the G200 ASIC.³⁹ The architecture targets both AI scale-out and scale-up networking with high capacity, ultra-low latency, and reduced job completion times.⁴⁰
Cisco 8800 Series modular routers provide the chassis foundation. Available in 4, 8, 12, and 18-slot configurations, all models support third-generation 36x 800G (P100) line cards based on Silicon One.⁴¹ The Cisco 8223 router delivers 51.2Tbps capacity using the Silicon One P200 programmable chip.⁴²
The expanded Cisco-NVIDIA partnership integrates Silicon One chips into the Spectrum-X Ethernet stack, combining low-latency switching, adaptive routing, and telemetry for GPU cluster support.⁴³ SONiC (Software for Open Networking in the Cloud) support on Cisco 8000 Series switches enables organizations to select open network operating systems matching operational requirements.⁴⁴
RoCE makes Ethernet competitive with InfiniBand
RDMA over Converged Ethernet (RoCE) enables Ethernet networks to match InfiniBand performance for AI workloads when properly configured. Meta published engineering details for their 24,000-GPU cluster, stating they tuned both RoCE and InfiniBand to provide equivalent performance, with the largest models trained on their RoCE fabric.⁴⁵
RoCEv2 relies on a lossless Ethernet configuration. Priority Flow Control eliminates packet loss for selected traffic classes. Enhanced Transmission Selection allocates bandwidth across traffic types. Explicit Congestion Notification signals congestion early. Congestion control algorithms such as DCQCN throttle senders in response to those signals.⁴⁶ Without proper configuration of these mechanisms, RoCE performance degrades significantly.
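The sketch below shows how those knobs fit together: a data-only description of the lossless class plus the WRED-style ECN marking curve that DCQCN senders react to. All numeric values are illustrative placeholders, not recommended settings for any particular switch OS.

```python
# A data-only sketch of the lossless-RoCE knobs named above, plus the
# WRED-style ECN marking ramp that DCQCN reacts to. Numbers are placeholders,
# not tuning guidance for any specific platform.

ROCE_QOS = {
    "pfc": {"priority": 3, "headroom_kb": 200},              # lossless RDMA class
    "ets": {"roce_pct": 60, "cnp_pct": 5, "other_pct": 35},  # bandwidth split
    "ecn": {"kmin_kb": 150, "kmax_kb": 1500, "pmax": 0.05},  # marking ramp
}

def ecn_mark_probability(queue_kb: float, kmin: float, kmax: float,
                         pmax: float) -> float:
    """No marking below Kmin, linear ramp to Pmax at Kmax, mark all beyond."""
    if queue_kb <= kmin:
        return 0.0
    if queue_kb >= kmax:
        return 1.0
    return pmax * (queue_kb - kmin) / (kmax - kmin)

e = ROCE_QOS["ecn"]
for q_kb in (100, 500, 1000, 2000):
    p = ecn_mark_probability(q_kb, e["kmin_kb"], e["kmax_kb"], e["pmax"])
    print(f"queue {q_kb:>4} KB -> mark probability {p:.3f}")
# ECN marks early so DCQCN slows senders *before* PFC pause frames fire;
# PFC remains the backstop guaranteeing zero loss on the RDMA priority.
```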
Major cloud platforms validate RoCE for production AI workloads. Google Cloud's A3 Ultra and A4 Compute Engine machine types leverage RoCEv2 for high-performance GPU networking.⁴⁷ Oracle's Zettascale10 supercluster uses the Acceleron RoCE network fabric with specialized Ethernet NICs containing integrated four-port switches to minimize latency.⁴⁸
Meta's AI cluster architecture demonstrates RoCE at scale. The backend fabric connects all RDMA NICs in a non-blocking topology providing high bandwidth, low latency, and lossless transport between any two GPUs.⁴⁹ A two-stage Clos topology organizes AI racks into zones, with rack training switches serving as leaf switches connecting GPUs via copper DAC cables.⁵⁰
Cost considerations favor Ethernet for many deployments. For tier 2 and tier 3 companies deploying 256-1,024 GPU clusters, Ethernet with RoCE is the default recommendation unless specific, quantified latency requirements justify InfiniBand's roughly 2x networking cost.⁵¹ Published case studies of large-cluster performance on Ethernet (Meta's Llama 2 and Llama 3 training) show parity between Ethernet and InfiniBand.⁵²
Next-generation silicon approaches 102.4Tbps
AI data center bandwidth demands continue to double with each silicon generation. Broadcom's Tomahawk 6 doubles capacity again to 102.4Tbps, targeting clusters spanning 100,000 GPUs and beyond, with hyperscalers pushing for the earliest possible deployment.⁵³
The market remains competitive across multiple architectures. Marvell's Teralynx switch chip and Nova electro-optics platform enable 51.2Tbps designs today.⁵⁴ Marvell demonstrated 400G/lane technology operating at 224 Gbaud, a critical step toward 3.2T optical interconnects and 204.8T switches.⁵⁵ The company's 2nm custom SRAM development boosts custom XPU and switch device performance with up to 6 gigabits of high-speed memory.⁵⁶
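The per-lane arithmetic is worth a sanity check. Assuming PAM4 signaling (two bits per symbol), which the published figures imply:

```python
# 224 GBaud with PAM4 (2 bits/symbol) yields 448 Gb/s of raw line rate per
# lane; FEC and coding overhead bring the usable payload to roughly 400 Gb/s.
# Eight such lanes make up the 3.2T optical interconnect cited above.

BAUD_G = 224                 # symbols per second, in billions
BITS_PER_SYMBOL = 2          # PAM4
raw_gbps = BAUD_G * BITS_PER_SYMBOL
print(raw_gbps)              # 448 -> ~400G payload after overhead
print(8 * 400)               # 3200: eight lanes per 3.2T module
```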
Silicon photonics integration accelerates. NVIDIA's Spectrum-X Photonics and emerging vendor solutions embed optical communications directly into switch silicon, eliminating the pluggable transceiver bottleneck for highest-density deployments.
Switch selection now directly impacts AI training efficiency. The choice between NVIDIA's integrated Spectrum-X platform, merchant silicon from Broadcom in multi-vendor switches, or Cisco/Arista enterprise platforms depends on existing infrastructure, operational expertise, and integration requirements with GPU cluster management.
Introl deploys networking infrastructure across 257 global locations, configuring AI cluster fabrics from hundreds to 100,000 GPUs. Network architecture decisions determine whether expensive GPU resources achieve full utilization or sit idle waiting for data.
The Ethernet inflection point
2025 marks an inflection point for AI networking. Ethernet transitions from a cost-optimized alternative to InfiniBand into a performance-competitive platform validated at the largest scales. Spectrum-X achievements at xAI's Colossus cluster and Meta's production training environments demonstrate that properly configured Ethernet matches InfiniBand throughput while maintaining multi-vendor flexibility.
Switch silicon reaches 51.2Tbps across all major vendors, with 102.4Tbps designs in development. Ultra Ethernet Consortium standards ensure interoperability and continued innovation from competing suppliers. AI-specific optimizations in congestion control, load balancing, and adaptive routing eliminate the performance gaps that previously limited Ethernet in demanding environments.
Organizations planning AI infrastructure investments should evaluate Ethernet networking as the default choice for new deployments. InfiniBand maintains advantages in specific scenarios requiring absolute minimum latency, but the cost differential and ecosystem breadth favor Ethernet for most enterprise and cloud deployments. The technology selection matters less than proper configuration. RoCE networks delivering equivalent performance to InfiniBand require careful attention to lossless configuration, congestion control, and fabric topology design that matches workload characteristics.
Key takeaways
For network architects:
- Dell'Oro: Ethernet leads AI back-end deployments in 2025 on cost, multi-vendor ecosystem, and operational familiarity
- xAI Colossus: 100,000 GPUs via Spectrum-X achieving 95% throughput; traditional Ethernet limited to ~60% by flow collisions
- Ultra Ethernet Consortium Spec 1.0 (June 2025): 560-page framework for congestion control, RDMA, multi-vendor interoperability

For procurement teams:
- Broadcom Tomahawk 5: 51.2Tbps monolithic, 64×800GbE or 128×400GbE; single chip replaces 48 Tomahawk 1 switches (95% power reduction)
- NVIDIA Spectrum-X800 SN5600: 64×800GbE, 51.2Tbps, integrated with BlueField-3 SuperNICs; 1.6x performance over traditional Ethernet
- Arista 7800R4: 460Tbps system, up to 576×800GbE; Cisco 8800 with Silicon One G200 at 51.2Tbps

For RoCE deployments:
- Meta: RoCE and InfiniBand tuned to equivalent performance; largest models trained on RoCE fabric across 24,000 GPUs
- Requirements: Priority Flow Control, Enhanced Transmission Selection, ECN, DCQCN congestion control; without proper configuration, performance degrades significantly
- Cost consideration: Ethernet with RoCE recommended for 256-1,024 GPU clusters unless specific latency requirements justify 2x InfiniBand networking cost

For future planning:
- Spectrum-X Photonics (March 2025): fused electronic/optical, 128×800Gbps (100Tbps) and 512×800Gbps (400Tbps) configurations
- Tomahawk 6 in development for 100,000+ GPU clusters; Marvell demonstrated 400G/lane for 3.2T optical interconnects
- Meta and Oracle standardizing on Spectrum-X as open, accelerated architecture for AI training efficiency

For vendor evaluation:
- NVIDIA Spectrum-X: integrated platform with BlueField SuperNICs, AI-specific congestion control
- Broadcom Tomahawk 5: merchant silicon in multi-vendor switches (Arista, FS.com, NADDOD)
- Cisco/Arista enterprise: modular platforms, Silicon One integration, SONiC support for open NOS options
References
1. Dell'Oro Group, as cited in Vitex Technology, "InfiniBand vs Ethernet for AI Clusters in 2025," 2025.
2. NVIDIA Newsroom, "NVIDIA Spectrum-X Ethernet Switches Speed Up Networks for Meta and Oracle," October 13, 2025.
3. NADDOD Blog, "Spectrum-X: NVIDIA's Answer to AI Ethernet Challenges," 2025.
4. Broadcom, "BCM78900 | 51.2 Tb/s StrataXGS Tomahawk 5 Ethernet Switch," product page, 2025.
5. Ultra Ethernet Consortium, "UEC Launches Specification 1.0 Transforming Ethernet for AI and HPC at Scale," June 11, 2025.
6. Broadcom Investors, "Broadcom Ships Tomahawk 5, Industry's Highest Bandwidth Switch Chip to Accelerate AI/ML Workloads," press release.
7. TechInsights, "Tomahawk 5 Switches At 51.2Tbps," 2025.
8. Broadcom, "BCM78900 | 51.2 Tb/s StrataXGS Tomahawk 5 Ethernet Switch," product page, 2025.
9. NADDOD Blog, "NADDOD Launches 51.2T 800G and 400G Ethernet AI Data Center Switches Powered by Broadcom Tomahawk 5," 2025.
10. Broadcom, "BCM78900," product specifications, 2025.
11. Broadcom, "BCM78900," product specifications, 2025.
12. Network Computing, "AI Workloads Spur Competition in Networking Chips," 2025.
13. Microchip USA, "Broadcom Tomahawk 5 For Data Center Networking," 2025.
14. FS.com, "N9600-64OD, 64-Port Ethernet HPC/AI Data Center Switch," product page, 2025.
15. NADDOD Blog, "NADDOD Launches 51.2T 800G and 400G Ethernet AI Data Center Switches," 2025.
16. Arista, "7060X6 Series 800G Data Center Switches Data Sheet," 2025.
17. NVIDIA, "Spectrum-X | Ethernet Networking Platform for AI," product page, 2025.
18. AMAX, "NVIDIA Spectrum-X800 Ethernet Platform," product page, 2025.
19. Lightwave Online, "NVIDIA's 800G Ethernet switch powers the AI-based Colossal supercomputer," 2025.
20. NADDOD Blog, "Spectrum-X: NVIDIA's Answer to AI Ethernet Challenges," 2025.
21. NVIDIA Investor Relations, "NVIDIA Announces Spectrum-X Photonics, Co-Packaged Optics Networking Switches to Scale AI Factories to Millions of GPUs," March 18, 2025.
22. NVIDIA Investor Relations, "NVIDIA Announces Spectrum-X Photonics," March 18, 2025.
23. NVIDIA Investor Relations, "NVIDIA Introduces Spectrum-XGS Ethernet to Connect Distributed Data Centers Into Giga-Scale AI Super-Factories," August 22, 2025.
24. NVIDIA Investor Relations, "NVIDIA Spectrum-X Ethernet Switches Speed Up Networks for Meta and Oracle," October 13, 2025.
25. Ultra Ethernet Consortium, "UEC Launches Specification 1.0," June 11, 2025.
26. STORDIS GmbH, "Ultra Ethernet Consortium Explained: How UEC Is Redefining AI and HPC Networking," 2025.
27. Network World, "Ultra Ethernet Consortium publishes 1.0 specification, readies Ethernet for HPC, AI," June 2025.
28. HPCwire, "Ultra Ethernet Consortium Releases Specification," June 11, 2025.
29. STORDIS GmbH, "Ultra Ethernet Consortium Explained," 2025.
30. PR Newswire, "Ultra Ethernet Consortium (UEC) Launches Specification 1.0," June 11, 2025.
31. SemiAnalysis, "The New AI Networks | Ultra Ethernet UEC | UALink vs Broadcom Scale Up Ethernet SUE," June 11, 2025.
32. SemiAnalysis, "The New AI Networks," June 11, 2025.
33. Arista, "Arista Networks Unveils Next Generation Data and AI Centers," October 29, 2025.
34. Arista, "7800R4 Series AI Spine Switch Data Sheet," 2025.
35. Arista, "7800R4 Series AI Spine Switch Data Sheet," 2025.
36. The Next Platform, "Arista Modular Switches Aim At Scale Across Networks, Hit Scale Out, Too," November 4, 2025.
37. SDxCentral, "Arista unveils 800G platforms to power AI data center interconnects," 2025.
38. Network World, "Arista fills out AI networking portfolio," 2025.
39. Cisco, "Cisco Silicon One - Processors for Unified Network Architecture," product page, 2025.
40. Cisco, "Cisco Silicon One," product page, 2025.
41. Cisco, "Cisco 8800 Series Modular Routers Data Sheet," 2025.
42. Network World, "Cisco seriously amps-up Silicon One chip, router for AI data center connectivity," 2025.
43. Hyperframe Research, "Is Cisco Silicon One Ready to Power the AI Era?" August 1, 2025.
44. Cisco Blogs, "Craft your AI Data Center with Cisco 8000 and SONiC," 2025.
45. Meta Engineering, "RoCE networks for distributed AI training at scale," August 5, 2024.
46. LP Resources, "Unleash Your Network: A Deep Dive into RoCE (RDMA over Converged Ethernet)," 2025.
47. Google Cloud Blog, "RDMA RoCEv2 for AI workloads on Google Cloud," 2025.
48. Oracle, "OCI redefines top-level AI performance with Zettascale10," 2025.
49. Meta Engineering, "RoCE networks for distributed AI training at scale," August 2024.
50. Meta Engineering, "RoCE networks for distributed AI training at scale," August 2024.
51. Vitex Technology, "InfiniBand vs Ethernet for AI Clusters in 2025," 2025.
52. WWT, "The Battle of AI Networking: Ethernet vs InfiniBand," 2025.
53. The Next Platform, "The AI Datacenter Is Ravenous For 102.4 Tb/sec Ethernet Switch ASICs," June 3, 2025.
54. Network Computing, "AI Workloads Spur Competition in Networking Chips," 2025.
55. Futurum, "OFC 2025: Marvell Interconnecting the AI Era," 2025.
56. Marvell, "Marvell Develops Industry's First 2nm Custom SRAM for Next-Generation AI Infrastructure Silicon," 2025.
Squarespace Excerpt (159 characters): Ethernet leads AI networking in 2025. 51.2Tbps Tomahawk 5 and Spectrum-X power 100K GPU clusters. Ultra Ethernet Consortium sets standards for AI workloads.
SEO Title (58 characters): Ethernet Switches for AI: 51.2Tbps Platforms Lead in 2025
SEO Description (154 characters): Dell'Oro reports Ethernet leads AI networking. Tomahawk 5 delivers 51.2Tbps. Spectrum-X achieves 95% throughput at xAI's 100K GPU Colossus supercomputer.
Title Review: Current title "Ethernet Switches for AI: The 51.2Tbps Platforms Connecting GPU Clusters" effectively conveys the technical focus and AI relevance. At 67 characters, consider trimming to "Ethernet Switches for AI: 51.2Tbps Platforms for GPU Clusters" (59 chars) for full SERP display.
URL Slug Options:
1. ethernet-switches-ai-tomahawk-5-spectrum-x-51-2t-2025 (primary)
2. ai-networking-ethernet-roce-tomahawk-spectrum-x-2025
3. 51-2tbps-ethernet-switches-gpu-cluster-networking-2025
4. ultra-ethernet-consortium-ai-switches-roce-2025