December 2025 Update: The 120kW rack is now baseline, not aspirational. NVIDIA GB200 NVL72 operates at 120kW, with Vera Rubin NVL144 targeting 600kW per rack by 2026. Liquid cooling adoption hit 22% of data centers (market: $5.52B→$15.75B by 2030). Direct-to-chip commands 47% market share. Colovore secured $925M for 200kW/rack facilities. DGX-Ready requirements are evolving for Blackwell systems, with providers rushing to support 150-200kW densities as a stepping stone to 600kW Vera Rubin infrastructure.
Selecting the wrong colocation provider for AI infrastructure leads to thermal shutdowns, power failures, and $8 million in stranded GPU investments, as a Fortune 500 company discovered when its provider's "AI-ready" facility couldn't actually cool 80kW racks.¹ NVIDIA's DGX-Ready program certifies only 47 facilities globally that meet the extreme requirements of modern GPU deployments, creating a seller's market where qualified providers command rates roughly three times market norms and maintain 18-month waiting lists.² The gap between marketing claims and actual capabilities forces organizations to evaluate dozens of technical parameters, from power factor correction to seismic bracing specifications, while competing for scarce capacity in facilities that genuinely support 120kW rack densities.
The colocation landscape fragments into three tiers: traditional providers struggling with 10kW racks, transitional facilities managing 40kW with difficulty, and elite operators achieving 120kW+ through liquid cooling and massive power infrastructure.³ Each NVIDIA DGX H100 SuperPOD requires 35kW per rack minimum, with optimal configurations reaching 120kW when fully populated with networking and storage.⁴ Organizations discover that 90% of colocation facilities simply cannot support modern AI infrastructure regardless of marketing claims, forcing migrations to purpose-built facilities or expensive retrofits that delay deployments by 12-18 months.
Power infrastructure defines the fundamental constraint
Modern AI colocation demands power densities that traditional facilities cannot physically deliver. A single 120kW rack draws roughly 350 amps at 208V three-phase, and provisioned circuit capacity approaches 600 amps once breaker derating and redundant feeds are accounted for, necessitating multiple 225A circuits per rack.⁵ The electrical infrastructure must handle not just steady-state loads but also power factor variations from GPU workloads, which swing between 0.95 and 0.85 as computational intensity varies. Facilities designed for steady IT loads experience harmonic distortion when GPUs cycle through different operational modes.
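A back-of-envelope check helps reconcile those figures. Assuming a 0.95 power factor and the common 80% continuous-load breaker derating (illustrative values, not from the cited source), the actual line current lands near 350A; redundant A/B feeds and growth headroom then push total provisioned capacity toward the ~600A mark.

```python
# Rough sizing sketch for a 120 kW rack fed at 208 V three-phase.
# Assumptions: power factor 0.95, 80% continuous-load breaker derating.
import math

def rack_line_current_amps(load_kw: float, volts: float = 208.0, power_factor: float = 0.95) -> float:
    """Steady-state line current for a balanced three-phase load."""
    return load_kw * 1000 / (math.sqrt(3) * volts * power_factor)

def provisioned_breaker_amps(load_kw: float, derating: float = 0.80) -> float:
    """Breaker capacity needed once continuous-load derating is applied."""
    return rack_line_current_amps(load_kw) / derating

draw = rack_line_current_amps(120)        # ~350 A of actual draw
breakers = provisioned_breaker_amps(120)  # ~440 A of breaker capacity
circuits = math.ceil(breakers / 225)      # 225 A branch circuits needed per feed
print(f"draw ≈ {draw:.0f} A, breakers ≈ {breakers:.0f} A, 225 A circuits per feed: {circuits}")
```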
Power redundancy becomes dramatically more complex at high densities. Traditional 2N redundancy doubles infrastructure costs, while N+1 configurations risk cascade failures during maintenance. DGX-Ready facilities implement 2N+1 architectures with isolated power trains preventing single points of failure.⁶ Each power path includes online double-conversion UPS systems maintaining power quality within 2% voltage variation and 3% total harmonic distortion. Battery backup must sustain full load for 15 minutes minimum, requiring roughly 2,500 kWh of delivered energy for a 10MW AI deployment.
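A minimal sizing sketch, assuming an 80% usable depth of discharge and 95% conversion efficiency (illustrative figures, not part of the cited requirements), shows why nameplate battery capacity ends up well above that delivered-energy floor.

```python
# UPS battery sizing sketch for a 15-minute ride-through at full load.
# Depth-of-discharge and efficiency values are illustrative assumptions.
def ups_battery_kwh(load_mw: float, ride_through_min: float = 15.0,
                    depth_of_discharge: float = 0.80, efficiency: float = 0.95) -> float:
    """Nameplate battery energy needed to carry the load through the ride-through window."""
    delivered_kwh = load_mw * 1000 * (ride_through_min / 60)   # energy the load actually consumes
    return delivered_kwh / (depth_of_discharge * efficiency)   # gross up for DoD and conversion losses

print(f"delivered: {10 * 1000 * 0.25:,.0f} kWh, nameplate: {ups_battery_kwh(10):,.0f} kWh")
# delivered: 2,500 kWh, nameplate: ~3,289 kWh for a 10 MW deployment
```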
Utility power availability constrains site selection more than any other factor. Major colocation markets like Northern Virginia and Silicon Valley face power moratoriums, with new capacity unavailable until 2027.⁷ Secondary markets offering immediate power access command premium pricing despite inferior connectivity. Phoenix facilities with available power charge $500 per kW monthly versus $180 in power-constrained Virginia.⁸ Organizations must balance power availability against latency requirements and operational considerations.
Cooling capacity determines actual versus marketed density
Marketing claims of "high-density support" collapse when confronted with actual thermal loads. A 120kW rack rejects roughly 409,000 BTU per hour of heat, equivalent to about 34 household electric ovens running continuously.⁹ Air cooling reaches its practical limit around 30kW per rack even with hot-aisle containment and optimized airflow. Achieving 120kW density requires liquid cooling, either rear-door heat exchangers or direct-to-chip solutions.
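The airflow arithmetic explains the ceiling. Assuming a 20°F (about 11°C) intake-to-exhaust temperature rise, an illustrative figure rather than one from the cited guideline, the volume of air required becomes impractical long before 120kW.

```python
# Back-of-envelope airflow sketch: how much air a rack needs to stay within
# a given temperature rise. The 20 °F delta-T is an illustrative assumption.
def required_cfm(load_kw: float, delta_t_f: float = 20.0) -> float:
    """Airflow (CFM) needed to remove load_kw of heat at sea-level air density."""
    btu_per_hr = load_kw * 1000 * 3.412       # convert watts to BTU/hr
    return btu_per_hr / (1.085 * delta_t_f)   # standard sensible-heat airflow formula

print(f"30 kW rack:  {required_cfm(30):,.0f} CFM")   # ~4,700 CFM, near practical limits
print(f"120 kW rack: {required_cfm(120):,.0f} CFM")  # ~18,900 CFM, impractical through one rack
```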
Colocation providers approach liquid cooling with varying sophistication. Basic implementations provide chilled water to customer-supplied cooling equipment, shifting complexity to tenants. Advanced facilities offer cooling-as-a-service with integrated CDUs, manifolds, and monitoring. NVIDIA DGX-Ready certification requires 25°C supply water temperature with 500 kW cooling capacity per rack minimum.¹⁰ Providers must demonstrate N+1 cooling redundancy with automatic failover completing within 30 seconds.
Free cooling hours significantly impact operational costs. Facilities in northern climates achieve 6,000+ free cooling hours annually, reducing costs by $120,000 per MW compared to mechanical cooling.¹¹ However, cold climates present construction challenges and may lack a skilled workforce. The optimal balance depends on specific workload patterns and business requirements. 24/7 inference workloads benefit more from free cooling than batch training jobs that can shift to cooler periods.
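One way the cited ~$120,000-per-MW figure can arise, assuming mechanical cooling at a PUE of 1.50 versus 1.25 during economizer hours and power at $0.08/kWh (all illustrative assumptions, not values from the cited report):

```python
# Rough estimate of annual free-cooling savings per MW of IT load.
# PUE values and the utility rate are illustrative assumptions.
def free_cooling_savings(it_load_mw: float, free_hours: float = 6000,
                         pue_mech: float = 1.50, pue_free: float = 1.25,
                         usd_per_kwh: float = 0.08) -> float:
    overhead_delta_kw = it_load_mw * 1000 * (pue_mech - pue_free)  # cooling overhead avoided
    return overhead_delta_kw * free_hours * usd_per_kwh

print(f"${free_cooling_savings(1.0):,.0f} per MW per year")  # ~$120,000
```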
Network connectivity enables distributed AI workloads
AI colocation requires unprecedented network capacity and diversity. Training workloads generate 400Gbps of sustained traffic between distributed nodes, while inference serving demands sub-millisecond latency to end users.¹² DGX-Ready facilities provide minimum 4x400GbE connectivity per rack with sub-microsecond latency within the facility. Cross-connect options must support InfiniBand and Ethernet fabrics simultaneously.
Carrier diversity prevents network partitions that fragment distributed training jobs. Elite facilities maintain connections to 20+ carriers with diverse fiber paths.¹³ Cloud on-ramps to AWS Direct Connect, Azure ExpressRoute, and Google Cloud Interconnect enable hybrid deployments. Dedicated wavelengths between geographically distributed facilities support disaster recovery and workload migration. The monthly cost for comprehensive connectivity reaches $50,000 for a 10-rack deployment.
Internet peering arrangements affect inference serving costs dramatically. Facilities with robust peering save 60-80% on bandwidth costs compared to pure transit arrangements.¹⁴ Major peering exchanges like Equinix IX provide access to thousands of networks directly. Content delivery networks cache frequently accessed models at edge locations. Smart routing optimizes path selection based on latency and cost parameters.
Security and compliance shape provider selection
AI infrastructure contains valuable intellectual property requiring comprehensive security. DGX-Ready facilities implement defense-in-depth architectures with multiple security layers.¹⁵ Perimeter security includes anti-ram barriers, mantrap entries, and 24/7 armed guards. Biometric access controls restrict data hall entry. Individual cages provide physical isolation with roof coverings preventing over-the-wall access. Camera systems maintain 90-day recordings with AI-powered anomaly detection.
Compliance certifications validate security implementations. SOC 2 Type II attestation confirms control effectiveness over time. ISO 27001 certification demonstrates systematic security management. HIPAA compliance enables healthcare AI workloads. Financial services and government workloads require additional certifications such as PCI DSS or FISMA, depending on the data involved. Each certification adds operational overhead but expands addressable markets.
Supply chain security gains importance as GPU values increase. Facilities must verify hardware authenticity and maintain chain of custody. Secure destruction services prevent data leakage from decommissioned equipment. Some providers offer trusted execution environments with hardware security modules. The additional security measures add 10-15% to base colocation costs but prevent catastrophic breaches.
Introl evaluates colocation providers across our global coverage area, having deployed GPU infrastructure in over 100 facilities worldwide.¹⁶ Our assessment framework evaluates 127 technical parameters, identifying providers genuinely capable of supporting high-density AI workloads versus those merely claiming capability.
Geographic distribution affects latency and costs
Colocation geography impacts AI deployments through multiple vectors. Training workloads tolerate higher latency, enabling placement in low-cost locations. Inference serving demands proximity to users, requiring geographic distribution. Data sovereignty regulations mandate in-country processing for certain datasets. Natural disaster risk affects insurance costs and business continuity planning.
Primary markets (Northern Virginia, Silicon Valley, Dallas) offer superior connectivity but face capacity constraints. Colocation costs reach $600 per kW monthly with 24-month commitments required.¹⁷ Secondary markets (Phoenix, Atlanta, Chicago) provide available capacity at $300-400 per kW. Tertiary markets (Salt Lake City, Omaha, Columbus) offer $200 per kW pricing but limited ecosystem support.
International considerations complicate provider selection. European facilities comply with GDPR but cost 40% more than US equivalents. Asian facilities offer proximity to manufacturing but face regulatory uncertainty. Multi-national deployments must navigate varying power standards, cooling approaches, and operational practices. Currency fluctuations add 5-10% uncertainty to international contracts.
Contract structures and commercial terms
Colocation contracts for AI infrastructure differ substantially from traditional arrangements:
Power Commitments: Contracts specify committed power draw with take-or-pay provisions. Excess usage incurs penalties of $500-1,000 per kW.¹⁸ Providers require 80% power utilization within 6 months. Unused power cannot be reclaimed once allocated. Growth reservations secure future capacity at current pricing.
Cooling SLAs: Temperature and humidity guarantees prevent thermal throttling. Supply water temperature must stay within 1°C of specification. Flow rates guarantee a minimum GPM per rack (see the flow-rate sketch after these contract terms). Response times for cooling failures cannot exceed 15 minutes. Penalties reach $10,000 per hour for SLA breaches.
Flexibility Terms: AI workloads require unprecedented flexibility. Expansion rights enable growth without relocation. Contraction rights allow downsizing during market downturns. Technology refresh clauses permit infrastructure updates. Exit clauses provide termination options with defined penalties.
Pricing Models: All-inclusive pricing simplifies budgeting but reduces flexibility. Metered pricing aligns costs with usage but creates uncertainty. Power-based pricing favors efficient operations. Space-based pricing penalizes high-density deployments. Hybrid models balance predictability with optimization incentives.
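As a sanity check on those flow-rate guarantees, a minimal sketch assuming a 10°C supply-to-return temperature rise on water (an illustrative value; actual SLAs specify their own delta-T) converts the 120kW rack load into gallons per minute.

```python
# Coolant flow-rate sketch tied to the "minimum GPM per rack" SLA term above.
# The 10 °C temperature rise is an illustrative assumption.
def required_gpm(load_kw: float, delta_t_c: float = 10.0) -> float:
    """Water flow (US GPM) needed to absorb load_kw at the given temperature rise."""
    cp_water = 4186.0                                   # J/(kg*K), specific heat of water
    kg_per_s = load_kw * 1000 / (cp_water * delta_t_c)  # required mass flow
    liters_per_s = kg_per_s                             # ~1 kg of water per liter
    return liters_per_s * 15.85                         # liters/s to US gallons/minute

print(f"{required_gpm(120):.0f} GPM per 120 kW rack")   # ~45 GPM at a 10 °C delta-T
```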
Evaluation framework for systematic selection
Systematic evaluation ensures optimal provider selection; a weighted-scoring sketch follows the criteria below:
Technical Scoring (40% weight):
- Power density capability (max kW per rack)
- Cooling technology and capacity
- Network connectivity options
- Liquid cooling readiness
- Infrastructure redundancy levels

Commercial Scoring (25% weight):
- Total cost per kW including all fees
- Contract flexibility terms
- SLA penalties and guarantees
- Growth accommodation options
- Financial stability metrics

Operational Scoring (20% weight):
- Remote hands capabilities
- Cross-connect provisioning speed
- Maintenance windows and procedures
- Incident response times
- Customer portal capabilities

Strategic Scoring (15% weight):
- Geographic coverage alignment
- Ecosystem partnership quality
- Innovation roadmap alignment
- Sustainability initiatives
- Cultural fit assessment
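A minimal sketch of how those weights can combine into a single comparison score; the 40/25/20/15 weights come from the framework above, while the provider names and category scores are hypothetical placeholders.

```python
# Weighted-scoring sketch for the evaluation framework above.
# Category weights come from the section; provider scores are hypothetical.
WEIGHTS = {"technical": 0.40, "commercial": 0.25, "operational": 0.20, "strategic": 0.15}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine 0-10 category scores into a single weighted result."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)

providers = {
    "Provider A": {"technical": 9, "commercial": 6, "operational": 8, "strategic": 7},
    "Provider B": {"technical": 7, "commercial": 9, "operational": 7, "strategic": 6},
}

for name, scores in sorted(providers.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(scores):.2f} / 10")
```

Ranking candidates this way keeps the discussion anchored on the agreed weights rather than on whichever criterion was raised most recently.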
Real-world selection outcomes
Case Study 1: Financial Services Firm
- Requirement: 2MW for trading AI, <2ms latency to exchanges
- Evaluation: 12 providers across the NY/NJ metro
- Selection: DGX-Ready facility in Secaucus
- Result: 0.8ms exchange latency, $480/kW monthly
- Key factor: Proximity to trading venues outweighed cost

Case Study 2: Biotech Research Organization
- Requirement: 5MW for drug discovery, liquid cooling mandatory
- Evaluation: 8 providers nationally
- Selection: Phoenix facility with immersion cooling
- Result: PUE 1.08, $280/kW monthly
- Key factor: Cooling efficiency for 24/7 workloads

Case Study 3: Autonomous Vehicle Company
- Requirement: 10MW across 3 geographic regions
- Evaluation: 20 providers with multi-market presence
- Selection: Single provider with national footprint
- Result: Consistent operations, volume pricing at $350/kW
- Key factor: Operational simplicity across regions
Future-proofing colocation decisions
GPU evolution drives continuous growth in infrastructure requirements. Next-generation B200 GPUs consume 1,200W each, pushing rack densities toward 200kW.¹⁹ Optical interconnects will replace copper, requiring different cable plants. Quantum computing integration demands ultra-low vibration environments. Providers must demonstrate upgrade paths that accommodate these changes.
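A rough projection under stated assumptions (a hypothetical 8-GPU node carrying about 2.5kW of CPU, memory, and NIC overhead, and 16 such nodes per rack; none of these counts come from the cited source) shows how 1,200W GPUs push rack densities toward 200kW.

```python
# Rack-density projection sketch. GPU wattage follows the figure above;
# node composition and rack counts are illustrative assumptions.
def rack_kw(gpu_watts: float = 1200, gpus_per_node: int = 8,
            node_overhead_kw: float = 2.5, nodes_per_rack: int = 16) -> float:
    node_kw = gpus_per_node * gpu_watts / 1000 + node_overhead_kw  # per-node draw
    return node_kw * nodes_per_rack                                 # total rack draw

print(f"{rack_kw():.0f} kW per rack")  # ~194 kW with these assumptions
```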
Sustainability requirements increasingly influence selection. Carbon-neutral operations become mandatory for many organizations. Renewable energy access varies dramatically by location. Water usage for cooling faces increased scrutiny. Providers investing in sustainable infrastructure command premium valuations despite higher costs.
Market consolidation continues reshaping the landscape. Private equity rolls up regional providers seeking scale. Hyperscalers acquire facilities for captive capacity. Carrier hotels transform into AI colocation providers. The market structure evolution affects long-term viability and bargaining power.
Key takeaways
For site selection teams:
- Only 47 facilities globally hold NVIDIA DGX-Ready certification; verify claims independently
- 120kW racks draw roughly 350 amps at 208V 3-phase and need ~600 amps of provisioned circuit capacity; most facilities cannot deliver this
- Air cooling maxes out around 30kW; 120kW requires direct liquid cooling
- 90% of "AI-ready" marketing claims don't match actual capabilities

For commercial negotiations:
- Primary markets (NoVA, SV): $600/kW monthly, 24-month commitments
- Secondary markets (Phoenix, Atlanta): $300-400/kW with available power
- Take-or-pay provisions require 80% utilization within 6 months
- Expansion rights are critical; secure growth capacity at current pricing

For infrastructure architects:
- DGX-Ready requires 25°C supply water, 500 kW cooling per rack minimum
- N+1 cooling redundancy with 30-second failover is mandatory
- Northern climates provide 6,000+ free cooling hours ($120K/MW annual savings)
- Carrier diversity (20+ networks) prevents distributed training partitions
Organizations selecting colocation providers for AI infrastructure face complex tradeoffs between technical capabilities, commercial terms, and strategic alignment. The scarcity of truly capable facilities creates seller's markets where preparation and relationships matter more than negotiation tactics. Success requires systematic evaluation, realistic requirement setting, and acceptance that perfect solutions rarely exist. The providers that genuinely support 120kW densities with liquid cooling and robust connectivity will capture disproportionate value as AI infrastructure demands accelerate beyond traditional data center capabilities.
References
1. Digital Realty. "AI Infrastructure Deployment Challenges Survey 2024." Digital Realty Trust, 2024. https://www.digitalrealty.com/resources/ai-infrastructure-challenges
2. NVIDIA. "DGX-Ready Data Center Program Certification List." NVIDIA Corporation, 2024. https://www.nvidia.com/en-us/data-center/dgx-ready/
3. Uptime Institute. "Data Center Density Trends 2024." Uptime Institute Intelligence, 2024. https://uptimeinstitute.com/resources/research/density-trends-2024
4. NVIDIA. "DGX H100 SuperPOD Reference Architecture." NVIDIA Documentation, 2024. https://docs.nvidia.com/dgx-superpod/reference-architecture/
5. Schneider Electric. "Power Requirements for High-Density GPU Deployments." Schneider Electric White Paper, 2024. https://www.se.com/us/en/download/document/high-density-gpu-power/
6. NVIDIA. "DGX-Ready Facility Requirements Guide." NVIDIA Data Center, 2024. https://www.nvidia.com/content/dam/en-zz/Solutions/dgx-ready-requirements.pdf
7. Cushman & Wakefield. "Data Center Power Availability Report Q4 2024." Cushman & Wakefield Research, 2024. https://www.cushmanwakefield.com/en/united-states/insights/us-data-center-reports
8. CBRE. "Data Center Pricing Trends by Market 2024." CBRE Research, 2024. https://www.cbre.com/insights/reports/data-center-pricing-2024
9. ASHRAE. "Thermal Guidelines for High-Density Computing." ASHRAE TC 9.9, 2024. https://tc0909.ashrae.org/high-density-guidelines
10. NVIDIA. "Cooling Requirements for DGX Systems." NVIDIA Technical Documentation, 2024. https://docs.nvidia.com/dgx/cooling-requirements/
11. Google. "Data Center Efficiency Best Practices." Google Sustainability, 2024. https://sustainability.google/reports/data-center-efficiency/
12. Equinix. "Network Requirements for AI Workloads." Equinix Resources, 2024. https://www.equinix.com/resources/whitepapers/ai-network-requirements
13. Equinix. "Carrier Diversity in Colocation Facilities." Equinix Network, 2024. https://www.equinix.com/interconnection/carrier-diversity/
14. DE-CIX. "Peering Economics for AI Infrastructure." DE-CIX Academy, 2024. https://www.de-cix.net/en/resources/peering-economics
15. NVIDIA. "Security Requirements for DGX-Ready Facilities." NVIDIA Security, 2024. https://www.nvidia.com/en-us/data-center/dgx-security-requirements/
16. Introl. "Global Colocation Assessment Services." Introl Corporation, 2024. https://introl.com/coverage-area
17. JLL. "Data Center Market Report 2024." Jones Lang LaSalle, 2024. https://www.us.jll.com/en/trends-and-insights/research/data-center-outlook
18. CyrusOne. "AI Colocation Contract Structures." CyrusOne, 2024. https://cyrusone.com/resources/ai-colocation-contracts/
19. NVIDIA. "Next-Generation Blackwell Architecture Power Requirements." NVIDIA Blog, 2024. https://blogs.nvidia.com/blog/blackwell-architecture-power/
20. QTS. "Hyperscale Data Center Requirements." QTS Realty Trust, 2024. https://www.qtsdatacenters.com/resources/hyperscale-requirements
21. DataBank. "Edge Colocation for AI Inference." DataBank, 2024. https://www.databank.com/solutions/edge-colocation-ai/
22. Iron Mountain. "Compliance Certifications for AI Infrastructure." Iron Mountain Data Centers, 2024. https://www.ironmountain.com/resources/compliance-certifications
23. Flexential. "Colocation Selection Guide for AI Workloads." Flexential, 2024. https://www.flexential.com/resources/ai-colocation-guide
24. CoreSite. "SLA Structures for High-Density Colocation." CoreSite, 2024. https://www.coresite.com/resources/sla-structures
25. Switch. "TIER 5 Platinum Standards for AI Infrastructure." Switch, 2024. https://www.switch.com/tier-5-platinum-standard/