Multi-tenant GPU security: isolation strategies for shared infrastructure

Blake Crosley

Mar 10, 2026

December 2025 Update: 90% of organizations now deploy AI, but only 5% feel confident in their security readiness, and 97% of breached organizations lacked proper AI access controls. NVIDIA disclosed seven security vulnerabilities on January 27, 2025; a separate Container Toolkit flaw, CVE-2025-23266, allowed root access on the host through an isolation bypass. The US AI infrastructure security market has reached $2.99B and is growing at a 22.8% CAGR.

Ninety percent of organizations deploy AI systems, yet only 5% feel confident in their security readiness.¹ Organizations with AI-specific security automation achieve $1.9 million in savings per breach and reduce incident lifecycles by 80 days.² Meanwhile, 97% of breached organizations lacked proper AI access controls.³ As GPU infrastructure becomes the foundation of enterprise AI, the security model for shared GPU resources determines whether organizations can safely consolidate workloads or must maintain expensive dedicated hardware for every tenant.

The challenge extends beyond traditional virtualization security. GPUs handle sensitive data including model weights, training data, and inference inputs that represent organizational intellectual property. A breach at the GPU level could compromise the "brain" of an AI system.⁴ Multi-tenant GPU environments introduce attack surfaces that differ fundamentally from CPU-based virtualization, requiring security strategies designed specifically for GPU architectures.

The multi-tenant GPU security landscape

On January 27, 2025, NVIDIA disclosed seven new security vulnerabilities affecting GPU display drivers and virtual GPU software.⁵ These critical flaws impact millions of systems from enterprise AI infrastructure to cloud computing platforms. A separate flaw in the NVIDIA Container Toolkit, CVE-2025-23266, allowed malicious actors to bypass isolation mechanisms and gain root access to host systems.⁶ Together, the disclosures highlighted systemic weaknesses in GPU software stacks that organizations cannot ignore.

The US AI infrastructure security market has reached $2.99 billion and is expanding at a 22.8% compound annual growth rate.⁷ AI-powered attacks accounted for 16% of all breaches in 2025.⁸ The investment reflects growing recognition that GPU infrastructure requires dedicated security attention beyond general data center protections.

GPU security differs from CPU security in fundamental ways. GPUs hold highly sensitive data in memory while they process it. Unlike CPUs, GPUs do not always have robust memory isolation, especially in multi-tenant environments.⁹ If memory is not cleared properly when a process ends, an attacker could retrieve leftover data from another user's workload.¹⁰ The shared architecture of modern GPUs enables contention-based side channels through which attackers can infer sensitive information, disrupt co-located workloads, or establish covert communication channels.¹¹

Hardware isolation with Multi-Instance GPU

NVIDIA's Multi-Instance GPU technology provides hardware-level isolation that enables secure multi-tenancy on high-value GPU hardware.¹² Starting with the Ampere architecture, MIG allows partitioning a single GPU into up to seven separate instances for CUDA applications.¹³ Blackwell and Hopper GPUs extend MIG capabilities with multi-tenant, multi-user configurations in virtualized environments, securing each instance with confidential computing at the hardware and hypervisor level.¹⁴

The architecture provides genuine hardware separation. Each MIG partition's processors have separate and isolated paths through the entire memory system.¹⁵ The on-chip crossbar ports, L2 cache banks, memory controllers, and DRAM address buses are each assigned uniquely to an individual instance.¹⁶ One tenant cannot read or overwrite another tenant's GPU memory. Fault isolation prevents one user's crashed code from affecting the whole GPU or impacting others.¹⁷

MIG supports Linux operating systems, containerized workloads using Docker Engine, orchestration with Kubernetes, and virtualized environments through hypervisors including Red Hat Virtualization and VMware vSphere.¹⁸ The broad platform support enables organizations to implement GPU isolation within existing infrastructure without wholesale architecture changes.

The limitation of MIG lies in granularity. A 7-way partition represents the maximum subdivision on current hardware. Organizations requiring finer-grained sharing or supporting older GPU generations must consider alternative approaches.
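The seven-way ceiling can be treated as a simple slice budget when planning partitions. The sketch below is a planning aid only: it assumes A100-style profile names whose leading digit is the number of compute slices (1g, 2g, 3g, 4g, 7g), `fits_on_gpu` and `slices_needed` are hypothetical helpers, and the check ignores NVIDIA's placement rules, which can reject some combinations that pass this count.

```python
# Hypothetical planning helper: counts compute slices only.
# Real MIG placement has additional rules that are not modelled here.
MAX_COMPUTE_SLICES = 7  # the 7-way ceiling described above

# Compute slices consumed per profile prefix (A100-style naming, assumed).
PROFILE_SLICES = {"1g": 1, "2g": 2, "3g": 3, "4g": 4, "7g": 7}


def slices_needed(profiles):
    """Total compute slices requested by profile names such as '3g.20gb'."""
    total = 0
    for name in profiles:
        prefix = name.split(".")[0]
        if prefix not in PROFILE_SLICES:
            raise ValueError(f"unknown MIG profile: {name}")
        total += PROFILE_SLICES[prefix]
    return total


def fits_on_gpu(profiles):
    """True if the requested instances fit within one GPU's slice budget."""
    return slices_needed(profiles) <= MAX_COMPUTE_SLICES


print(fits_on_gpu(["3g.20gb", "4g.20gb"]))  # one 3-slice plus one 4-slice instance
print(fits_on_gpu(["1g.5gb"] * 8))          # an eighth single-slice tenant
```

A request for an eighth tenant fails the budget check, which is the point at which time-slicing or additional hardware enters the discussion.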

vGPU and time-slicing alternatives

NVIDIA virtual GPU software lets multiple virtual machines share a single physical GPU simultaneously, each with full input-output memory management unit (IOMMU) protection.¹⁹ Beyond security, vGPU enables VM management with live migration and the ability to run mixed VDI and compute workloads.²⁰ The hypervisor virtualizes the GPU and assigns slices to multiple VMs, with each VM perceiving a virtualized portion of the GPU for its workloads.

Time-slicing provides a different sharing model. A system administrator defines a set of replicas for a GPU, each of which can be handed out independently to a pod running workloads in Kubernetes.²¹ Unlike MIG, time-slicing does not provide memory or fault isolation between replicas.²² If one task crashes or misbehaves, it can affect others sharing the GPU.²³ The tradeoff favors access over isolation: time-slicing enables sharing by larger numbers of users and provides access for older GPU generations that do not support MIG.²⁴
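Because every replica draws on the same physical memory with no enforced per-replica limit, operators typically have to check oversubscription out of band. The following is a minimal sketch of that bookkeeping; the `Pod` type, the `oversubscribed` helper, and all numbers are illustrative assumptions rather than anything the GPU Operator provides.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    expected_gpu_mem_gib: float  # self-reported estimate; nothing enforces it


def oversubscribed(pods, replicas, physical_mem_gib):
    """Return (too_many_pods, memory_overcommitted) for one time-sliced GPU.

    Time-slicing advertises `replicas` schedulable copies of the GPU, but all
    copies share the same physical memory, so the sum of pod footprints is
    what matters, not the replica count.
    """
    too_many_pods = len(pods) > replicas
    total_mem = sum(p.expected_gpu_mem_gib for p in pods)
    return too_many_pods, total_mem > physical_mem_gib


pods = [Pod("train-a", 30.0), Pod("infer-b", 8.0), Pod("notebook-c", 6.0)]
print(oversubscribed(pods, replicas=4, physical_mem_gib=40.0))
```

In this example three pods fit within four replicas, yet their combined footprint exceeds the assumed 40 GiB of device memory, so one pod can starve or crash the others.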

The security implications follow directly from that missing isolation. Time-slicing works for development environments, testing, and workloads where tenants trust each other or where data sensitivity does not warrant hardware isolation. Production deployments with multi-tenant security requirements should prefer MIG or dedicated GPUs over time-slicing.

Hybrid approaches combine both technologies. Organizations can partition a GPU into MIG instances that ensure group isolation, then run time-slicing schedulers within each instance.²⁵ In Kubernetes clusters, allocating a MIG slice per namespace and time-sharing jobs within each slice balances security with cost efficiency.²⁶
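The namespace-per-slice layout can be sketched as a small planning function. This is an illustration under stated assumptions: every GPU is split into seven equal MIG instances, `plan_hybrid` is a hypothetical helper, and the output is a plan for a human or a controller to apply, not a call into any NVIDIA or Kubernetes API.

```python
def plan_hybrid(namespaces, jobs_per_namespace, instances_per_gpu=7):
    """Give each namespace its own MIG instance; its jobs time-share that instance.

    Returns {namespace: {"gpu": int, "mig_instance": int, "replicas": int}}.
    """
    plan = {}
    for index, ns in enumerate(sorted(namespaces)):
        plan[ns] = {
            "gpu": index // instances_per_gpu,          # which physical GPU
            "mig_instance": index % instances_per_gpu,  # isolated slice on that GPU
            # Replicas are sized to the namespace's own job count, so only
            # jobs from the same tenant ever share a slice.
            "replicas": max(1, jobs_per_namespace.get(ns, 1)),
        }
    return plan


for ns, slot in plan_hybrid(["team-a", "team-b"], {"team-a": 3}).items():
    print(ns, slot)
```

Hardware isolation applies between namespaces, while the weaker time-slicing boundary only ever separates jobs that belong to the same tenant.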

Confidential computing on GPUs

The NVIDIA H100 Tensor Core GPU introduced confidential computing to GPUs, using a hardware-based trusted execution environment anchored in an on-die hardware root of trust.²⁷ Prior to the H100, confidential computing features existed only in CPUs from AMD and Intel.²⁸ The H100 provides data protection for AI training and inference workloads involving sensitive information.²⁹

The technical architecture builds on CPU confidential virtual machine capabilities. The GPU solution relies on a confidential VM trusted execution environment enabled by AMD SEV-SNP or Intel TDX on the CPU.³⁰ The PCIe firewall blocks CPU access to most registers and all GPU protected memory. The NVLink firewall blocks peer GPU access to protected memory.³¹ Communication between CVM and GPU uses AES-GCM encryption with session keys to protect against the host system.³²
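To make the AES-GCM step concrete, the sketch below encrypts and authenticates a buffer with a 256-bit session key and shows that a modified ciphertext is rejected. It uses the third-party `cryptography` package, generates the key locally for demonstration, and invents the `seal` and `open_sealed` helpers and the context label; it illustrates the primitive only and is not NVIDIA's driver protocol or key exchange.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Session key shared by the two endpoints (generated locally for this demo).
session_key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(session_key)


def seal(plaintext: bytes, context: bytes) -> bytes:
    """Encrypt and authenticate; a fresh 96-bit nonce is prepended to the output."""
    nonce = os.urandom(12)
    return nonce + aead.encrypt(nonce, plaintext, context)


def open_sealed(blob: bytes, context: bytes) -> bytes:
    """Verify the tag and decrypt; raises InvalidTag if the blob was modified."""
    nonce, ciphertext = blob[:12], blob[12:]
    return aead.decrypt(nonce, ciphertext, context)


blob = seal(b"model weights chunk", b"cvm-to-gpu")
assert open_sealed(blob, b"cvm-to-gpu") == b"model weights chunk"

tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])  # flip one bit in the tag
try:
    open_sealed(tampered, b"cvm-to-gpu")
except InvalidTag:
    print("tampering detected")
```

The authentication tag is what protects against a hostile host: data copied through untrusted memory is unreadable without the key and is rejected if altered in transit.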

The H100's DMA engine supports AES GCM 256 encryption for data transfers between CPU and GPU.³³ A GPU in confidential computing mode blocks direct access to internal memory and disables performance counters that could enable side-channel attacks.³⁴ The architecture evolved from earlier security features: AES authentication on firmware since Volta, encrypted firmware and revocation since Turing and Ampere, and now full measured and attested boot with hardware root of trust in Hopper.³⁵
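Measured boot is easier to reason about with a toy model. The sketch below shows the general hash-chain idea used by measured-boot schemes: each component is folded into a running measurement, and a verifier compares the result with a known-good value. The function names and component strings are hypothetical, and this is a generic illustration, not the H100's actual measurement or attestation format.

```python
import hashlib
import hmac


def extend(measurement: bytes, component: bytes) -> bytes:
    """Fold one boot component into the running measurement (hash-chain extend)."""
    return hashlib.sha256(measurement + hashlib.sha256(component).digest()).digest()


def measure_boot(components) -> bytes:
    """Measure every component in boot order, starting from an all-zero register."""
    measurement = bytes(32)
    for blob in components:
        measurement = extend(measurement, blob)
    return measurement


def attest(components, expected: bytes) -> bool:
    """Accept the device only if the chain matches the known-good value."""
    return hmac.compare_digest(measure_boot(components), expected)


golden = measure_boot([b"bootloader-v1", b"firmware-v7"])
print(attest([b"bootloader-v1", b"firmware-v7"], golden))
print(attest([b"bootloader-v1", b"firmware-v7-modified"], golden))
```

Because each step hashes the previous measurement, changing any component or reordering the boot sequence produces a different final value.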

Microsoft Azure offers confidential VMs with NVIDIA H100 GPUs in preview, enabling training, fine-tuning, and serving of models like Stable Diffusion and large language models with confidential computing protections.³⁶ The Blackwell architecture advances confidential AI further with nearly identical performance whether running encrypted or unencrypted models, even for LLMs.³⁷

Kubernetes GPU security considerations

Namespace isolation in Kubernetes does not provide sufficient security for multi-tenant GPU scheduling.³⁸ Organizations running AI workloads on bare metal Kubernetes with GPUs must implement additional controls. The NVIDIA GPU Operator enables time-slicing and MIG configuration, but security depends on proper configuration and hardening.

The September 2024 NVIDIA Container Toolkit security bulletin prompted urgent upgrades. Organizations should run Container Toolkit v1.16.2 or higher, or GPU Operator v24.6.2 or higher.³⁹ The vulnerabilities demonstrated that container escape attacks could compromise GPU isolation even when properly configured at higher levels.
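A fleet audit for these minimum versions reduces to a version comparison. The sketch below assumes version strings have already been collected from each node by some other means; `parse_version` and `is_patched` are hypothetical helpers, and pre-release suffixes are simply ignored.

```python
# Minimum fixed versions named in the September 2024 bulletin discussed above.
MIN_CONTAINER_TOOLKIT = (1, 16, 2)
MIN_GPU_OPERATOR = (24, 6, 2)


def parse_version(text):
    """Turn 'v1.16.2' or '1.16.2-rc1' into a comparable tuple of ints.

    Assumption: anything after the first '-' is dropped, so a pre-release
    build compares equal to its final release.
    """
    core = text.strip().lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_patched(installed, minimum):
    """True if the installed version is at or above the minimum fixed version."""
    return parse_version(installed) >= minimum


for version in ("v1.16.1", "v1.16.2", "1.17.0"):
    print(version, is_patched(version, MIN_CONTAINER_TOOLKIT))
```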

Third-party solutions address gaps in native Kubernetes GPU management. Volcano provides a cloud-native batch scheduler with fine-grained control over priorities and fairness for high-performance workloads.⁴⁰ Run:ai, now part of NVIDIA, manages and optimizes GPU resources for AI workloads with features designed for multi-tenant environments.⁴¹ vCluster Labs announced its Infrastructure Tenancy Platform for AI at KubeCon North America 2025, delivering Kubernetes-native foundations for NVIDIA GPU infrastructure.⁴²

Organizations using vCluster report 40% improvement in GPU utilization and 60% reduction in infrastructure costs through dynamic multi-tenant orchestration.⁴³ The efficiency gains demonstrate that proper multi-tenant architectures can improve both security and economics compared to dedicated GPU allocations.

Side-channel attacks and emerging threats

GPU memory attacks exploit the shared architecture of multi-tenant environments to breach data confidentiality and degrade performance.⁴⁴ Attackers using contention-based side channels can infer sensitive information from co-located workloads,⁴⁵ and attacks on shared memory can leak information or carry covert channels between tenants.⁴⁶

Rowhammer, a hardware attack previously demonstrated against CPU memory, also affects GPUs with GDDR memory and can cause severe AI model accuracy loss.⁴⁷ The attack exploits GPU parallelism to induce bit flips, posing particular risks in cloud environments where attackers may co-locate with target workloads.⁴⁸

The primary risk in virtualized GPU environments remains cross-virtual machine attacks.⁴⁹ When multiple tenants run workloads on the same physical GPU, any flaw in the isolation mechanism becomes an opportunity for snooping. Such snooping breaks the cloud security model and poses serious risks to data confidentiality.⁵⁰

Mitigation strategies include strong workload isolation that avoids running sensitive and non-sensitive workloads on the same GPU, cache partitioning to reduce shared cache exposure, and randomized scheduling to complicate timing-based attacks.⁵¹ Single Root I/O Virtualization or similar security-enhanced virtualization technologies provide additional protection.⁵² Confidential GPUs represent the next frontier, extending TEE-like protections to GPU memory and execution flows.⁵³
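The first mitigation, keeping sensitive and non-sensitive workloads off the same GPU, is straightforward to audit. The sketch below assumes placements are available as simple tuples; the `mixed_sensitivity_gpus` helper and the sample data are hypothetical.

```python
from collections import defaultdict


def mixed_sensitivity_gpus(placements):
    """Return GPU ids hosting both sensitive and non-sensitive workloads.

    `placements` is an iterable of (gpu_id, workload_name, is_sensitive).
    """
    seen = defaultdict(set)
    for gpu_id, _name, is_sensitive in placements:
        seen[gpu_id].add(bool(is_sensitive))
    # A GPU violates the policy when both True and False were observed on it.
    return sorted(gpu for gpu, kinds in seen.items() if len(kinds) == 2)


placements = [
    ("gpu-0", "fraud-model-training", True),
    ("gpu-0", "patient-risk-inference", True),
    ("gpu-1", "payments-inference", True),
    ("gpu-1", "public-demo-notebook", False),
    ("gpu-2", "ci-smoke-test", False),
]
print(mixed_sensitivity_gpus(placements))
```

A scheduler admission check could run the same test before binding a workload, rather than after the fact.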

Enterprise security best practices

Organizations deploying shared GPU infrastructure should implement security controls appropriate to their risk tolerance and regulatory requirements.

For sensitive workloads, single-tenant options where GPUs are not shared reduce risk of side-channel attacks and align with compliance requirements.⁵⁴ Some certifications require dedicated hardware for certain data types.⁵⁵ The cost premium for dedicated GPUs may be justified by security requirements.

Driver and firmware security requires consistent updates with the most recent security patches.⁵⁶ NVIDIA recommends quarterly firmware updates and driver validations during scheduled maintenance windows.⁵⁷ The January 2025 vulnerability disclosure demonstrates the importance of timely patching.

Memory hygiene between sessions prevents data leakage. Zeroing GPU memory between sessions eliminates a major class of attacks with minimal performance impact.⁵⁸ The practice should be mandatory for any multi-tenant deployment.
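The scrub-on-release pattern is shown below as a host-memory analogy. This sketch uses ordinary Python buffers and a hypothetical `ScrubbingPool` class; on a real GPU the equivalent step would be a device-side memory clear performed by the driver or runtime, which is not shown here.

```python
class ScrubbingPool:
    """Hands out fixed-size buffers and zeroes each one when a session ends."""

    def __init__(self, count, size):
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self):
        """Give a buffer to a new session."""
        return self._free.pop()

    def release(self, buf):
        """Return a buffer, overwriting it before the next tenant can see it."""
        buf[:] = bytes(len(buf))
        self._free.append(buf)


pool = ScrubbingPool(count=1, size=16)
first = pool.acquire()
first[:6] = b"secret"          # tenant A writes sensitive data
pool.release(first)            # scrubbed on release
second = pool.acquire()        # tenant B receives the same buffer
print(second == bytearray(16))
```

Scrubbing at release rather than at acquisition means a buffer never sits in the free list holding another tenant's data.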

Monitoring capabilities should identify atypical GPU usage patterns that could indicate cryptojacking, denial-of-service attacks, or resource misuse.⁵⁹ AI and ML techniques help detect advanced attacks that simple threshold monitoring would miss.
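As a baseline for comparison, the simplest form of usage anomaly detection is a z-score against a known-good window. The sketch below is that baseline only; the `unusual_samples` helper, the threshold, and the utilization figures are illustrative assumptions, and production systems would add richer features such as memory, power, and process identity.

```python
from statistics import mean, stdev


def unusual_samples(baseline, recent, threshold=3.0):
    """Return recent samples more than `threshold` standard deviations from baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return [x for x in recent if x != mu]
    return [x for x in recent if abs(x - mu) / sigma > threshold]


baseline = [32, 35, 30, 34, 33, 31, 36, 34]  # percent GPU utilization, illustrative
print(unusual_samples(baseline, [33, 35, 98]))
```

A sustained jump to near-full utilization on a GPU that normally idles around a third is the kind of pattern associated with cryptojacking.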

Access controls in Kubernetes require hardening beyond defaults. RBAC configuration, network segmentation for GPU nodes, and continuous monitoring for misconfigurations reduce attack surface.⁶⁰ Kubernetes alone does not provide sufficient protection without deliberate security architecture.
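Continuous checks for misconfiguration can start with the pod specification itself. The sketch below inspects a pod manifest, already loaded as a Python dict, for settings that weaken isolation on GPU nodes. The `risky_gpu_pod_findings` helper and the sample manifest are hypothetical; the field names follow the standard Kubernetes Pod schema, and `nvidia.com/gpu` is the resource name used by NVIDIA's device plugin.

```python
GPU_RESOURCE = "nvidia.com/gpu"


def risky_gpu_pod_findings(pod):
    """Return a list of findings for a pod manifest (dict) that requests a GPU."""
    spec = pod.get("spec", {})
    containers = spec.get("containers", [])
    uses_gpu = any(
        GPU_RESOURCE in c.get("resources", {}).get("limits", {}) for c in containers
    )
    if not uses_gpu:
        return []
    findings = []
    if spec.get("hostPID") or spec.get("hostNetwork"):
        findings.append("shares host PID or network namespace")
    for c in containers:
        if c.get("securityContext", {}).get("privileged"):
            findings.append(f"container {c.get('name')} is privileged")
    if any("hostPath" in v for v in spec.get("volumes", [])):
        findings.append("mounts a hostPath volume")
    return findings


pod = {
    "spec": {
        "hostPID": True,
        "containers": [
            {
                "name": "trainer",
                "resources": {"limits": {GPU_RESOURCE: 1}},
                "securityContext": {"privileged": True},
            }
        ],
        "volumes": [{"name": "dev", "hostPath": {"path": "/dev"}}],
    }
}
for finding in risky_gpu_pod_findings(pod):
    print(finding)
```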

For organizations with strict compliance requirements, major cloud providers offer confidential computing capabilities. Banks, healthcare organizations, and government agencies use Azure confidential computing and similar offerings for sensitive AI workloads with controlled data residency and enforced security policies.⁶¹

The Cisco Secure AI Factory with NVIDIA, unveiled at GTC in March 2025, provides comprehensive architecture for AI infrastructure with security and observability at the forefront.⁶² The approach demonstrates how vendors now integrate security into AI infrastructure design rather than treating it as an afterthought.

Strategic considerations

Multi-tenant GPU security represents a specialized domain that organizations cannot approach with generic data center security practices. The hardware architectures, software stacks, and attack surfaces differ significantly from traditional compute infrastructure.

Organizations should evaluate their workload sensitivity and compliance requirements to determine appropriate isolation levels. MIG provides hardware isolation suitable for most enterprise multi-tenant requirements. Time-slicing serves lower-sensitivity workloads where cost efficiency outweighs isolation needs. Confidential computing on H100 and Blackwell addresses the most demanding security requirements.
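The guidance above can be written down as a small decision function. This is a sketch of one possible policy, not a standard: the sensitivity labels, the `recommend_isolation` name, and the ordering of the rules are assumptions that each organization would adjust to its own risk model.

```python
def recommend_isolation(sensitivity, tenants_trusted, dedicated_required=False):
    """Map workload traits to one of the isolation options discussed in this article.

    sensitivity: "low", "medium", or "high" (hypothetical labels).
    tenants_trusted: True when co-tenants belong to the same trust domain.
    dedicated_required: True when a compliance regime mandates unshared hardware.
    """
    if sensitivity not in ("low", "medium", "high"):
        raise ValueError(f"unknown sensitivity: {sensitivity}")
    if dedicated_required:
        return "dedicated GPU"
    if sensitivity == "high":
        return "confidential computing"
    if sensitivity == "low" and tenants_trusted:
        return "time-slicing"
    return "MIG"


print(recommend_isolation("low", tenants_trusted=True))
print(recommend_isolation("medium", tenants_trusted=False))
print(recommend_isolation("high", tenants_trusted=False))
```

MIG is the default branch, matching the article's position that hardware isolation suits most enterprise multi-tenant requirements.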

The 37% annual surge in GPU server deployments in 2025 means security practices developed now will govern infrastructure for years.⁶³ Organizations that establish proper multi-tenant security architecture can consolidate workloads safely and economically. Those that ignore GPU-specific security risks face breaches that could compromise AI systems representing significant organizational investment.

GPU infrastructure security warrants the same deliberate attention organizations apply to network security, identity management, and data protection. The stakes justify the investment.

References

Market.us. "AI Infrastructure Security Market Size | CAGR of 24.9%." 2025. https://market.us/report/ai-infrastructure-security-market/
Market.us. "AI Infrastructure Security Market Size."
Market.us. "AI Infrastructure Security Market Size."
VC Solutions. "GPU Security Challenges in the Age of AI Technology." 2025. https://www.vcsolutions.com/blog/gpu-security-challenges-in-the-age-of-ai-technology/
Edera. "7 Critical NVIDIA GPU Vulnerabilities Expose AI Systems." January 2025. https://edera.dev/stories/7-critical-nvidia-gpu-vulnerabilities-expose-ai-systems-protect-your-infrastructure-now
Edera. "7 Critical NVIDIA GPU Vulnerabilities Expose AI Systems."
Market.us. "AI Infrastructure Security Market Size."
Market.us. "AI Infrastructure Security Market Size."
Liquid Web. "GPU Vulnerability: 8 Security Risks and How to Address Them." 2025. https://www.liquidweb.com/gpu/vulnerability/
Liquid Web. "GPU Vulnerability: 8 Security Risks and How to Address Them."
ResearchGate. "Memory Under Siege: A Comprehensive Survey of Side-Channel Attacks on Memory." 2025. https://www.researchgate.net/publication/391250570_Memory_Under_Siege_A_Comprehensive_Survey_of_Side-Channel_Attacks_on_Memory
DevZero. "Part 4: GPU Security and Isolation." 2025. https://www.devzero.io/blog/gpu-security-and-isolation
NVIDIA. "Multi-Instance GPU (MIG)." 2025. https://www.nvidia.com/en-us/technologies/multi-instance-gpu/
NVIDIA. "Multi-Instance GPU (MIG)."
NVIDIA. "MIG User Guide." 2025. https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
NVIDIA. "MIG User Guide."
DevZero. "Part 4: GPU Security and Isolation."
NVIDIA. "MIG User Guide."
NVIDIA. "NVIDIA Virtual GPU Software: Accelerate AI, VDI & Graphics Workloads." 2025. https://www.nvidia.com/en-us/data-center/virtual-solutions/
NVIDIA. "NVIDIA Virtual GPU Software."
NVIDIA. "Time-Slicing GPUs in Kubernetes." 2025. https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html
NVIDIA. "Time-Slicing GPUs in Kubernetes."
vCluster. "What Is GPU Sharing in Kubernetes? Strategies for AI Efficiency." 2025. https://www.vcluster.com/blog/gpu-sharing-kubernetes
NVIDIA. "Time-Slicing GPUs in Kubernetes."
vCluster. "What Is GPU Sharing in Kubernetes?"
vCluster. "What Is GPU Sharing in Kubernetes?"
NVIDIA Developer Blog. "Confidential Computing on NVIDIA H100 GPUs for Secure and Trustworthy AI." 2023. https://developer.nvidia.com/blog/confidential-computing-on-h100-gpus-for-secure-and-trustworthy-ai/
Communications of the ACM. "Creating the First Confidential GPUs." 2024. https://cacm.acm.org/practice/creating-the-first-confidential-gpus/
NVIDIA. "AI Security with Confidential Computing." 2025. https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/
NVIDIA. "Confidential Compute on NVIDIA Hopper H100 Whitepaper." 2023. https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/HCC-Whitepaper-v1.0.pdf
NVIDIA. "Confidential Compute on NVIDIA Hopper H100 Whitepaper."
NVIDIA. "Confidential Compute on NVIDIA Hopper H100 Whitepaper."
NVIDIA Developer Blog. "Confidential Computing on NVIDIA H100 GPUs."
Edgeless Systems. "Nvidia Hopper H100." 2025. https://www.edgeless.systems/wiki/hardware/nvidia-hopper-h100
NVIDIA Developer Blog. "Announcing Confidential Computing General Access on NVIDIA H100 Tensor Core GPUs." 2024. https://developer.nvidia.com/blog/announcing-confidential-computing-general-access-on-nvidia-h100-tensor-core-gpus/
Microsoft Tech Community. "Announcing Azure confidential VMs with NVIDIA H100 Tensor Core GPUs in Preview." 2024. https://techcommunity.microsoft.com/t5/azure-confidential-computing/announcing-azure-confidential-vms-with-nvidia-h100-tensor-core/ba-p/3975389
NVIDIA. "AI Security with Confidential Computing."
vCluster. "Bare Metal Kubernetes with GPU: Multi-Tenancy Challenges and vCluster Solutions." 2025. https://www.vcluster.com/blog/bare-metal-kubernetes-with-gpu-challenges-and-multi-tenancy-solutions
NVIDIA. "Time-Slicing GPUs in Kubernetes."
vCluster. "Bare Metal Kubernetes with GPU."
vCluster. "Bare Metal Kubernetes with GPU."
vCluster. "Multi tenancy in 2025 and beyond." 2025. https://www.vcluster.com/blog/multi-tenancy-in-2025-and-beyond
Efficiently Connected. "vCluster Introduces Infrastructure Tenancy Platform for AI." 2025. https://www.efficientlyconnected.com/vcluster-introduces-infrastructure-tenancy-platform-for-ai-to-maximize-nvidia-gpu-efficiency/
ResearchGate. "Memory Under Siege."
ResearchGate. "Memory Under Siege."
ResearchGate. "Memory Under Siege."
TechXplore. "Researchers discover a GPU vulnerability that could threaten AI models." September 2025. https://techxplore.com/news/2025-09-gpu-vulnerability-threaten-ai.html
TechXplore. "Researchers discover a GPU vulnerability."
DevZero. "Part 4: GPU Security and Isolation."
DevZero. "Part 4: GPU Security and Isolation."
Liquid Web. "GPU Vulnerability: 8 Security Risks and How to Address Them."
NVIDIA Docs. "Workload Isolation — NVIDIA Software Reference Architecture for Multi-Tenant Clouds." 2025. https://docs.nvidia.com/ai-enterprise/planning-resource/reference-architecture-for-multi-tenant-clouds/latest/workload-isolation.html
Duality Technologies. "Confidential Computing & TEEs: What Enterprises Must Know in 2025." 2025. https://dualitytech.com/blog/confidential-computing-tees-what-enterprises-must-know-in-2025/
RunPod. "Keeping Data Secure: Best Practices for Handling Sensitive Data with Cloud GPUs." 2025. https://www.runpod.io/articles/guides/keep-data-secure-cloud-gpus
RunPod. "Keeping Data Secure."
Trend Micro. "Navigating the Threat Landscape for Cloud-Based GPUs." 2025. https://www.trendmicro.com/vinfo/us/security/news/threat-landscape/navigating-the-threat-landscape-for-cloud-based-gpus
Introl. "GPU Deployments: The Definitive Guide for Enterprise AI Infrastructure." 2025. https://introl.com/blog/gpu-deployments-the-definitive-guide-for-enterprise-ai-infrastructure
SecureWorld. "GPU Hosting, LLMs, and the Unseen Backdoor." 2025. https://www.secureworld.io/industry-news/gpu-hosting-llms-unseen-backdoor
Trend Micro. "Navigating the Threat Landscape for Cloud-Based GPUs."
Zadara. "GPU Cloud Computing: Expert Guide to Choosing the Right Provider (2025)." 2025. https://www.zadara.com/blog/2025/01/01/gpu-cloud-computing-expert-guide-to-choosing-the-right-provider-2025/
Google Cloud Community. "Protecting Your Data: Why Confidential Computing is Necessary for Your Business." 2025. https://security.googlecloudcommunity.com/community-blog-42/protecting-your-data-why-confidential-computing-is-necessary-for-your-business-4007
Cisco Newsroom. "Cisco Delivers AI Innovations across Neocloud, Enterprise and Telecom with NVIDIA." October 2025. https://newsroom.cisco.com/c/r/newsroom/en/us/a/y2025/m10/cisco-delivers-ai-networking-innovations-across-neocloud-enterprise-and-telecom-with-nvidia.html
SimCentric. "GPU Memory Encryption: Safeguard US Server Compute Security." 2025. https://www.simcentric.com/america-dedicated-server/gpu-memory-encryption-safeguard-us-server-compute-security/

Key takeaways

For security architects:
- 90% deploy AI but only 5% feel security-ready; 97% of breached organizations lacked AI access controls
- January 2025: 7 NVIDIA security vulnerabilities disclosed affecting millions of systems
- GPUs do not always have robust memory isolation; if memory is not cleared properly, attackers can retrieve leftover data

For infrastructure teams:
- MIG (Ampere+): hardware isolation up to 7 instances; separate paths through crossbar, L2, memory controllers
- vGPU provides full IOMMU protection with live migration support; time-slicing lacks memory/fault isolation
- H100 confidential computing: AES GCM 256 encryption, hardware root of trust, PCIe/NVLink firewalls

For Kubernetes operators:
- Namespace isolation is insufficient for multi-tenant GPU scheduling; additional controls are required
- Run Container Toolkit v1.16.2+ or GPU Operator v24.6.2+ (September 2024 security bulletin)
- vCluster reports 40% GPU utilization improvement, 60% infrastructure cost reduction

For compliance teams:
- Azure confidential VMs with H100 in preview for HIPAA, banking, government workloads
- Blackwell achieves nearly identical performance encrypted vs unencrypted, even for LLMs
- Cisco Secure AI Factory with NVIDIA (GTC March 2025): security and observability integrated from design
