DigitalOcean and AMD Prove Non-NVIDIA Inference Works at Scale: Character.ai's Billion-Query Breakthrough

Character.ai doubled inference throughput and halved cost per token by migrating to DigitalOcean's AMD-powered platform. The billion-query-per-day deployment validates AMD as a viable alternative for production AI inference—and signals a potential shift in GPU market dynamics.

One billion queries per day. That's the workload Character.ai migrated to DigitalOcean's AMD-powered inference platform, doubling throughput and cutting cost per token by 50% in the process. The announcement, made January 13, 2026, answers a question many in the industry have been asking: can AMD GPUs deliver production-grade AI inference at hyperscale?

For enterprise AI buyers evaluating GPU strategy, the answer matters more than benchmark numbers suggest. NVIDIA's dominance in AI infrastructure has created supply constraints, pricing power, and single-vendor risk that many organizations find uncomfortable. Character.ai's successful migration demonstrates that alternatives exist—if you're willing to invest in optimization.

The deployment represents one of the most demanding inference workloads in production today, serving approximately 20,000 queries per second—about 20% of Google Search's request volume. The fact that this workload runs successfully on AMD hardware changes the competitive landscape for GPU cloud providers.

The Technical Achievement: What Actually Changed

Character.ai's migration wasn't a simple hardware swap. The 2x throughput improvement required coordinated optimization across hardware configuration, model parallelization, kernel implementation, and infrastructure orchestration.

The Model: Qwen3-235B on FP8

The benchmark configuration used Qwen3-235B Instruct FP8—a 235-billion parameter Mixture-of-Experts model with 128 experts and a 240GB weight footprint. Testing conditions specified 5600 input tokens and 140 output tokens per request, with p90 latency targets maintained throughout.

This workload represents enterprise-grade complexity. MoE models present unique parallelization challenges because tokens route to specific experts rather than flowing uniformly through the network. Efficient deployment requires careful consideration of expert placement across GPUs.
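To make the routing challenge concrete, here is a minimal sketch of the top-k gating step at the heart of an MoE layer: a learned router scores every expert for each token, and only the highest-scoring few process that token. The shapes, the top-8 routing, and the random weights are illustrative assumptions, not Character.ai's implementation.

```python
import numpy as np

def moe_route(tokens: np.ndarray, router_w: np.ndarray, top_k: int = 8):
    """Toy MoE router: pick top_k experts per token from router logits.

    tokens:   (num_tokens, hidden) activations entering the MoE layer
    router_w: (hidden, num_experts) router weight matrix
    Returns the chosen expert ids and their normalized weights per token.
    """
    logits = tokens @ router_w                      # (num_tokens, num_experts)
    top_ids = np.argsort(-logits, axis=-1)[:, :top_k]
    top_logits = np.take_along_axis(logits, top_ids, axis=-1)
    # Normalize only over the selected experts, as most MoE implementations do.
    weights = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return top_ids, weights

# Illustrative shapes only: 16 tokens, hidden size 64, 128 experts, top-8 routing.
rng = np.random.default_rng(0)
ids, w = moe_route(rng.standard_normal((16, 64)), rng.standard_normal((64, 128)))
print(ids.shape, w.shape)  # (16, 8) (16, 8): each token is sent to 8 of 128 experts
```

Because the selected expert ids vary token by token, each GPU's share of the work depends on where the experts it hosts sit in the routing distribution, which is exactly the placement problem the deployment's expert-parallel design addresses.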

Hardware: AMD Instinct MI300X and MI325X

DigitalOcean deployed AMD Instinct MI300X and MI325X GPUs, AMD's top-tier AI accelerators with 192GB of HBM3 and 256GB of HBM3e memory, respectively.

The memory advantage matters for large models. MI300X provides 5.3 TB/s of bandwidth, roughly 60% more than NVIDIA's H100 SXM (3.35 TB/s) and higher than the H200 (4.8 TB/s). For memory-bandwidth-bound inference workloads, that headroom translates directly into throughput.
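One way to see why that bandwidth gap matters: during decode, each new token has to stream the active weights out of HBM, so per-GPU throughput is bounded by bandwidth divided by bytes read per token. The sketch below uses the bandwidth figures above plus an assumed ~22GB of active FP8 expert weights per token (a rough figure for a 235B-parameter MoE that activates only a fraction of its experts); it ignores KV cache traffic, batching, and compute limits, so only the ratios are meaningful.

```python
# Back-of-envelope ceiling for memory-bandwidth-bound decoding (batch size 1).
# Assumption (illustrative): each decode step streams ~22 GB of active FP8
# expert weights from HBM; KV cache reads, batching, and compute are ignored.

BW_TBPS = {"MI300X": 5.3, "H100 SXM": 3.35, "H200": 4.8}  # HBM bandwidth, TB/s
active_bytes_per_token = 22e9

for gpu, bw in BW_TBPS.items():
    ceiling = bw * 1e12 / active_bytes_per_token  # tokens/s upper bound per GPU
    print(f"{gpu:8s} ~{ceiling:4.0f} tokens/s ceiling")

# The absolute numbers are crude; the ratio is the point: 5.3 / 3.35 ≈ 1.6, so a
# purely bandwidth-bound decode loop has roughly 60% more headroom on MI300X.
```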

The Optimization Stack

The gains came from multiple layers of optimization working together:

Parallelization Strategy: Character.ai moved from a DP1/TP8/EP8 configuration (a single replica with tensor and expert parallelism across 8 GPUs) to DP2/TP4/EP4: two independent TP4 groups on a single 8-GPU server (a configuration sketch follows this list). This arrangement achieved 45% better throughput than the initial approach and 91% better than non-optimized alternatives.

Expert Parallelism Design: Rather than sharding 128 experts across all GPUs, each GPU in the EP4 configuration hosts 32 complete experts. Full expert placement reduces cross-GPU data movement during token routing—critical for MoE models where expert selection happens dynamically.

FP8 Execution Paths: The deployment enforced FP8 precision throughout the inference pipeline, including KV cache storage. This halved memory requirements for cached context while keeping hardware in optimized compute paths rather than casting between data types.

AITER Kernel Integration: AMD's AITER library (AI Tensor Engine for ROCm), a collection of optimized inference kernels for transformer workloads, provided the specialized FP8 MoE kernels; integrating them with vLLM required close collaboration with AMD engineers.

Topology-Aware Scheduling: GPU allocation on Kubernetes used a scoring matrix that preferred same-NUMA-node placement and xGMI (Infinity Fabric) connected GPUs. On MI325X, every GPU maintains a dedicated 128 GB/s bidirectional link to every other GPU on the node, but crossing NUMA boundaries or PCIe switches degrades performance.
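A minimal sketch of what such a scoring matrix might look like, with invented weights and a toy topology table rather than DigitalOcean's actual scheduler logic:

```python
from itertools import combinations

# Toy description of an 8-GPU MI325X-style node: which NUMA node each GPU sits
# on, and which GPU pairs share a direct xGMI (Infinity Fabric) link.
NUMA_NODE = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
XGMI_LINKED = {frozenset(p) for p in combinations(range(8), 2)}  # fully connected

def score_gpu_set(gpus: tuple[int, ...]) -> int:
    """Higher is better: prefer same-NUMA, fully xGMI-connected GPU groups."""
    score = 0
    if len({NUMA_NODE[g] for g in gpus}) == 1:
        score += 100                      # whole group sits on one NUMA node
    for a, b in combinations(gpus, 2):
        score += 10 if frozenset((a, b)) in XGMI_LINKED else -25  # penalize PCIe hops
    return score

# Pick the best-scoring group of 4 free GPUs for a TP4 replica.
free = [0, 1, 2, 3, 4, 5]                 # GPUs not yet allocated (example state)
best = max(combinations(free, 4), key=score_gpu_set)
print(best, score_gpu_set(best))          # expect (0, 1, 2, 3): same NUMA node
```

In the fully connected MI325X case every pair already has an xGMI link, so the NUMA bonus is what steers a TP4 replica onto four GPUs that share a socket; on hosts with sparser topologies, the link penalty starts doing real work.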
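Putting the parallelization, expert placement, and FP8 decisions together, a minimal sketch of one replica using vLLM's offline API might look like the following. Engine argument names and the model identifier reflect recent vLLM releases and Hugging Face naming conventions and should be verified against your installation; this is an illustration, not Character.ai's or DigitalOcean's deployment code.

```python
# Sketch of one DP replica: a TP4 + EP4 engine with an FP8 KV cache.
# Assumptions: a ROCm build of vLLM that exposes these engine arguments,
# and GPU visibility controlled via the environment before import.
import os

# Pin this replica to GPUs 0-3; a second process pinned to GPUs 4-7 forms the
# other half of the DP2 layout (each replica is fully independent).
os.environ.setdefault("HIP_VISIBLE_DEVICES", "0,1,2,3")
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1,2,3")

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",  # check the exact repo name
    tensor_parallel_size=4,        # TP4 within the replica
    enable_expert_parallel=True,   # shard the 128 experts as 32 per GPU (EP4)
    kv_cache_dtype="fp8",          # keep cached context in FP8 as well
    max_model_len=8192,            # covers the 5600-in / 140-out benchmark shape
)

params = SamplingParams(max_tokens=140, temperature=0.7)
print(llm.generate(["Summarize why expert placement matters for MoE serving."],
                   params)[0].outputs[0].text)
```

Two such processes, one pinned to GPUs 0-3 and the other to GPUs 4-7 behind a simple load balancer, reproduce the DP2/TP4/EP4 layout: each replica is fully independent, so no token ever crosses between the two four-GPU groups.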

The Business Outcome: Eight Figures

The technical success translated to commercial commitment. Character.ai signed a multi-year, eight-figure annual agreement with DigitalOcean for GPU infrastructure.

David Brinker, Senior Vice President of Partnerships at Character.ai, stated the results "exceeded expectations": "We pushed DigitalOcean aggressively on performance, latency, and scale. DigitalOcean delivered reliable performance that unlocked higher sustained throughput and improved economics."

For DigitalOcean—currently valued at approximately $5 billion with $864 million annual revenue—the deal represents validation of its AI infrastructure strategy. The company's stock has surged 65% over the past year as investors respond to its AI-focused positioning.

What This Means for AMD's Position

The Character.ai deployment arrives at a pivotal moment for AMD's AI business. The company has struggled to convert hardware advantages into market share, primarily due to NVIDIA's software ecosystem dominance.

The ROCm Progress Story

AMD's ROCm stack has improved significantly. Key milestones:

  • vLLM First-Class Status: ROCm became a fully integrated first-class platform in the vLLM ecosystem, with dedicated CI/CD pipelines going live December 29, 2025.

  • Test Coverage: AMD CI pipeline pass rates improved from 37% in mid-November 2025 to 93% by mid-January 2026, with 100% coverage as the target.

  • Pre-Built Images: As of January 6, 2026, users no longer need to build from source—official ROCm-enabled vLLM-omni Docker images are available from Docker Hub.

These improvements address the historical complaint that AMD GPUs required extensive custom work to achieve acceptable performance.
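With pre-built images, standing up an environment is now largely a matter of pulling the container and confirming that its PyTorch is a ROCm build with visible accelerators. A minimal check, assuming a ROCm build of PyTorch inside the image:

```python
# Quick sanity check inside a ROCm-enabled vLLM container: confirm the PyTorch
# build is a HIP/ROCm build and that the GPUs are visible before serving.
import torch

print("torch:", torch.__version__)
print("hip build:", torch.version.hip)        # None on CUDA builds, a version string on ROCm
print("devices:", torch.cuda.device_count())  # ROCm GPUs surface through the cuda API
for i in range(torch.cuda.device_count()):
    print(" ", torch.cuda.get_device_name(i))  # e.g. an MI300-series name
```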

The Remaining Challenges

Character.ai's success required substantial AMD engineering support. According to SemiAnalysis, getting AMD performance within 75% of H100/H200 required "support from multiple AMD teams to fix software bugs." The initial setup relied on a "~60 command Dockerfile hand-crafted by an AMD principal engineer, taking ~5 hours to build from source."

This level of vendor involvement doesn't scale to every customer. Enterprises without the volume to warrant dedicated AMD engineering support may face a steeper path to production.

The Price Advantage

AMD's hardware cost advantage remains substantial. Estimates suggest AMD sells MI300X chips for $10,000-$15,000 compared to NVIDIA's H100 at $25,000-$40,000—a 2-4x difference.

However, cloud rental rates tell a different story. Current GPU cloud pricing shows H100 from $1.45/hour, MI300X from $1.99/hour, and H200 from $2.25/hour. The rental premium for AMD hardware reflects the limited ecosystem of AMD-focused cloud providers compared to hundreds offering NVIDIA options.

Character.ai's deployment economics work because they achieved 2x throughput gains through optimization. Without those gains, the rental premium would offset hardware cost advantages.
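The break-even arithmetic is simple: cost per token scales with the hourly rate divided by throughput, so the roughly 37% rental premium disappears once the AMD instance delivers about 1.4x the throughput of the H100 instance. The throughput ratios in the sketch below are assumptions for illustration, not measured results.

```python
# Effective cost ratio: (AMD $/hr / NVIDIA $/hr) / (AMD throughput / NVIDIA throughput).
# Below 1.0, the AMD instance is cheaper per token despite the higher hourly rate.
H100_RATE, MI300X_RATE = 1.45, 1.99   # $/GPU-hour, from the cloud pricing cited above

price_premium = MI300X_RATE / H100_RATE          # ≈ 1.37
for throughput_gain in (1.0, 1.37, 1.5, 2.0):    # assumed MI300X vs H100 throughput
    cost_ratio = price_premium / throughput_gain
    print(f"throughput {throughput_gain:.2f}x -> cost per token {cost_ratio:.2f}x of H100")

# At 1.0x throughput the MI300X rental costs ~1.37x as much per token; at ~1.37x
# it breaks even; at 2.0x it comes in around 0.69x, i.e. roughly 31% cheaper.
```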

Character.ai: Why This Workload Matters

Character.ai provides crucial context for interpreting the benchmark. The company operates one of the most demanding AI platforms in production—handling thousands of queries per second and 1 million concurrent connections across thousands of GPUs in multiple Kubernetes clusters.

The company has demonstrated exceptional efficiency in operational costs. As of June 2024, Character.ai achieved serving costs 33x lower than baseline and operates at 13.5x lower cost than competitors using commercial APIs.

At 100 million daily active users using the service for an hour per day, Character.ai would spend approximately $365 million annually on serving costs. A competitor using leading commercial APIs would spend at least $4.75 billion for equivalent capacity.
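Those figures hang together under a quick back-of-envelope check; the per-user cost below is derived from the article's own totals and is a rough approximation, not a disclosed number.

```python
# Rough consistency check on the serving-cost figures quoted above.
daily_active_users = 100_000_000
annual_self_serve_cost = 365_000_000        # USD, optimized in-house serving
annual_commercial_api_cost = 4_750_000_000  # USD, equivalent capacity on commercial APIs

cost_per_user_day = annual_self_serve_cost / (daily_active_users * 365)
print(f"~${cost_per_user_day:.3f} per user per day of usage")                       # ~$0.010
print(f"API premium: {annual_commercial_api_cost / annual_self_serve_cost:.1f}x")  # ~13x
# About a penny per daily user, and a ~13x gap that lines up with the 13.5x figure above.
```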

This cost efficiency explains why Character.ai was willing to invest in AMD optimization. The company's business model depends on inference economics that most organizations can't achieve.

Implications for Enterprise GPU Strategy

The Character.ai deployment offers several lessons for enterprises evaluating GPU infrastructure:

When AMD Makes Sense

Memory-bound workloads: MI300X's 192GB capacity and 5.3 TB/s bandwidth advantage over H100 translates to real throughput gains for large models that fit within memory but bottleneck on bandwidth.

High-volume inference: The 2x throughput improvement matters most at scale. Organizations processing millions of queries daily can justify the optimization investment required to achieve these gains.

Multi-vendor strategy: Enterprises uncomfortable with NVIDIA single-vendor dependence now have production-validated alternatives for at least some workloads.

When NVIDIA Remains Preferred

Latency-sensitive applications: NVIDIA H100 exhibits 57% lower memory latency than MI300X. For applications where latency matters more than throughput, this advantage persists.

Software ecosystem dependency: Organizations heavily invested in CUDA, TensorRT, and NVIDIA's toolchain face switching costs that may exceed hardware savings.

Limited optimization capacity: Without engineering resources to achieve Character.ai-level optimization, AMD's theoretical advantages may not materialize in production.

The Emerging Middle Ground

DigitalOcean's approach—"GPUs matter, but outcomes matter more"—reflects a potential market shift. Rather than competing on raw hardware specifications, cloud providers are differentiating on optimization services that extract maximum performance from available silicon.

For enterprises, this creates a new evaluation criterion: which provider delivers the best price-performance for your specific workloads, regardless of underlying hardware?

The Competitive Response

Character.ai's AMD success pressures NVIDIA's inference market position. NVIDIA has relied on ecosystem lock-in and software advantages to maintain pricing power despite hardware competition.

However, several factors limit immediate competitive impact:

Supply constraints persist: Even with AMD alternatives proven viable, NVIDIA H100/H200 supply remains constrained for many customers. AMD validation creates options for customers who couldn't secure NVIDIA allocation regardless of preference.

Next-generation transition: NVIDIA's Blackwell architecture (B200) raises the bar with 8 TB/s bandwidth and 30x energy efficiency improvements. AMD's MI350/MI355X promise FP4/FP6 support and structured pruning but face the same late-to-market challenge MI325X encountered.

Enterprise inertia: Large organizations have years of NVIDIA infrastructure investment, training programs, and operational playbooks. Switching costs extend beyond hardware to organizational capability.

What to Watch

The Character.ai deployment establishes a proof point but not a trend. Key indicators to monitor:

Replication attempts: Can other organizations achieve similar gains without Character.ai's unique scale and AMD engineering support?

ROCm adoption metrics: AMD's 93% vLLM test pass rate needs to reach and sustain 100%. Pre-built Docker image availability reduces friction, but enterprise-grade support remains essential.

Cloud provider expansion: DigitalOcean's success may encourage other cloud providers to invest in AMD optimization. More competition in the AMD cloud space would close the rental premium gap.

Next-generation performance: Both AMD (MI350) and NVIDIA (B200) have significant architectural improvements shipping in 2026. Relative positioning at the next node will determine whether AMD gains remain relevant.

For data center operators and enterprise AI buyers, the message is clear: AMD is a viable option for production inference at scale—for organizations willing to invest in optimization. The billion-query proof point can't be dismissed.


The competitive landscape for AI infrastructure continues evolving rapidly. For analysis of GPU deployment, cloud pricing, and enterprise AI strategy, explore Introl's coverage of data center operations and hardware optimization.
