Akamai Inference Cloud

Inference at the edge, not in a distant data center

Akamai Inference Cloud combines NVIDIA RTX PRO™ Servers — featuring NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, NVIDIA BlueField-3® DPUs, and NVIDIA AI Enterprise software — with Akamai's distributed cloud computing infrastructure and global edge network, which has more than 4,400 locations worldwide.

It is the first global-scale implementation of NVIDIA's AI Grid reference architecture, designed to spread inference work across data centers, regional cloud sites, and edge locations — reducing latency and improving cost efficiency for workloads that require real-time, consistent responses.

Why a grid instead of a region

Centralized GPU clusters are essential for training models, but they are too slow, too remote, and too rigid for the inference phase — the actual execution of AI in real-time environments. Akamai's own research found that 64% of organizations now require end-to-end response times under 250 milliseconds, while 50% of deployments fail to meet latency demands at peak load. Read the State of AI Inference findings.

The grid answers this with a workload-aware orchestrator that brokers AI requests across compute tiers based on demand and location — routing each inference to the optimal point in the network rather than backhauling everything to a single region.
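The routing decision described above can be sketched in a few lines of Python. This is an illustrative model only, not Akamai's or NVIDIA's orchestrator: the `Site` type, the tier labels, the latency estimate, and the `route` function are all hypothetical names and assumptions made for this example. The 250 ms default simply echoes the response-time threshold cited earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    """A hypothetical compute location in one tier of the grid."""
    name: str
    tier: str              # "edge", "regional", or "core" (illustrative labels)
    rtt_ms: float          # estimated network round trip from the user
    inference_ms: float    # estimated model execution time when the site is idle
    utilization: float     # current load, 0.0 (idle) to 1.0 (saturated)


def estimated_latency_ms(site: Site) -> float:
    """Estimate end-to-end latency; execution time stretches as the site fills up."""
    headroom = max(1.0 - site.utilization, 0.05)  # floor avoids division by zero
    return site.rtt_ms + site.inference_ms / headroom


def route(sites: list[Site], budget_ms: float = 250.0) -> Optional[Site]:
    """Pick the lowest-latency site that fits the budget, or None if none does."""
    candidates = [s for s in sites if estimated_latency_ms(s) <= budget_ms]
    return min(candidates, key=estimated_latency_ms, default=None)


# Made-up example values: a busy edge site, a moderately loaded regional
# site, and a distant but mostly idle core data center.
sites = [
    Site("edge-a", "edge", rtt_ms=8, inference_ms=60, utilization=0.90),
    Site("regional-b", "regional", rtt_ms=25, inference_ms=60, utilization=0.40),
    Site("core-c", "core", rtt_ms=120, inference_ms=50, utilization=0.10),
]
chosen = route(sites)
print(chosen.name if chosen else "no site meets the budget")
```

In this toy scenario the nearest site is not the best choice: the edge location is close but saturated, so the request goes to the regional tier. A real orchestrator would weigh far more signals, such as model placement, cost, and data residency.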

What runs on the grid

Agentic and conversational AI — real-time multistep reasoning close to users in every region
Live media processing — transcoding, AI upscaling, object detection, and dynamic ad insertion in a single workflow
In-game AI — non-player character interactions at the speed gameplay demands
Fraud detection and safety systems — decisions in milliseconds, processed where data originates
Physical AI and computer vision — camera and sensor streams processed at the edge, respecting data sovereignty

The proof

Benchmarks show RTX PRO 6000 Blackwell on Akamai Cloud delivers up to 1.63x higher inference throughput than the H100, achieving 24,240 TPS per server at 100 concurrent requests — see the full benchmark methodology and results. In production beta, Harmonic processed 300 images in under a minute with GPU memory use below 10% — read the case study.

Access through MobileRider

As an Akamai Preferred Partner, MobileRider provides your route onto Akamai Inference Cloud, backed by a decade of experience running mission-critical media workloads on Akamai's network. See transparent pricing or find the right GPU for your workload.

Related reading

Put your inference on the grid