Benchmarking NVIDIA RTX PRO 6000 Blackwell on Akamai Cloud

Executive summary

Benchmarks show that the NVIDIA RTX PRO™ 6000 Blackwell running on Akamai Cloud delivers up to 1.63x higher inference throughput than the H100 (FP4 versus FP8), reaching up to 24,240 TPS per server at 100 concurrent requests.

Benchmarking Akamai Inference Cloud

Akamai has combined its expertise in globally distributed architectures with NVIDIA Blackwell AI infrastructure to radically rethink and extend the accelerated computing needed to unlock AI's true potential.

The Akamai Inference Cloud platform combines NVIDIA RTX PRO™ Servers — featuring NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, NVIDIA BlueField-3® DPUs, and NVIDIA AI Enterprise software — with Akamai's distributed cloud computing infrastructure and global edge network, which has more than 4,400 locations worldwide.

Efficient, versatile, and optimized GPUs

Distributed inference and next-generation agentic experiences require GPUs that are efficient, versatile, and optimized for concurrent, real-time workloads. The RTX PRO 6000 Blackwell checks all three boxes. Its FP4 precision mode delivers exceptional throughput at a fraction of the power and cost of datacenter-class GPUs, making it practical to deploy across hundreds of sites.

The architecture supports concurrent and multimodal workloads including text, vision, and speech on a single GPU, reducing the need for specialized accelerators and limiting unnecessary data movement across the network.

NVIDIA highlights that these servers deliver up to 6x higher large language model (LLM) inference throughput, 4x faster synthetic data generation, 7x faster genome sequence alignment, 3x higher engineering simulation throughput, 4x greater real-time rendering performance, and 4x more concurrent multi-instance GPU workloads.

What the benchmarks show

The 1.63x throughput uplift over H100 (FP8) shows that the RTX PRO 6000 Blackwell delivers data center–grade performance in a smaller, easier-to-deploy footprint ideal for distributed environments.
The 1.32x improvement moving from FP8 to FP4 demonstrates how NVIDIA's precision efficiency directly translates to faster, more cost-efficient inference at the edge.
Sustained performance at 100+ concurrent requests validates the GPU's ability to handle multi-tenant, latency-sensitive workloads across globally distributed inference.

Benchmark overview

The tests followed NVIDIA's benchmarking methodology to assess inference performance under consistent load conditions. The model under test was Llama-3.3-Nemotron-Super-49B-v1.5, a reasoning model derived from Meta Llama-3.3-70B-Instruct and post-trained for reasoning, human chat preferences, and agentic tasks such as RAG and tool calling.

Two NVIDIA inference microservice (NIM) profiles for the same model were compared — identical except for precision: FP8 (8-bit floating point) versus NVIDIA's FP4 (NVFP4, supported directly in Blackwell GPUs). NVFP4 delivers major performance and efficiency gains with less than 1% accuracy loss.

Each request processed 200 input tokens and generated 200 output tokens. Runs at concurrency levels of 1, 100, and 200 measured time to first token (TTFT) and tokens per second (TPS), with 100 concurrent requests as the headline configuration. Tests ran on RTX PRO 6000 Blackwell Server Edition GPUs in Akamai Cloud's LAX data center and were benchmarked against the NVIDIA H100 NVL 96GB in the NVIDIA LaunchPad environment.
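The measurement loop described above can be sketched as follows. This is a minimal illustration, not the harness used for these tests: `fake_stream` is a hypothetical stand-in for a real streaming client, the prompt is a placeholder rather than a true 200-token input, and the TPS definition (total output tokens divided by wall-clock time) is an assumption, since the article does not spell out its formula.

```python
import asyncio
import math
import time
from dataclasses import dataclass
from typing import AsyncIterator, Callable, Dict


@dataclass
class RequestResult:
    ttft_s: float       # seconds from request start to first streamed token
    output_tokens: int  # number of tokens streamed back


async def fake_stream(prompt: str, max_tokens: int) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming inference client."""
    await asyncio.sleep(0.01)   # simulated prefill delay before the first token
    for i in range(max_tokens):
        await asyncio.sleep(0)  # yield to the event loop between tokens
        yield f"tok{i}"


StreamFn = Callable[[str, int], AsyncIterator[str]]


async def timed_request(stream_fn: StreamFn, prompt: str, max_tokens: int) -> RequestResult:
    """Send one request and record its TTFT and output token count."""
    start = time.perf_counter()
    ttft = math.nan             # stays NaN if no token ever arrives
    count = 0
    async for _ in stream_fn(prompt, max_tokens):
        if count == 0:
            ttft = time.perf_counter() - start
        count += 1
    return RequestResult(ttft, count)


async def run_benchmark(stream_fn: StreamFn, concurrency: int,
                        prompt: str = "placeholder prompt",
                        max_tokens: int = 200) -> Dict[str, float]:
    """Launch `concurrency` requests at once and aggregate the metrics."""
    wall_start = time.perf_counter()
    results = await asyncio.gather(
        *(timed_request(stream_fn, prompt, max_tokens) for _ in range(concurrency))
    )
    wall_s = time.perf_counter() - wall_start
    total_tokens = sum(r.output_tokens for r in results)
    ttfts = [r.ttft_s for r in results if not math.isnan(r.ttft_s)]
    return {
        "concurrency": concurrency,
        "total_output_tokens": total_tokens,
        "mean_ttft_ms": 1000 * sum(ttfts) / len(ttfts) if ttfts else math.nan,
        "tps": total_tokens / wall_s,  # assumed definition: output tokens per wall-clock second
    }


if __name__ == "__main__":
    for level in (1, 100, 200):
        print(asyncio.run(run_benchmark(fake_stream, level)))
```

Swapping `fake_stream` for a client that streams tokens from a real endpoint would leave the timing logic unchanged; the simulated numbers it prints say nothing about actual GPU performance.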

Detailed results

At the optimal concurrency level of 100, moving from FP8 to FP4 on the RTX PRO 6000 Blackwell produced a 1.32x performance improvement. Compared against the H100 at FP8, the RTX PRO 6000 Blackwell delivered a 1.63x improvement at NVFP4, and a 1.21x advantage even at FP8.

Overall, the RTX PRO 6000 Blackwell Server achieved 3,030.01 TPS at FP4 and a concurrency of 100, equating to up to 24,240.08 TPS per server with IaaS VM offerings.
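As a quick check on the arithmetic behind the H100 comparison and the per-server figure, the ratios can be recomputed from the published TPS values. The constant and function names below are illustrative. The multiplier between the two throughput figures is derived here, not stated in the article; it would be consistent with an eight-GPU configuration, but that reading is an assumption.

```python
# TPS at a concurrency of 100, copied from the results in this article.
H100_FP8_TPS = 1863.08      # H100 NVL, FP8
RTX_FP8_TPS = 2256.30       # RTX PRO 6000 Blackwell, FP8 (Test 1)
RTX_FP4_TPS = 3030.01       # RTX PRO 6000 Blackwell, FP4 (Test 1)
PER_SERVER_TPS = 24240.08   # "up to" per-server figure quoted in the text


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Throughput ratio of a candidate configuration over a baseline."""
    return candidate_tps / baseline_tps


fp4_vs_h100 = speedup(RTX_FP4_TPS, H100_FP8_TPS)
fp8_vs_h100 = speedup(RTX_FP8_TPS, H100_FP8_TPS)

# Multiplier implied by the two published figures (an inference, not a
# stated specification of the server).
implied_multiplier = PER_SERVER_TPS / RTX_FP4_TPS

print(f"FP4 vs. H100 FP8: {fp4_vs_h100:.2f}x")           # 1.63x
print(f"FP8 vs. H100 FP8: {fp8_vs_h100:.2f}x")           # 1.21x
print(f"Implied multiplier: {implied_multiplier:.2f}")   # 8.00
```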

Test 1: FP8 vs. FP4 (RTX PRO 6000 Blackwell, Akamai LAX)

Precision	Concurrency	TTFT (ms)	TPS	FP4 gain
FP8	1	44.82	27.42	—
FP8	100	102.03	2,256.30	—
FP8	200	138.66	3,606.04	—
FP4	1	47.92	29.68	1.08x
FP4	100	94.45	3,030.01	1.32x
FP4	200	3,663.26	3,854.76	1.07x

Model: nvidia/llama-3.3-nemotron-super-49b-v1.5, 200 in / 200 out tokens

Test 2: RTX PRO 6000 Blackwell vs. H100 NVL (NVIDIA LaunchPad)

GPU / Precision	Concurrency	TTFT (ms)	TPS
H100 NVL FP8	1	39.52	42.46
H100 NVL FP8	100	1,612.03	1,863.08
H100 NVL FP8	200	12,587.30	1,828.03
RTX PRO 6000 FP8	100	243.68	1,040.33
RTX PRO 6000 FP4	100	344.24	1,848.96
RTX PRO 6000 FP4	200	6,660.54	1,997.30

Note the H100's TTFT degradation under load (1,612.03 ms at a concurrency of 100) compared with the RTX PRO 6000 Blackwell on Akamai Cloud (94.45 ms at FP4 and the same concurrency in Test 1).
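One way to put the TTFT observation in numbers is to compare each configuration's latency at a concurrency of 100 against its own single-request latency. The sketch below uses the TTFT values from the two tables; the "inflation factor" name is introduced here for illustration. Because the H100 and Akamai runs came from different environments, the result is indicative only.

```python
# TTFT in milliseconds at concurrency 1 and 100, copied from the tables above.
TTFT_MS = {
    "H100 NVL FP8 (LaunchPad)": (39.52, 1612.03),
    "RTX PRO 6000 FP8 (Akamai LAX)": (44.82, 102.03),
    "RTX PRO 6000 FP4 (Akamai LAX)": (47.92, 94.45),
}


def ttft_inflation(ttft_single_ms: float, ttft_loaded_ms: float) -> float:
    """How many times slower the first token arrives under load."""
    return ttft_loaded_ms / ttft_single_ms


inflation = {name: ttft_inflation(c1, c100) for name, (c1, c100) in TTFT_MS.items()}

for name, factor in inflation.items():
    print(f"{name}: {factor:.1f}x slower to first token at concurrency 100")
```

On these figures the H100's first-token latency grows by roughly 40x between one and 100 concurrent requests, while the RTX PRO 6000 Blackwell runs in Test 1 grow by roughly 2x.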

Conclusion

The results show that FP4 delivers measurable gains, with a 1.32x improvement in throughput over FP8 on the RTX PRO 6000 Blackwell. Compared with the H100 at FP8, the RTX PRO 6000 Blackwell at FP4 achieved a 1.63x throughput improvement, underscoring the potential of the Blackwell architecture for inference workloads.

These findings demonstrate that RTX PRO 6000 Blackwell GPUs running on Akamai's distributed cloud can deliver high throughput and efficient scaling for real-world AI inference at lower cost and latency. For teams evaluating GPU options, this combination offers a compelling balance of speed, efficiency, and accessibility across a global infrastructure footprint.