NVIDIA Blog
June 12, 2026
Research

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark

Overview

NVIDIA's Blackwell Ultra NVL72 platform runs up to 20 times more agentic AI workloads per megawatt than its Hopper-generation predecessor, according to AgentPerf, a new benchmark from Artificial Analysis. The benchmark measures performance on complex, multi-step agentic tasks, which differ significantly from the single-response workloads that traditional conversational AI benchmarks measure.

Key Takeaways

  • NVIDIA Blackwell Ultra NVL72 runs up to 20x more agents per megawatt than NVIDIA Hopper.
  • AgentPerf is the first benchmark designed specifically for agentic AI workloads, reflecting real-world coding scenarios.
  • The benchmark measures performance based on how many agentic tasks a platform can handle simultaneously.
  • NVIDIA's GB300 NVL72 system connects 72 GPUs, enabling efficient execution of large mixture-of-experts models.
  • The performance advantage stems from full-stack codesign that overlaps communication with compute inside the system.

Stats & Key Facts

  • 20x more agents per megawatt than NVIDIA Hopper
  • 72 GPUs connected in a single rack-scale system

Understanding Agentic AI Workloads

Agentic AI represents a new frontier in AI performance measurement.

  • Unlike conversational AI, which focuses on single responses, agentic AI involves multiple steps and interactions.
  • Agents break down tasks into smaller components, requiring numerous LLM calls to complete a goal.

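The complexity of agentic AI workloads is multiplicative: each agent issues many LLM calls per task, so total load grows with the number of concurrent agents multiplied by the calls each task requires. Performance metrics must therefore account for whole-task execution, in sharp contrast to traditional benchmarks that measure only single LLM responses.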
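The multiplicative effect can be illustrated with a few lines of arithmetic. This is a minimal sketch; the function name and the figures (100 users, 50 calls per task) are hypothetical and are not taken from AgentPerf or from NVIDIA's results.

```python
def total_llm_calls(concurrent_agents: int, calls_per_task: int) -> int:
    """Total LLM calls needed when every agent completes one task."""
    if concurrent_agents < 0 or calls_per_task < 0:
        raise ValueError("inputs must be non-negative")
    return concurrent_agents * calls_per_task


# Hypothetical figures, chosen only to show the scaling.
chat_load = total_llm_calls(concurrent_agents=100, calls_per_task=1)
agent_load = total_llm_calls(concurrent_agents=100, calls_per_task=50)

print(chat_load)                # 100
print(agent_load)               # 5000
print(agent_load // chat_load)  # 50
```

With the same number of users, the agentic case in this toy example generates 50 times the LLM calls of the single-response case, which is why a per-response metric understates the load.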

Introducing AgentPerf Benchmark

AgentPerf provides a new standard for evaluating AI infrastructure.

  • Developed by Artificial Analysis, AgentPerf focuses on real-world agentic tasks.
  • The benchmark evaluates how many tasks can be executed simultaneously while maintaining performance thresholds.

AgentPerf is grounded in realistic coding agent trajectories, simulating tasks drawn from public code repositories. This approach ensures that the benchmark reflects the actual performance requirements of deploying AI agents in production environments.
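The idea of counting how many tasks fit under a performance threshold can be sketched as a search over concurrency levels. The code below is an illustration only, not AgentPerf's actual methodology: `measure_speed`, `min_speed`, and the toy throughput model are all assumptions, and the search further assumes that per-agent speed never improves as concurrency rises.

```python
from typing import Callable


def max_concurrency_within_threshold(
    measure_speed: Callable[[int], float],
    min_speed: float,
    upper_bound: int,
) -> int:
    """Return the largest concurrency in [0, upper_bound] whose measured
    per-agent speed is still at least min_speed (0 if none qualifies).

    Assumes speed is non-increasing in concurrency, so binary search applies.
    """
    lo, hi = 0, upper_bound
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if measure_speed(mid) >= min_speed:
            lo = mid       # mid still meets the threshold; try higher
        else:
            hi = mid - 1   # mid is too slow; the answer is below it
    return lo


def toy_speed(concurrency: int) -> float:
    """Hypothetical system: 1,000 tokens/s shared equally among agents."""
    return 1000.0 / max(concurrency, 1)


print(max_concurrency_within_threshold(toy_speed, min_speed=25.0, upper_bound=256))  # 40
```

In a real measurement, `measure_speed` would run the workload at the given concurrency and report the observed per-agent speed; here it is a closed-form stand-in so the example runs on its own.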

Performance of NVIDIA Blackwell Ultra NVL72

The Blackwell Ultra NVL72 sets a new bar for agentic AI performance.

  • It runs up to 20x more agents per megawatt than the NVIDIA HGX H200 system.
  • The system's design allows for efficient distribution of model execution across multiple GPUs.

The performance gains are attributed to extreme codesign across the full stack, which lets communication between GPUs overlap with compute instead of waiting on it. This reduces latency and raises throughput, which suits agentic workloads.
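Why overlapping communication with compute helps can be seen in an idealized timing model. This is a rough sketch under stated assumptions: the two functions and the millisecond figures are hypothetical, and real systems rarely achieve the perfect overlap modeled here.

```python
def step_time_serial(compute_ms: float, comm_ms: float) -> float:
    """Communication waits for compute to finish: the times add up."""
    return compute_ms + comm_ms


def step_time_overlapped(compute_ms: float, comm_ms: float) -> float:
    """Idealized full overlap: the slower of the two sets the step time."""
    return max(compute_ms, comm_ms)


# Hypothetical per-step costs, not measurements of any NVIDIA system.
compute_ms, comm_ms = 8.0, 6.0

print(step_time_serial(compute_ms, comm_ms))      # 14.0
print(step_time_overlapped(compute_ms, comm_ms))  # 8.0
```

In this toy case the communication cost is hidden entirely behind compute, so the step takes 8 ms rather than 14 ms.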

Real-World Implications for Enterprises

Understanding performance metrics is crucial for businesses deploying AI agents.

  • Enterprises must know how many concurrent agentic tasks can be run per accelerator and per megawatt.
  • These metrics directly influence infrastructure decisions and operational efficiency.

As companies increasingly rely on AI agents, the ability to measure and optimize performance will be a key factor in maximizing productivity and minimizing costs. The insights provided by AgentPerf can guide infrastructure investments and operational strategies.
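An agents-per-megawatt figure translates into a capacity estimate with simple arithmetic. The sketch below uses hypothetical function names and placeholder numbers (500 agents on a 100 kW system, a 2 MW power budget); none of them are published results.

```python
import math


def agents_per_megawatt(concurrent_agents: float, system_power_kw: float) -> float:
    """Normalize a system's concurrent-agent count to one megawatt of power."""
    if system_power_kw <= 0:
        raise ValueError("system_power_kw must be positive")
    return concurrent_agents * 1000.0 / system_power_kw


def agents_supported(per_mw: float, power_budget_mw: float) -> int:
    """Whole number of agents a given power budget can sustain."""
    return math.floor(per_mw * power_budget_mw)


# Placeholder inputs for illustration only.
per_mw = agents_per_megawatt(concurrent_agents=500, system_power_kw=100)
print(per_mw)                         # 5000.0
print(agents_supported(per_mw, 2.0))  # 10000
```

The same calculation run for two candidate platforms gives a direct ratio for comparing how many agents each can sustain within a fixed power budget.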

Future of Agentic AI Infrastructure

The landscape of AI infrastructure is evolving with new benchmarks.

  • AgentPerf sets the stage for future developments in agentic AI performance measurement.
  • As agentic AI continues to grow, ongoing improvements in infrastructure will be necessary.

The introduction of benchmarks like AgentPerf will likely lead to more innovations in AI infrastructure, as developers and enterprises seek to optimize their systems for increasingly complex workloads.

Frequently Asked Questions

What is agentic AI?

Agentic AI refers to AI systems that can perform complex tasks by breaking them down into smaller steps, requiring multiple interactions and calls to large language models.

How does AgentPerf differ from traditional AI benchmarks?

AgentPerf is specifically designed to measure the performance of agentic AI workloads, which involve a series of chained tasks rather than single requests.

What advantages does the NVIDIA Blackwell Ultra NVL72 offer?

The NVIDIA Blackwell Ultra NVL72 can run significantly more agents per megawatt compared to previous systems, thanks to its architecture that connects multiple GPUs for efficient processing.

Why is measuring agentic AI performance important?

Measuring performance is crucial for enterprises to understand how to optimize their AI infrastructure for efficiency and productivity when deploying agents at scale.

What impact will AgentPerf have on AI infrastructure decisions?

The insights from AgentPerf will help enterprises make informed decisions about their AI infrastructure, ensuring they can support the required number of concurrent agentic tasks effectively.

The evolution of AI infrastructure continues to be shaped by innovative benchmarks like AgentPerf.


Originally published by NVIDIA Blog