NVIDIA Vera Rubin on CoreWeave Cloud

Vera Rubin is NVIDIA’s most advanced rack-scale AI compute system, built for the age of agentic AI and reasoning.

Why AI leaders trust CoreWeave for NVIDIA Vera Rubin

Proven across generations

Adopting new compute without the right support means your team absorbs the learning curve. From NVIDIA Ada Lovelace to Hopper to Blackwell, CoreWeave has operationalized major NVIDIA platforms, so your team hits the ground running.

Performance, perfected

NVIDIA Vera Rubin sets a new bar for performance. CoreWeave is engineered to make sure your workloads realize every bit of it.

Scale with operational excellence

Production AI workloads can break in ways you can’t always predict. CoreWeave handles that operational complexity at factory scale, so your largest clusters keep running when it matters most.

The Essential Cloud for AI

of the top foundation model
providers rely on CoreWeave

ON DEMAND NOW

Scaling the Agentic Era with NVIDIA Vera Rubin NVL72 on CoreWeave Cloud

CoreWeave took to theCUBE for an exclusive launch event, going live to discuss how the next generation of accelerated computing is being built for the agentic era.

Watch the replay

NVIDIA Vera Rubin NVL72: powering large-scale agentic AI

NVIDIA Vera Rubin NVL72 is a rack-scale AI supercomputer built for agentic reasoning. One liquid-cooled rack integrates 72 NVIDIA Rubin GPUs, 36 NVIDIA Vera CPUs, ConnectX-9 SuperNICs, and BlueField-4 DPUs into a single 20.7 TB unified HBM4 memory domain. The NVLink 6 Switch fabric connects the rack's GPUs with 260 TB/s of low-latency bandwidth, while NVIDIA Quantum-X800 InfiniBand and Spectrum-X Ethernet with new Spectrum-6 switches scale deployments across thousands of GPUs.
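To make the rack-level figures easier to reason about, here is a minimal sketch that divides them across the 72 GPUs. It assumes the totals are shared evenly between GPUs and that TB means decimal terabytes; neither assumption is stated on this page, and the function name `per_gpu_share` is purely illustrative.

```python
# Rack-level figures quoted in the paragraph above.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36
HBM4_TOTAL_TB = 20.7       # unified HBM4 memory domain
NVLINK_TOTAL_TBPS = 260.0  # aggregate NVLink 6 bandwidth


def per_gpu_share(rack_total: float, gpus: int = GPUS_PER_RACK) -> float:
    """Divide a rack-level total evenly across GPUs (an assumption)."""
    return rack_total / gpus


# Assumes decimal units: 1 TB = 1000 GB.
hbm_per_gpu_gb = per_gpu_share(HBM4_TOTAL_TB) * 1000
nvlink_per_gpu_tbps = per_gpu_share(NVLINK_TOTAL_TBPS)
gpus_per_cpu = GPUS_PER_RACK // CPUS_PER_RACK

print(f"HBM4 per GPU:   roughly {hbm_per_gpu_gb:.1f} GB")
print(f"NVLink per GPU: roughly {nvlink_per_gpu_tbps:.2f} TB/s")
print(f"GPUs per CPU:   {gpus_per_cpu}")
```

Under these assumptions each GPU holds a little under 290 GB of HBM4 and has about 3.6 TB/s of NVLink bandwidth, with two GPUs paired to each Vera CPU.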

By the numbers

NVIDIA Vera Rubin NVL72 specifications

| Specification | NVIDIA GB200 NVL72 | NVIDIA Vera Rubin NVL72 | Change |
| --- | --- | --- | --- |
| Inference Throughput (TPS/MW) at 150 TPS/User on DeepSeek R1 | 80,000 | 800,000 | 10x more |
| HBM type | HBM3e | HBM4 | |
| NVLink Generation | NVLink 5 | NVLink 6 | |
| Memory Bandwidth | 8 TB/s | 22 TB/s | 2.75x more |
| Total GPU Memory Bandwidth | ~576 TB/s | 1,580 TB/s | 2.75x more |
| NVFP4 inference (per rack) | ~720 PFLOPS | 3,600 PFLOPS (3.6 EF) | 5x more |
| FP8 training (per rack) | ~360 PFLOPS | ~1,260 PFLOPS | 3.5x more |
| NVLink Bandwidth | 130 TB/s | 260 TB/s | 2x more |
| CPU Core Count (per rack) | 2,592 Arm® Neoverse V2 cores | 3,168 custom NVIDIA Olympus cores (Arm® compatible) | +22% cores |
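The figures in the specification table can be cross-checked with a few lines of arithmetic. The sketch below recomputes each generation-over-generation ratio and tests one inference drawn from the numbers themselves: that the "Memory Bandwidth" row is a per-GPU figure which, multiplied by 72 GPUs, gives the "Total GPU Memory Bandwidth" row. That reading is an assumption, as are the dictionary keys, which are illustrative labels.

```python
GPUS_PER_RACK = 72

# (GB200 NVL72, Vera Rubin NVL72) values copied from the table.
SPECS = {
    "inference_tps_per_mw": (80_000, 800_000),
    "memory_bw_tbps": (8, 22),
    "total_memory_bw_tbps": (576, 1_580),
    "nvfp4_inference_pflops": (720, 3_600),
    "fp8_training_pflops": (360, 1_260),
    "nvlink_bw_tbps": (130, 260),
    "cpu_cores": (2_592, 3_168),
}


def ratio(name: str) -> float:
    """Return the Vera Rubin value divided by the GB200 value."""
    old, new = SPECS[name]
    return new / old


ratios = {name: round(ratio(name), 2) for name in SPECS}

# Assumption: the 8 and 22 TB/s figures are per GPU.
total_bw_gb200 = SPECS["memory_bw_tbps"][0] * GPUS_PER_RACK
total_bw_rubin = SPECS["memory_bw_tbps"][1] * GPUS_PER_RACK

# Concurrent users served per megawatt at 150 tokens/s each.
users_per_mw = SPECS["inference_tps_per_mw"][1] / 150

for name, value in ratios.items():
    print(f"{name}: {value}x")
print(f"72 x per-GPU bandwidth: {total_bw_gb200} and {total_bw_rubin} TB/s")
print(f"Users per MW at 150 TPS/user: about {users_per_mw:.0f}")
```

The per-GPU reading reproduces 576 TB/s exactly for GB200 and gives 1,584 TB/s for Vera Rubin, which matches the table's 1,580 TB/s after rounding.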

Use Cases

Built for agentic AI, inference, and training

Explore how NVIDIA Vera Rubin NVL72 changes what's achievable across every major AI workload — and the CoreWeave platform advantages that unlock it.

Agentic AI

Production agents plan, call tools, evaluate outputs, and revise across thousands of steps, ultimately generating orders of magnitude more tokens per task than traditional inference. The unified memory domain and rack-scale NVLink of NVIDIA Vera Rubin NVL72 are purpose-built for this profile: long-running sequences, high memory bandwidth demand, and persistent context that has to survive across every step without hitting a wall.

The CoreWeave advantage

Agent-native tracing

Trace every step, tool call, and sub-agent interaction. Weave organizes sessions and turns natively so failure modes don't stay buried in logs.

Explore Weave docs

Measure agent performance

Iterate on agents by experimenting with LLMs, prompts, and scorers. Catch regressions before they reach users with rigorous, apples-to-apples comparisons.

Explore Evaluations

Safeguard users and brand

Scorers evaluate agent inputs and outputs for toxicity, bias, PII, and hallucinations — and automatically route to a safe fallback when risks are detected.

Explore Guardrails
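The guardrail pattern described above, where scorers inspect an output and route to a safe fallback when a risk is detected, can be sketched in plain Python. This is not the Weave API: `ScoreResult`, `pii_scorer`, `toxicity_scorer`, `guarded_reply`, the email regex, and the tiny blocklist are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoreResult:
    name: str
    flagged: bool


Scorer = Callable[[str], ScoreResult]

# Toy detectors; a production scorer would typically be model-based.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BLOCKLIST = {"idiot", "stupid"}
SAFE_FALLBACK = "Sorry, I can't share that response."


def pii_scorer(text: str) -> ScoreResult:
    """Flag text that contains something shaped like an email address."""
    return ScoreResult("pii", bool(EMAIL_RE.search(text)))


def toxicity_scorer(text: str) -> ScoreResult:
    """Flag text that contains any word from the toy blocklist."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return ScoreResult("toxicity", bool(words & BLOCKLIST))


def guarded_reply(agent_output: str, scorers: List[Scorer]) -> str:
    """Return the agent output, or the fallback if any scorer flags it."""
    flagged = [r.name for r in (s(agent_output) for s in scorers) if r.flagged]
    return SAFE_FALLBACK if flagged else agent_output


scorers = [pii_scorer, toxicity_scorer]
print(guarded_reply("One rack integrates 72 GPUs.", scorers))
print(guarded_reply("Reach me at jane@example.com", scorers))
```

Keeping each scorer as an independent function makes it straightforward to add checks, such as bias or hallucination detection, without changing the routing logic.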

Explore agentic AI on CoreWeave

Inference

The NVIDIA Vera Rubin platform resets the unit economics of inference. Workloads that were already profitable on NVIDIA GB300 NVL72-class hardware become significantly more so. For teams running long-context reasoning, RAG pipelines, or high-throughput serving, the performance and cost improvements compound with scale.

The CoreWeave advantage

No more model loading bottlenecks

LOTA streams model weights directly to GPU memory, eliminating the latency spikes that occur when scaling inference replicas under load.

Learn about LOTA

7 GB/s per GPU object storage

CoreWeave AI Object Storage feeds your inference pipeline at rack-native speeds so your serving layer is never waiting on data.

Explore storage
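As a rough illustration of what 7 GB/s per GPU means at rack scale, the sketch below estimates aggregate read bandwidth and the time to pull a set of model weights. It assumes every GPU sustains the quoted peak figure simultaneously and that the weights are sharded evenly across GPUs; the 1,000 GB checkpoint size is a hypothetical value, not one from this page.

```python
PER_GPU_GBPS = 7.0   # object storage throughput per GPU, from the text
GPUS_PER_RACK = 72


def aggregate_gbps(gpus: int, per_gpu: float = PER_GPU_GBPS) -> float:
    """Aggregate read bandwidth if every GPU sustains the per-GPU rate."""
    return gpus * per_gpu


def load_seconds(size_gb: float, gpus: int) -> float:
    """Best-case time to read size_gb sharded evenly across the GPUs."""
    return size_gb / aggregate_gbps(gpus)


rack_gbps = aggregate_gbps(GPUS_PER_RACK)
checkpoint_gb = 1000.0  # hypothetical checkpoint size

print(f"Aggregate per rack: {rack_gbps:.0f} GB/s")
print(f"Best-case load time: {load_seconds(checkpoint_gb, GPUS_PER_RACK):.2f} s")
```

Real load times depend on sharding layout, network contention, and deserialization cost, so this is best read as a lower bound.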

Topology-aware scheduling

CoreWeave Kubernetes Service keeps inference workloads pinned to the NVLink fabric so GPU-to-GPU communication runs at full architectural speed.

Learn about CKS
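One way to picture topology-aware scheduling is as a placement problem: a job should land entirely inside a single NVLink domain (one 72-GPU rack) whenever it fits, so its GPU-to-GPU traffic never leaves the fabric. The sketch below uses a simple best-fit rule. It is an illustrative toy rather than how CoreWeave Kubernetes Service works internally, and `place_job` and the rack names are hypothetical.

```python
from typing import Dict, Optional

DOMAIN_SIZE = 72  # GPUs in one NVLink domain (one NVL72 rack)


def place_job(free_gpus: Dict[str, int], gpus_needed: int) -> Optional[str]:
    """Best fit: pick the rack with the fewest free GPUs that still fits.

    Returns the chosen rack name and updates free_gpus in place, or
    returns None when no single NVLink domain can hold the whole job.
    """
    candidates = [
        (free, rack) for rack, free in free_gpus.items() if free >= gpus_needed
    ]
    if not candidates:
        return None
    _, rack = min(candidates)
    free_gpus[rack] -= gpus_needed
    return rack


free = {"rack-a": DOMAIN_SIZE, "rack-b": 16, "rack-c": 40}
print(place_job(free, 8))    # small job goes to the tightest fit
print(place_job(free, 72))   # full-rack job still finds an empty rack
print(place_job(free, 64))   # no single domain has 64 free GPUs left
```

Best fit keeps small jobs out of empty racks, which preserves whole NVLink domains for jobs that need all 72 GPUs.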

Explore inference on CoreWeave

Training

Trillion-parameter mixture-of-experts training is gated by all-to-all communication efficiency. NVLink 6 with SHARP cuts that overhead directly, and 3.5× NVFP4 throughput means runs that previously took months can be completed in weeks. For teams at the frontier of model scale, this generation change doesn’t just improve efficiency. It changes what’s possible.
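The "months to weeks" claim can be sanity-checked with Amdahl's law: a 3.5× compute speedup shortens a run by 3.5× only if the whole run is compute-bound. The sketch below is a back-of-the-envelope model; the 90-day baseline and the 80% compute fraction are hypothetical inputs, and `overall_speedup` is an illustrative name.

```python
def overall_speedup(compute_fraction: float, compute_speedup: float) -> float:
    """Amdahl's law: only the compute share of the run gets faster."""
    if not 0.0 <= compute_fraction <= 1.0:
        raise ValueError("compute_fraction must be between 0 and 1")
    remaining = 1.0 - compute_fraction
    return 1.0 / (remaining + compute_fraction / compute_speedup)


BASELINE_DAYS = 90.0    # hypothetical three-month run
COMPUTE_SPEEDUP = 3.5   # throughput gain quoted in the text

for fraction in (1.0, 0.8):
    speedup = overall_speedup(fraction, COMPUTE_SPEEDUP)
    days = BASELINE_DAYS / speedup
    print(f"compute share {fraction:.0%}: {speedup:.2f}x, about {days:.0f} days")
```

The gap between the two cases shows why communication efficiency matters: the smaller the share of time spent in all-to-all exchange, the closer a run gets to the full 3.5× gain.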

The CoreWeave advantage

Unified training system

Run large, long-running jobs with topology-aware scheduling that reduces fragmentation and keeps multi-day runs moving forward.

Explore SUNK

High-performance AI storage

Feed large-scale training pipelines from a single global dataset with up to 7 GB/s per GPU — far beyond traditional object storage.

Explore storage

End-to-end cluster health

Detect GPU stragglers and silent hardware issues before they compound into lost training time. Correlate GPU, network, and storage signals into one operating standard.

Explore Mission Control
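Straggler detection can be illustrated with a simple rule: compare each GPU's step time with the cluster median and flag anything noticeably slower. This is a minimal sketch of the general idea, not Mission Control's method; `find_stragglers`, the 15% tolerance, and the sample timings are all hypothetical.

```python
from statistics import median
from typing import Dict, List


def find_stragglers(step_times_ms: Dict[str, float], tolerance: float = 1.15) -> List[str]:
    """Return GPUs whose step time exceeds the median by the tolerance factor."""
    baseline = median(step_times_ms.values())
    threshold = baseline * tolerance
    return sorted(gpu for gpu, t in step_times_ms.items() if t > threshold)


# Hypothetical per-GPU step times for one training iteration.
sample = {"gpu0": 100.0, "gpu1": 101.0, "gpu2": 99.0, "gpu3": 130.0}
print(find_stragglers(sample))
```

Using the median rather than the mean keeps the baseline stable when one slow GPU would otherwise drag the average up. In synchronous training every step waits for the slowest participant, so a single flagged GPU can set the pace for the whole job.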

Explore training on CoreWeave

Our research depends on infrastructure that's both powerful and reliable, and CoreWeave has delivered on both as we've scaled across Hopper and Blackwell. Their ability to deliver highly performant clusters with full cluster observability and a support team that engages deeply on hard problems gives us the confidence to partner with them on Vera Rubin.

Craig Falls

Head of Quantitative Research, Jane Street

Frequently asked questions

What is NVIDIA Vera Rubin NVL72?

NVIDIA Vera Rubin NVL72 is NVIDIA’s next-generation rack-scale AI computing platform, combining 72 NVIDIA Rubin GPUs and 36 NVIDIA Vera CPUs into a unified system designed for large-scale AI training, inference, and agentic AI workloads.

Why run NVIDIA Vera Rubin on CoreWeave?

CoreWeave is the first cloud provider to bring up and validate NVIDIA Vera Rubin NVL72, and it's not the first time. CoreWeave has a track record of bringing new NVIDIA compute generations to market quickly, which means your team has a partner who has already done the integration work, not one learning alongside you. The platform is built to extract the full performance of Vera Rubin: NVLink-domain-aware scheduling keeps traffic on the 260 TB/s NVLink fabric, SUNK minimizes fragmentation, and Mission Control delivers automated cluster management and software-defined liquid cooling that can isolate a fault without taking down the entire rack.

What workloads is NVIDIA Vera Rubin best suited for?

NVIDIA Vera Rubin is designed for frontier AI workloads, including large language model training, trillion-parameter inference, reasoning models, retrieval-augmented generation (RAG), agentic AI systems, and large-scale mixture-of-experts (MoE) architectures.