NVIDIA Vera Rubin on CoreWeave Cloud

Vera Rubin is NVIDIA’s most advanced rack-scale AI compute system, built for the age of agentic AI and reasoning.

Why AI leaders trust CoreWeave for NVIDIA Vera Rubin

Proven across generations

Adopting new compute without the right support means your team absorbs the learning curve. From NVIDIA Ada Lovelace to Hopper to Blackwell, CoreWeave has operationalized major NVIDIA platforms, so your team hits the ground running.

Performance, perfected

NVIDIA Vera Rubin sets a new bar for performance. CoreWeave is engineered to make sure your workloads realize every bit of it.

Scale with operational excellence

Production AI workloads can break in ways you can’t always predict. CoreWeave handles that operational complexity at factory scale, so your largest clusters keep running when it matters most.

The Essential Cloud for AI

Top foundation model providers rely on CoreWeave.

Upcoming launch event

Scaling the Agentic Era with NVIDIA Vera Rubin NVL72 on CoreWeave Cloud

Join theCUBE for an exclusive online launch event where you'll hear directly from CoreWeave and learn how the next generation of accelerated computing is being built for the agentic era — right now.

Live broadcast: June 30, 2026 at 12:30 PM ET

NVIDIA Vera Rubin NVL72: powering large-scale agentic AI

NVIDIA Vera Rubin NVL72 is a rack-scale AI supercomputer built for agentic reasoning. One liquid-cooled rack integrates 72 NVIDIA Rubin GPUs, 36 NVIDIA Vera CPUs, ConnectX-9 SuperNICs, and BlueField-4 DPUs into a single 20.7 TB unified HBM4 memory domain. The NVLink 6 Switch connects the GPUs within the rack with 260 TB/s of low-latency bandwidth, while NVIDIA Quantum-X800 InfiniBand and Spectrum-X Ethernet with new Spectrum-6 switches scale deployments across thousands of GPUs.
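
As a rough sanity check on the rack-level figures above, the following Python sketch divides the quoted totals evenly across the 72 GPUs. The even split and the decimal TB-to-GB conversion are assumptions made for illustration; the results are not published per-GPU specifications.

```python
# Rack-level figures quoted above for NVIDIA Vera Rubin NVL72.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36
RACK_HBM4_TB = 20.7        # unified HBM4 memory domain
RACK_NVLINK_TBPS = 260.0   # NVLink 6 Switch bandwidth


def per_gpu(total: float, gpus: int = GPUS_PER_RACK) -> float:
    """Divide a rack-level total evenly across the GPUs (an assumption)."""
    return total / gpus


hbm_per_gpu_gb = per_gpu(RACK_HBM4_TB) * 1000    # decimal TB -> GB
nvlink_per_gpu_tbps = per_gpu(RACK_NVLINK_TBPS)
gpus_per_cpu = GPUS_PER_RACK // CPUS_PER_RACK

print(f"HBM4 per GPU:   ~{hbm_per_gpu_gb:.1f} GB")
print(f"NVLink per GPU: ~{nvlink_per_gpu_tbps:.2f} TB/s")
print(f"GPUs per CPU:   {gpus_per_cpu}")
```

Under these assumptions, each GPU sees roughly 287.5 GB of HBM4 and about 3.61 TB/s of NVLink bandwidth, with two GPUs paired to each Vera CPU.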

By the numbers

NVIDIA Vera Rubin NVL72 specifications

Specification | NVIDIA GB200 NVL72 | NVIDIA Vera Rubin NVL72
HBM type | HBM3e | HBM4
NVLink generation | NVLink 5 | NVLink 6
Memory bandwidth (per GPU) | 8 TB/s |
Total GPU memory bandwidth | ~576 TB/s |
NVFP4 inference (per rack) | ~720 PFLOPS |
FP8 training (per rack) | ~360 PFLOPS |
NVLink bandwidth | 130 TB/s |
CPU core count (per rack) | 2,592 Arm® Neoverse V2 cores |

Use Cases

Built for agentic AI, inference, and training

Explore how NVIDIA Vera Rubin NVL72 changes what's achievable across every major AI workload — and the CoreWeave platform advantages that unlock it.

Agentic AI

Production agents plan, call tools, evaluate outputs, and revise across thousands of steps, ultimately generating orders of magnitude more tokens per task than traditional inference. The unified memory domain and rack-scale NVLink of NVIDIA Vera Rubin NVL72 are purpose-built for this profile: long-running sequences, high memory bandwidth demand, and persistent context that has to survive across every step without hitting a wall.

The CoreWeave advantage

Agent-native tracing

Trace every step, tool call, and sub-agent interaction. Weave organizes sessions and turns natively so failure modes don't stay buried in logs.

Measure agent performance

Iterate on agents by experimenting with LLMs, prompts, and scorers. Catch regressions before they reach users with rigorous, apples-to-apples comparisons.

Safeguard users and brand

Scorers evaluate agent inputs and outputs for toxicity, bias, PII, and hallucinations — and automatically route to a safe fallback when risks are detected.
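
The routing pattern described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Weave API: the `ScoreResult` type, the `pii_scorer` function (a simple email regex standing in for a real PII detector), and the `SAFE_FALLBACK` message are all hypothetical names introduced here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoreResult:
    name: str      # which check produced this result
    flagged: bool  # True when the check considers the text risky


# A scorer is any callable that inspects text and returns a ScoreResult.
Scorer = Callable[[str], ScoreResult]

# Hypothetical stand-in for a PII detector: flags anything email-like.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

SAFE_FALLBACK = "Sorry, I can't share that. Please contact support."


def pii_scorer(text: str) -> ScoreResult:
    return ScoreResult("pii", bool(EMAIL_RE.search(text)))


def guarded_reply(candidate: str, scorers: List[Scorer]) -> str:
    """Return the candidate reply unless any scorer flags it."""
    for scorer in scorers:
        if scorer(candidate).flagged:
            return SAFE_FALLBACK
    return candidate


print(guarded_reply("The rack has 72 GPUs.", [pii_scorer]))
print(guarded_reply("Contact me at jane@example.com", [pii_scorer]))
```

The first call passes the reply through unchanged; the second is replaced by the fallback because the stand-in scorer detects an email address.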

Inference

The NVIDIA Vera Rubin platform resets the unit economics of inference. Workloads that were already profitable on NVIDIA GB300 NVL72-class hardware become significantly more so. For teams running long-context reasoning, RAG pipelines, or high-throughput serving, the performance and cost improvements compound with scale.

The CoreWeave advantage

No more model loading bottlenecks

LOTA streams model weights directly to GPU memory, eliminating the latency spikes that occur when scaling inference replicas under load.

7 GB/s per GPU object storage

CoreWeave AI Object Storage feeds your inference pipeline at rack-native speeds so your serving layer is never waiting on data.

Topology-aware scheduling

CoreWeave Kubernetes Service keeps inference workloads pinned to the NVLink fabric so GPU-to-GPU communication runs at full architectural speed.
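
One way to picture topology-aware placement is a best-fit choice among NVLink domains (racks): a job goes to the smallest free pool that can hold it entirely, so its GPUs share one NVLink fabric and larger pools stay unfragmented. The sketch below is illustrative only. The function name, the rack names, and the best-fit policy are assumptions, not the actual CoreWeave Kubernetes Service scheduler.

```python
from typing import Dict, Optional


def pick_nvlink_domain(free_gpus: Dict[str, int], requested: int) -> Optional[str]:
    """Best-fit placement: choose the NVLink domain with the fewest free GPUs
    that can still hold the whole job, or None if no single domain fits."""
    fitting = {domain: free for domain, free in free_gpus.items() if free >= requested}
    if not fitting:
        return None
    # Tie-break on the domain name so the choice is deterministic.
    return min(fitting, key=lambda domain: (fitting[domain], domain))


# Hypothetical free-GPU counts per rack.
free = {"rack-a": 72, "rack-b": 16, "rack-c": 8}

print(pick_nvlink_domain(free, 8))    # rack-c
print(pick_nvlink_domain(free, 16))   # rack-b
print(pick_nvlink_domain(free, 100))  # None
```

A request larger than any single domain returns `None` here; a real scheduler would instead span domains over the scale-out network, which this sketch deliberately leaves out.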

Training

Trillion-parameter mixture-of-experts training is gated by all-to-all communication efficiency. NVLink 6 with SHARP cuts that overhead directly, and 3.5× NVFP4 throughput means runs that previously took months can be completed in weeks. For teams at the frontier of model scale, this generational change doesn’t just improve efficiency; it changes what’s possible.

The CoreWeave advantage

Unified training system

Run large, long-running jobs with topology-aware scheduling that reduces fragmentation and keeps multi-day runs moving forward.

High-performance AI storage

Feed large-scale training pipelines from a single global dataset with up to 7 GB/s per GPU — far beyond traditional object storage.
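
To put the per-GPU figure in rack terms, the sketch below multiplies the quoted 7 GB/s ceiling by the 72 GPUs in one NVL72 rack and estimates a best-case read time. The 100 TB dataset size is a hypothetical example, and because the quoted figure is an "up to" number, the result is a lower bound on read time rather than a prediction.

```python
PER_GPU_GBPS = 7.0    # quoted "up to" object storage throughput per GPU
GPUS_PER_RACK = 72    # GPUs in one NVL72 rack


def rack_read_ceiling_gbps(racks: int = 1) -> float:
    """Aggregate read ceiling if every GPU reaches the quoted rate."""
    return PER_GPU_GBPS * GPUS_PER_RACK * racks


def min_read_seconds(dataset_tb: float, racks: int = 1) -> float:
    """Best-case time to read a dataset once (decimal TB -> GB)."""
    return dataset_tb * 1000 / rack_read_ceiling_gbps(racks)


print(f"Per-rack ceiling: {rack_read_ceiling_gbps():.0f} GB/s")
print(f"100 TB, one rack: {min_read_seconds(100):.1f} s at best")
```

Under these assumptions one rack tops out at 504 GB/s, so a single pass over 100 TB takes at least about 198 seconds.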

End-to-end cluster health

Detect GPU stragglers and silent hardware issues before they compound into lost training time. Correlate GPU, network, and storage signals into one operating standard.
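
A common, simple way to spot stragglers is to compare each GPU's step time against the median of the group. The sketch below shows that idea only; the function name, the 15% tolerance, and the sample timings are assumptions for illustration and do not describe CoreWeave's health-checking implementation.

```python
from statistics import median
from typing import Dict, List


def find_stragglers(step_times: Dict[str, float], tolerance: float = 0.15) -> List[str]:
    """Return the GPUs whose step time exceeds the group median by more
    than `tolerance` (0.15 means 15% slower than the median)."""
    baseline = median(step_times.values())
    threshold = baseline * (1 + tolerance)
    return sorted(gpu for gpu, t in step_times.items() if t > threshold)


# Hypothetical per-GPU step times in seconds.
times = {"gpu0": 1.00, "gpu1": 1.02, "gpu2": 0.99, "gpu3": 1.40}

print(find_stragglers(times))  # ['gpu3']
```

Using the median rather than the mean keeps a single slow GPU from inflating the baseline it is compared against.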

Our research depends on infrastructure that's both powerful and reliable, and CoreWeave has delivered on both as we've scaled across Hopper and Blackwell. Their ability to deliver highly performant clusters with full cluster observability and a support team that engages deeply on hard problems gives us the confidence to partner with them on Vera Rubin.
Craig Falls
Head of Quantitative Research, Jane Street

Frequently asked questions

What is NVIDIA Vera Rubin NVL72?

Vera Rubin NVL72 is NVIDIA’s next-generation rack-scale AI computing platform, combining 72 Rubin GPUs and 36 NVIDIA Vera CPUs into a unified system designed for large-scale AI training, inference, and agentic AI workloads.

Why run NVIDIA Vera Rubin on CoreWeave?

CoreWeave is the first cloud provider to bring up and validate NVIDIA Vera Rubin NVL72, and it's not the first time. CoreWeave has repeatedly been fast to market with new NVIDIA compute generations, which means your team has a partner who has already done the integration work, not one learning alongside you. The platform is built to extract the full performance of Vera Rubin: NVLink-domain-aware scheduling keeps traffic on the 260 TB/s NVLink fabric, SUNK (Slurm on Kubernetes) minimizes fragmentation, and Mission Control delivers automated cluster management and software-defined liquid cooling that can isolate a fault without taking down the entire rack.

What workloads is NVIDIA Vera Rubin best suited for?

NVIDIA Vera Rubin is designed for frontier agentic AI workloads, including large language model training, trillion-parameter inference, reasoning models, retrieval-augmented generation (RAG), agentic AI systems, and large-scale mixture-of-experts (MoE) architectures.

When will NVIDIA Vera Rubin be available on CoreWeave?

Select customers interested in large-scale deployments can request a capacity planning meeting with our team.


Start preparing for NVIDIA Vera Rubin

Interested in NVIDIA Vera Rubin NVL72 at scale? Request a briefing to cover workload fit, capacity planning, and onboarding timeline.

Start planning your NVIDIA Vera Rubin deployment

Contact us to learn more about the NVIDIA Vera Rubin Platform on CoreWeave.
