Slurm on Kubernetes (SUNK)

Easily run Slurm-based batch jobs or container-based, real-time workloads on the same GPU cluster

Run batch or real-time workloads on one cluster

At CoreWeave, we’re always looking for ways to improve customer experience. That’s why we developed SUNK (Slurm on Kubernetes). It lets your teams deploy Slurm on Kubernetes with GitOps and colocate Slurm and containerized workloads on the same cluster. That includes pre-training, inference, experimentation, and everything else you need to build, train, and deploy your GenAI applications.

Share compute with ease

Unlock greater workload fungibility with SUNK. You’ll get more out of your compute in less time (and for less money).

Efficiency

Share resources between Slurm and Kubernetes to dramatically increase resource efficiency and streamline deployment—perfect for running training and inference workloads on the same cluster

Dynamic scalability

SUNK can dynamically scale Slurm nodes to match workload requirements, easing the burden of managing the needs of complex and large-scale compute tasks

Optimization

Run one cluster instead of separate clusters for training and inference. Get optimal model training throughput along with the capacity to support production inference demand, accelerating time to market

Run on industry-leading cloud infrastructure services

SUNK runs on infrastructure services that provide the ideal combination of ease of use, workload fungibility, performance, and scale.

Compute Services

Get the latest GPU compute you need for your AI workloads through a Kubernetes-native environment

Storage Services

Flexible, purpose-built, high-performance storage solutions tailored for AI

Networking Services

High-performance networking for optimal cluster scale-out and connectivity

Supercomputing scale and enterprise-grade security

CoreWeave GPU megaclusters help support multi-trillion-parameter model training.

“When customers experienced challenges in interoperating between Slurm and Kubernetes orchestration frameworks, we gave them that capability through our SUNK service that integrates these frameworks. This allows both training and inference to work on the same infrastructure, which is a massive efficiency unlock for our customers.”

— Mike Intrator, CEO at CoreWeave

Technical partnership and 24/7 support

Our team of solution architects will get SUNK up and running for you in a matter of hours.

A partnership mindset

Get top-of-the-line assistance with comprehensive onboarding

Best-in-class teams

Access expert engineers for day-to-day support via Slack, with ultra-fast turnaround times

Enhanced observability

Get better visibility into critical hardware, Kubernetes, and Slurm job metrics via intuitive dashboards

See what SUNK can do

Get the resource flexibility your teams need to build, train, and deploy new models.

Start today