Red Hat AI Inference on CKS for Hybrid Inference

Running inference workloads in a hybrid environment, spanning on-premises and the cloud, remains a challenge for many enterprise teams. Security requirements, data residency policies, and regulatory constraints mean teams must run inference in both environments simultaneously, not because it's convenient but because it's required. With inference workloads in both places, the operational burden compounds: maintaining separate stacks, tooling, and expertise for each environment, all while keeping pace with rapidly evolving models and optimization techniques.

Today, we're announcing a new approach that meets this pressing need—a deployment blueprint for running Red Hat AI Inference on CoreWeave Kubernetes Service (CKS). Developed by CoreWeave and Red Hat, this tested, documented reference architecture gives enterprise teams a supported path to production inference in hybrid environments. With it, teams can run the same open-source inference stack on-premises and on CoreWeave—without sacrificing Kubernetes-native control, open runtimes, or infrastructure transparency.

The new blueprint is a supported deployment path for customers who choose to self-manage inference on CKS using Red Hat's open-source stack. It complements CoreWeave's existing inference portfolio by giving customers the option to run the same solution they already use on other clouds and on-premises.

This collaboration builds on our role as a founding contributor to the llm-d open-source project along with Red Hat, IBM Research, Google Cloud, and NVIDIA. It deepens a shared commitment to making high-performance inference infrastructure accessible, open, and Kubernetes-native.

Why hybrid inference matters now

Enterprise inference is no longer a deployment event. It’s a continuously operated production service where latency, availability, and cost behavior must remain predictable under live demand.

The operational requirement is clear: run the same inference stack both on-prem and in the cloud. When the stack is consistent across environments, operational knowledge transfers cleanly, deployment patterns are reusable, and troubleshooting doesn't require context-switching between platforms. That consistency is also harder to achieve than it sounds. A typical production deployment involves model serving gateways, distributed serving frameworks, inference servers, model optimizations, and support for multiple accelerators—each layer individually configured, tested, and maintained. Teams building this from scratch face a significant operational burden that only grows as models evolve and new optimization techniques emerge.

What Red Hat AI Inference brings to the table

Red Hat AI Inference is an open-source, end-to-end solution built for production inference. It includes model serving gateways with standard OpenAI-compatible interfaces, distributed LLM serving through llm-d with efficient inference scheduling and routing, KV cache management, and prefill/decode disaggregation. It supports inference servers such as vLLM for single-node and multi-node serving, offers model optimizations including quantization and speculative decoding, and works across multiple accelerators.
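Because the serving gateway speaks the standard OpenAI API, existing client code runs unchanged whether the stack lives on-premises or on CoreWeave. Here is a minimal sketch using the openai Python client; the gateway URL, API key, and model name are placeholder assumptions, not values from the blueprint:

```python
from openai import OpenAI

# Point the standard OpenAI client at the inference gateway.
# base_url, api_key, and model below are placeholders; substitute
# the endpoint and model actually deployed in your cluster.
client = OpenAI(
    base_url="https://inference-gateway.example.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model name
    messages=[{"role": "user", "content": "Summarize llm-d in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

The same snippet works against either environment; only the base URL changes, which is exactly the portability the consistent stack is meant to deliver.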

The llm-d project at the heart of this innovative stack has since been donated to the CNCF as a Sandbox project, reflecting the industry's commitment to making distributed inference a first-class Kubernetes workload. CoreWeave contributed Tensorizer to vLLM, enabling faster model loading when scaling from zero.
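As a rough illustration of what the Tensorizer integration looks like in practice, here is a hedged vLLM sketch that loads pre-serialized weights instead of deserializing them at startup; the model name, S3 URI, and exact config keys are assumptions and may vary across vLLM versions:

```python
from vllm import LLM

# A sketch, not a definitive recipe: load weights that were previously
# serialized with Tensorizer, which streams tensors directly to the GPU
# and shortens cold-start time when scaling from zero.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model
    load_format="tensorizer",
    model_loader_extra_config={
        # Placeholder URI; point this at your own serialized weights.
        "tensorizer_uri": "s3://your-bucket/llama-3.1-8b/model.tensors",
    },
)

outputs = llm.generate("Hello, world!")
print(outputs[0].outputs[0].text)
```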

Why CKS is the right foundation for this blueprint

CKS is purpose-built for AI. It exposes deep observability at every layer of the stack, from bare-metal GPU allocation to inference-level diagnostics, and provides automated node lifecycle management, including health checks, remediation, and node draining, for high cluster reliability under production demand. For teams running large models that require multi-node inference, high-throughput interconnects, and low-latency scheduling, CKS provides first-to-market access to NVIDIA's most powerful GPU generations and InfiniBand networking. This infrastructure allows Red Hat AI Inference and llm-d to deliver maximum performance.

CKS also preserves the Kubernetes-native operational model teams already use on-premises. Teams bring their existing Kubernetes expertise and workflows, and they work the same way on CoreWeave as they do in their own data centers. With CKS, there's no new abstraction, proprietary orchestration layer, or separate set of operational tooling.
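One practical consequence: the same Kubernetes API calls, manifests, and tooling work against either environment just by switching kubeconfig contexts. A brief sketch with the official kubernetes Python client; the context names are hypothetical placeholders for an on-premises cluster and a CKS cluster:

```python
from kubernetes import client, config

# Hypothetical kubeconfig context names for the two environments.
for context in ("onprem-cluster", "coreweave-cks"):
    config.load_kube_config(context=context)
    v1 = client.CoreV1Api()
    # Count nodes advertising NVIDIA GPUs via the standard
    # nvidia.com/gpu extended resource exposed by the device plugin.
    gpu_nodes = [
        node.metadata.name
        for node in v1.list_node().items
        if "nvidia.com/gpu" in (node.status.allocatable or {})
    ]
    print(f"{context}: {len(gpu_nodes)} GPU node(s)")
```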

Red Hat brings deep enterprise expertise in hybrid deployments, broad adoption across regulated industries, and recognized leadership in open-source Kubernetes and Linux platforms. Together, CoreWeave and Red Hat are making it easier for enterprise teams to deploy production inference with confidence—whether that workload lives on-premises, on CoreWeave, or across both.

Looking ahead

Production inference is evolving quickly, and open, Kubernetes-native approaches should evolve with it. As the llm-d project and Red Hat AI Inference continue to mature, we expect this reference architecture to grow with them—supporting the latest models, accelerators, and deployment patterns that bridge on-premises and cloud environments. To get started, review the deployment blueprint documentation or read the Red Hat perspective on this partnership. For a deeper dive on the inference stack, explore the llm-d project on GitHub.

Explore Red Hat AI Inference on CoreWeave CKS

Explore Red Hat AI Inference 
