Unlocking AI Inference at Scale: CoreWeave Joins Red Hat Open Source Project llm-d as Founding Member


At CoreWeave, we believe open source software (OSS) is essential for driving innovation in AI and ML and for giving developers flexibility. Our purpose-built AI cloud platform has been developed from the ground up on Kubernetes, and we’ve made open source contributions such as CoreWeave Tensorizer to deliver the scale, speed, and performance our customers need when running AI workloads.

Today, we are thrilled to announce that CoreWeave is a founding member of Red Hat's new llm-d OSS project, alongside IBM Research, Google, and NVIDIA. In a fast-evolving AI landscape where future growth will be fueled by inference, the engine that transforms AI models into actionable results, it is critical to tear down infrastructure silos. We are proud to deepen our long-standing commitment to open source AI and to contribute our expertise in productizing AI inference workloads at scale, while advancing the Kubernetes ecosystem and fostering interoperability for AI.

This groundbreaking initiative has already garnered the support of leading gen AI model providers, AI accelerator pioneers, premier AI cloud platforms, and the developer community. 

About CoreWeave

CoreWeave delivers the leading AI Cloud Platform, purpose-built to provide the speed, performance, and expertise needed to unleash AI’s full potential. Our customers train and deploy their innovative foundation models on CoreWeave and get cutting-edge performance, reliability, scale, and infrastructure efficiency for their AI workloads.

SemiAnalysis’s ClusterMAX™ Rating System recognized CoreWeave as the only cloud provider to earn its top Platinum tier rating.

Current challenges with LLM inference

vLLM has quickly become the de facto open source inference server, providing day-0 support for emerging frontier models and support for a broad range of GPUs and accelerators.
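
To make that concrete, here is a minimal sketch of local generation with vLLM’s offline Python API; the model name and sampling settings are illustrative placeholders, not recommendations.

    # Minimal vLLM offline-inference sketch (model name is a placeholder).
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any vLLM-supported model works
    params = SamplingParams(temperature=0.7, max_tokens=64)

    outputs = llm.generate(["Explain KV caching in one sentence."], params)
    for output in outputs:
        print(output.outputs[0].text)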

However, as foundation models grow in size, evolve in their capabilities, and increasingly power agentic applications, developers face new challenges in deploying these models at scale while managing infrastructure, cost, and latency across a wide range of use cases and applications. This drives the need for open standards and broader industry collaboration that make it easy to develop, test, and scale inference workloads, and that increase the interoperability of these workloads across platforms.

How llm-d is groundbreaking for AI inference

llm-d is a visionary project that extends the power of vLLM beyond single-server limits to unlock production-scale AI inference. Built on the proven orchestration capabilities of Kubernetes, llm-d integrates advanced inference features to deliver higher performance and lower latency for inference workloads.

llm-d delivers prefill and decode disaggregation, which lets these two phases scale independently; LMCache-based KV cache offloading to optimize memory use; an AI Inference Gateway; and AI-aware network routing built on the NVIDIA Inference Xfer Library (NIXL) for more efficient data transfers. In addition, Kubernetes-powered clusters and controllers schedule compute and storage resources efficiently as workload demands fluctuate and enable interoperability across cloud platforms.
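
To illustrate the disaggregation idea in miniature, the toy sketch below separates the compute-bound prefill phase from the memory-bandwidth-bound decode phase behind a shared cache. The class names (KVStore, PrefillWorker, DecodeWorker) are invented for illustration and do not reflect llm-d’s actual components or APIs.

    # Toy sketch of prefill/decode disaggregation; not llm-d code.
    from dataclasses import dataclass, field

    @dataclass
    class KVStore:
        # Stands in for an offloaded KV cache (e.g., LMCache-backed in llm-d).
        entries: dict = field(default_factory=dict)

    class PrefillWorker:
        # Compute-bound phase: process the full prompt once, emit a KV cache entry.
        def run(self, request_id: str, prompt: str, kv: KVStore) -> None:
            kv.entries[request_id] = f"kv({prompt})"  # placeholder for real KV tensors

    class DecodeWorker:
        # Memory-bandwidth-bound phase: reuse the cached KV to emit tokens one by one.
        def run(self, request_id: str, kv: KVStore, max_tokens: int = 3) -> list[str]:
            assert request_id in kv.entries, "prefill must complete before decode"
            return [f"token_{i}" for i in range(max_tokens)]

    def serve(prompt: str) -> list[str]:
        # Because the two phases are separate workers, each pool could be scaled
        # (or placed on different hardware) independently in a real deployment.
        kv, rid = KVStore(), "req-1"
        PrefillWorker().run(rid, prompt, kv)
        return DecodeWorker().run(rid, kv)

    print(serve("Explain disaggregated inference."))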

CoreWeave’s continued commitment to open source

CoreWeave is proud to be a founding member of the project alongside Google, IBM Research, and NVIDIA. We are committed to our deep collaboration with Red Hat on architecting the future of large-scale LLM serving, and we are excited to work with an incredible group of partners and the broader developer community to build a flexible, high-performance inference engine that accelerates innovation and lays the groundwork for open, interoperable AI. We look forward to bringing our learnings and best practices from managing large-scale Kubernetes and AI inference deployments to llm-d, reducing the heavy lifting required of everyday developers.

Our approach to open source focuses on promoting open and flexible interfaces. For example, our contributions will enable Kubernetes-native operators for deployment and management, comprehensive testing and benchmarking harnesses for production workloads, and effective deployment and monitoring of the inference stack at scale across multiple GPUs and clusters.

Additionally, we are focused on reducing latency, optimizing costs, and pushing the boundaries of scale for AI workloads. We contributed CoreWeave Tensorizer to vLLM, with expanded support for llm-d, enabling more than 5x faster model loading compared to Hugging Face when scaling from zero, through an innovative “zero-copy” approach.
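
As a rough sketch of what that loading path looks like with the standalone open source tensorizer package (the model name and file path below are placeholders, and the vLLM and llm-d integrations wrap these steps differently):

    # Sketch using CoreWeave's `tensorizer` library directly; "gpt2" and the
    # local file path are placeholders for illustration only.
    from transformers import AutoConfig, AutoModelForCausalLM
    from tensorizer import TensorSerializer, TensorDeserializer

    # One-time step: serialize an already-loaded model to a .tensors file
    # (in practice this usually lives in object storage close to the GPUs).
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    serializer = TensorSerializer("gpt2.tensors")
    serializer.write_module(model)
    serializer.close()

    # At scale-up time: build the module skeleton from config, then stream
    # the weights straight into it instead of re-downloading and copying.
    skeleton = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained("gpt2"))
    deserializer = TensorDeserializer("gpt2.tensors", device="cpu")
    deserializer.load_into_module(skeleton)
    deserializer.close()

Streaming weights directly into an already-constructed module is what lets cold-start replicas skip the usual download-then-copy sequence when scaling from zero.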

Lastly, as the leading AI cloud platform and the first to deploy the latest hardware, including NVIDIA GB200, CoreWeave will lead the charge in unlocking the full potential of these hardware innovations. These capabilities will boost developer productivity and make it easy to build once and run across different cloud platforms that support Kubernetes deployments.

Get involved: Join the llm-d project

Open source AI initiatives become more powerful with more contributions across the industry. Whether you’re part of a long-standing enterprise or a budding startup ready to accelerate, llm-d offers a flexible, powerful platform to build upon.

Get started with llm-d today. Contact us to get connected with our team of experts and experience our industry-leading cloud platform.
