Event details
Webinar: AI Fleet Management 101
Improve the observability and reliability of your AI cluster.
Ready to elevate your Kubernetes cluster management skills? Watch CoreWeave’s CTO, Peter Salanki, and SVP of Engineering, Chen Goldberg, discuss strategies to improve full-stack observability and reliability with AI fleet management.
Gain practical knowledge of building more reliable and efficient AI operations in Kubernetes.
Key takeaways
- Uncover critical components of a large-scale Kubernetes training cluster optimized for AI workloads.
- Learn how advanced fleet management techniques can enhance cluster resilience and accelerate time-to-market for AI models.
- Discover how automation can help detect, diagnose, and respond to job failures, minimizing downtime.
- Gain insight into every layer of your AI infrastructure stack through comprehensive monitoring.
Keep job interruptions to a minimum, and know why they happen when they do. Register for the webinar today.
Speakers
Peter Salanki has served as CoreWeave’s Chief Technology Officer since March 2024. Before that, he held roles of increasing responsibility at CoreWeave: Director of Engineering from June 2019 to April 2022, then Vice President of Engineering from April 2022 to March 2024. From January 2018 to June 2019, Mr. Salanki served as Director, Americas, Office of the CTO at Sandvine, an application and network intelligence company.