AI Fleet Management 101
CoreWeave

Event details

Webinar: AI Fleet Management 101

Chen Goldberg, EVP, Product & Engineering, CoreWeave
Peter Salanki, Chief Technology Officer, Co-founder, CoreWeave

Schedule

Dec 5, 2024, 12:00 am

45 min. on-demand

Improve the observability and reliability of your AI cluster.

Ready to elevate your Kubernetes cluster management skills? Watch CoreWeave’s CTO, Peter Salanki, and EVP of Product & Engineering, Chen Goldberg, discuss strategies to improve full-stack observability and reliability with AI fleet management.

Gain practical knowledge of building more reliable and efficient AI operations in Kubernetes.

Key takeaways

  • Uncover critical components of a large-scale Kubernetes training cluster optimized for AI workloads.
  • Learn how advanced fleet management techniques can enhance cluster resilience and accelerate time-to-market for AI models.
  • Discover how automation can help detect, diagnose, and respond to job failures, minimizing downtime.
  • Gain insights via comprehensive monitoring across all layers of your AI infrastructure stack.

Keep job interruptions to a minimum, and know why they happen when they do. Register for the webinar today.

Speakers

Chen Goldberg
CoreWeave
EVP, Product & Engineering

Peter Salanki
CoreWeave
Chief Technology Officer, Co-founder

Peter Salanki has served as CoreWeave’s Chief Technology Officer since March 2024. From June 2019 to March 2024, he served in roles of increasing responsibility at CoreWeave, most recently as Vice President of Engineering from April 2022 to March 2024 and, before that, as Director of Engineering from June 2019 to April 2022. From January 2018 to June 2019, Mr. Salanki served as Director, Americas, Office of the CTO at Sandvine, an application and network intelligence company.

Watch the webinar on-demand
