AI Fleet Management 101
CoreWeave

Webinar: AI Fleet Management 101

30 min. webinar
Schedule: Dec 5, 2024 · 9:00 am PT

Chen Goldberg
SVP of Engineering
CoreWeave
Peter Salanki
CTO
CoreWeave

Improve the observability and reliability of your AI cluster.

Ready to elevate your Kubernetes cluster management skills? Join CoreWeave’s CTO, Peter Salanki, and SVP of Engineering, Chen Goldberg, for a discussion covering strategies to improve full-stack observability and reliability with AI fleet management.

Gain practical knowledge of building more reliable and efficient AI operations in Kubernetes.

Key takeaways

  • Uncover critical components of a large-scale Kubernetes training cluster optimized for AI workloads.
  • Learn how advanced fleet management techniques can enhance cluster resilience and accelerate time-to-market for AI models.
  • Discover how automation can help detect, diagnose, and respond to job failures, minimizing downtime.
  • Gain insights via comprehensive monitoring across all layers of your AI infrastructure stack.

Keep job interruptions to a minimum, and know why they happen when they do. Register for the webinar today.

Speakers

Chen Goldberg
CoreWeave
SVP of Engineering
Peter Salanki
CoreWeave
CTO

Sign up for the webinar