Lightning Talk: Enabling Hot Restart of Stateful Applications Including GPU-Accelerate...- Bernie Wu

Lightning Talk: Enabling Hot Restart of Stateful Applications Including GPU-Accelerated AI/ML Workloads - Bernie Wu, MemVerge

This talk describes and demonstrates how stateful applications, including GPU-accelerated AI/ML workflows, can be automatically hot restarted after pod kill or eviction events by using transparent memory-snapshotting techniques. Applications with complex initialization or long startup times, as well as long-running, batch-like workloads, are the best candidates for this type of operator, and representative benchmarks will be shared. The approach can be used in conjunction with Day 2 node maintenance, autoscaling, OOM handling, and public cloud Spot instances to obtain operational benefits, including reduced downtime, higher infrastructure utilization, and lower computing costs. We hope this presentation will stimulate a broader discussion of additional use cases for memory snapshotting.
