October 28-29, 2024 | Tokyo, Japan
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for Open Source Summit + AI_dev Japan 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Monday October 28, 2024 14:00 - 14:40 JST
GPU clouds have become a ubiquitous component of contemporary AI infrastructure, especially for distributed machine learning. While conversations about AI infrastructure optimization typically revolve around the application layer, such as machine learning tasks and distributed job schedulers, looking under the hood of the GPU cloud is essential. Numerous factors, including the pod scheduler, device plugin, GPU/NUMA topology, and RoCE/NCCL stack, can significantly impact performance.

This session will thoroughly explore the tuning of various machine learning models (CNN/RNN/Transformer) from MLPerf, using an H100 cluster as a reference. We will analyze the correlation between model performance and the device operator configuration on the nodes, presenting first-hand experimental results to unveil the hidden potential of a K8s GPU cloud.
Speakers
Liang Yan

Sr. Software Engineer, CoreWeave
Liang Yan is a senior software engineer at CoreWeave, specializing in AI infrastructure, heterogeneous architecture acceleration, and distributed machine learning systems in the cloud. He collaborates closely with upstream communities and leading vendors such as NVIDIA, AMD, and ARM, delivering...
Hall B (4)
  AI_dev
