Multi-GPU Training with PyTorch: Data and Model Parallelism
Thu, Nov 9, 2023
4:30 PM – 6 PM EST (GMT-5)
Details
This workshop aims to prepare researchers to use the new H100 GPU nodes as part of Princeton Language and Intelligence.
Workshop format: Presentation, demo, and hands-on exercises
Target audience: This workshop is geared toward researchers looking to train neural networks in PyTorch using multiple GPUs.
Knowledge prerequisites: Participants should have some familiarity with training neural networks with PyTorch.
Hardware/software prerequisites: Participants must have an account on the Adroit cluster and should confirm, several hours before the workshop, that they can SSH into Adroit. Request an account on Adroit: https://bit.ly/3wicSaH (VPN required if off-campus). Details on all of the above can be found in this guide: https://bit.ly/3QER9Sv.
Speakers
Mengzhou Xia
Graduate Student, Computer Science
I'm a fifth-year Computer Science Ph.D. candidate at Princeton NLP, advised by Prof. Danqi Chen. Prior to this, I was a master's student at Carnegie Mellon University, advised by Prof. Graham Neubig. I obtained my bachelor's degree from Fudan University's School of Data Science in China. My research is partially supported by the 2024 Apple Scholars in AIML PhD fellowship and the 2022 Bloomberg Data Science Ph.D. Fellowship. During my Ph.D., I have interned at Meta AI, Microsoft Research, and Bloomberg AI.
Alexander Wettig
Graduate Student, Computer Science
Princeton University
Alexander is a graduate student in the Department of Computer Science.
Jonathan Halverson
Research Software and Computing Training Lead
Princeton University
Jonathan Halverson is the Research Software and Computing Training Lead with Research Computing.