Multi-GPU Training with PyTorch: Data and Model Parallelism

Categories: Training/Workshop, Programming Languages, Research & Data Analysis

Thu, Nov 9, 2023

4:30 PM – 6 PM EST (GMT-5)

Location: Private

61 registered

Details

The first part of this workshop will show participants how to optimize single-GPU training. We will then illustrate the code changes needed for data-parallel and model-parallel training across multiple GPUs. Model-parallel training is essential for training Large Language Models (LLMs) that are too large to fit in the memory of a single GPU.

This workshop aims to prepare researchers to use the new H100 GPU nodes available through Princeton Language and Intelligence.

Workshop format: Presentation, demo and hands-on

Target audience: This workshop is geared toward researchers looking to train neural networks in PyTorch using multiple GPUs.

Knowledge prerequisites: Participants should have some familiarity with training neural networks with PyTorch.

Hardware/software prerequisites: Participants must have an account on the Adroit cluster and should confirm that they can SSH into Adroit several hours before the workshop. Request an account on Adroit: https://bit.ly/3wicSaH (VPN required if off-campus). Details on all of the above can be found in this guide: https://bit.ly/3QER9Sv.

Speakers

Mengzhou Xia

Graduate Student, Computer Science

I'm a fifth-year Computer Science Ph.D. candidate at Princeton NLP, advised by Prof. Danqi Chen. Before that, I was a master's student at Carnegie Mellon University, advised by Prof. Graham Neubig. I obtained my bachelor's degree from the School of Data Science at Fudan University in China. My research is partially supported by the 2024 Apple Scholars in AIML PhD fellowship and the 2022 Bloomberg Data Science Ph.D. Fellowship. I have interned at Meta AI, Microsoft Research, and Bloomberg AI during my Ph.D.

Alexander Wettig

Graduate Student, Computer Science

Princeton University

Alexander is a graduate student in the Department of Computer Science.

Jonathan Halverson

Research Software and Computing Training Lead

Princeton University

Jonathan Halverson is the Research Software and Computing Training Lead with Research Computing.

Hosted By

Research Computing