Big Data Analytics and Machine Learning with Spark

Start: Wed, Feb 15, 2023, 4:30 PM EST (GMT-5)
End: Wed, Feb 15, 2023, 6:00 PM EST (GMT-5)
Location: Private Location

Categories: Training/Workshop; Programming Languages; Research & Data Analysis

Registration: Closed (this event has already taken place). 11 participants registered.

Details

This workshop will show participants how to get started analyzing big data and training machine learning models on it with Apache Spark on the Princeton HPC clusters. Spark is a scalable, general-purpose engine for big data processing that supports multiple frontend languages, including Python and R.

Knowledge prerequisites: Participants should be familiar with Python or R and have some knowledge of data analytics. The workshop assumes that attendees are new to working with big data.

Hardware/software prerequisites: Participants must have an account on the Adroit cluster and should confirm that they can SSH into Adroit *at least 48 hours beforehand*. Details can be found in this guide. THERE WILL BE LITTLE TO NO TROUBLESHOOTING DURING THE WORKSHOP!

Workshop format: Lecture, demonstration, and hands-on exercises

Learning objectives: Attendees will learn how to use Apache Spark to analyze large datasets and train machine learning models.

Speakers

Jonathan Halverson

Research Software and Computing Training Lead

Princeton University

Jonathan Halverson is the Research Software and Computing Training Lead with Research Computing.

Hosted By

Research Computing
Co-hosted with: GradFUTURES