Improving Analysis Workflows with Snakemake

by PICSciE/Research Computing

Training/Workshop Research & Data Analysis

Tue, Apr 5, 2022

1 PM – 3:30 PM EDT (GMT-4)

Online Event

Details

Tired of writing sbatch scripts and complex bash logic for your work? Does your directory look like 'step_1.slurm, step_2.slurm, step_3.slurm, step_3_final.slurm'? Have you struggled to replicate previous results because some intermediate steps are lost to your shell history? Then you are ready to improve your analysis pipelines with a workflow management system! Snakemake is a concise but descriptive framework for specifying workflows that interfaces with HPC systems. Because Snakemake is written in Python, complex relationships between steps can be described with Python scripting, and any command you can run in a terminal can be executed as part of a workflow. In this workshop, you will take a series of sbatch scripts and develop them into a Snakemake workflow to create a reproducible, distributable, and efficient analysis pipeline. Several cookie-cutter examples will be provided to help jumpstart your work.
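To make the idea of a workflow management system concrete, here is a toy sketch in plain Python. It is not Snakemake's API and not workshop material; the file names, the `RULES` table, and the `plan` helper are all hypothetical. It only illustrates the general principle that each step declares its inputs and outputs, and the order of commands is derived from those declarations rather than from numbered scripts.

```python
# Toy illustration only: plain Python, NOT Snakemake's API.
# Every file name, command, and helper here is hypothetical.

# Each rule maps an output file to the inputs it needs and the command
# that would produce it (stand-ins for step_1.slurm, step_2.slurm, ...).
RULES = {
    "results/filtered.txt": {
        "inputs": ["data/raw.txt"],
        "shell": "bash step_1.sh data/raw.txt results/filtered.txt",
    },
    "results/aligned.txt": {
        "inputs": ["results/filtered.txt"],
        "shell": "bash step_2.sh results/filtered.txt results/aligned.txt",
    },
    "results/summary.txt": {
        "inputs": ["results/aligned.txt"],
        "shell": "bash step_3.sh results/aligned.txt results/summary.txt",
    },
}


def plan(target, rules, existing):
    """Return the commands needed to build `target`, dependencies first.

    `existing` is the set of files already present; they are not rebuilt.
    """
    commands = []
    seen = set()

    def visit(path):
        if path in existing or path in seen:
            return  # already available or already scheduled
        if path not in rules:
            raise FileNotFoundError(
                f"no rule produces {path!r} and it does not exist"
            )
        seen.add(path)
        for dep in rules[path]["inputs"]:
            visit(dep)  # schedule dependencies before this step
        commands.append(rules[path]["shell"])

    visit(target)
    return commands


if __name__ == "__main__":
    # Only the raw data exists, so all three steps are scheduled in order.
    for command in plan("results/summary.txt", RULES, existing={"data/raw.txt"}):
        print(command)
```

If `results/filtered.txt` were already present, the same call would schedule only the last two steps. Skipping work whose outputs already exist is one reason a declared workflow is easier to rerun than a sequence of hand-ordered scripts.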

Learning objectives: Attendees will learn how to convert their workflows to Snakemake and run them with the Slurm scheduler.

Knowledge prerequisites: Basic Linux, HPC, and some familiarity with conda.

Hardware/software prerequisites: (1) Bring a laptop that can connect to the eduroam wireless network. You will also need to be able to Duo authenticate to use campus resources. (2) Have an SSH client installed on your laptop. (3) Register for an account on Adroit. This is the cluster we will use for demonstration purposes. Make sure you can SSH to Adroit before the workshop by following this guide. (4) Create a conda environment and install Snakemake in it.

Workshop format: Demonstration and hands-on
 

Speakers

Troy Comi

Research Software Engineer

Princeton University

Troy is an RSE working with Joshua Akey’s lab, investigating human genetic ancestry and mechanisms of evolution. Within the Lewis-Sigler Institute for Integrative Genomics, he applies rigorous software development practices to develop new analysis pipelines and improve legacy codebases. Past research areas include 3D bioprinting, single cell mass spectrometry, and mass spectrometry imaging. Troy has a B.S. in Computer Science, Chemistry, Mathematics, Biochemistry and Cellular Biology and a Ph.D. in Analytical Chemistry.

Hosted By

PICSciE/Research Computing
Co-hosted with: GradFUTURES