Session I: Introduction to Unix / O2 and NGS Data Analysis

Description

Session I starts with an overview of bioinformatics and next-generation sequencing (NGS), followed by an interactive introduction to Unix and O2 (the high performance computational cluster maintained by HMS Research Computing). The session wraps up with an in-depth introduction to sequencing technology, library preparation and QC protocols for NGS data.

Day 1: We will start the course by laying out the path for the NGS Analysis course and providing a general overview of bioinformatics and NGS data. The remainder of the day will be dedicated to learning Unix, including navigating filesystems and performing basic operations.

Day 2: The second day will start with an introduction to computational clusters and O2, describing what the cluster environment is and how we benefit from using it. Following the introduction to O2, we will discuss best practices in Research Data Management (RDM) and apply some of them to organize our project before jumping into the workflow. We will then introduce how Illumina sequencing works and what goes into preparing libraries for transcriptomics experiments. Finally, we will learn how to assess and interpret the quality of sequencing (FASTQ) files, discuss the different error profiles you can expect to see with Illumina sequencers, and review the main sources of error.

Lessons

Click here for the schedule with links to the lessons.

Learning Objectives

  • Demonstrate how to access the Unix shell.
  • Demonstrate how to interact with the directory structure using the command line interface.
  • Use commands and structures within Unix to work more efficiently.
  • Demonstrate how to use permissions and consider environmental variables in Unix.
  • Recognize the advantages and appropriate uses of high-performance computing.
  • Describe the structure of a high-performance computing cluster.
  • Run and manage jobs on the HMS high-performance compute cluster, Orchestra2 (O2), using best practices.
  • Demonstrate the concepts of multithreading and parallelization on the O2 cluster.
  • Describe best practices for RNA-seq project organization and data management.
  • Define "metadata" and explain why it is essential.
  • Explain experimental design and sample preparation considerations for an RNA-seq project.
  • Describe the process of "sequencing by synthesis" using the Illumina technology.
  • Describe the main types of errors that arise in Illumina short-read sequencing.
  • Evaluate quality metrics generated by FastQC and troubleshoot issues with sequencing data.
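
As a small illustration of what "quality" means in a FASTQ file, the sketch below decodes the quality line of a single record into Phred scores and error probabilities. It is not part of the course materials: the record and the function names are hypothetical, and it assumes the Phred+33 encoding used by current Illumina pipelines.

```python
def phred33_scores(quality_line):
    """Convert a FASTQ quality string to integer Phred scores (assumes Phred+33)."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


# Hypothetical 4-line FASTQ record: header, sequence, separator, quality string.
record = ["@read1", "GATTACA", "+", "IIII5+!"]

scores = phred33_scores(record[3])
for base, q in zip(record[1], scores):
    print(f"{base}\tQ{q}\tP(error)={error_probability(q):.4f}")
```

Here the first four bases have a score of 40 (about a 1 in 10,000 chance of error), while the scores drop toward the end of the read, which is the kind of per-base trend that FastQC summarizes across all reads in a file.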