Session I: Introduction to Unix / O2 and NGS Data Analysis

Description

Session I starts with an overview of bioinformatics and next-generation sequencing (NGS), followed by an interactive introduction to Unix and O2 (the high-performance computing cluster maintained by HMS Research Computing). The session wraps up with an in-depth introduction to sequencing technology, library preparation, and QC protocols for NGS data.

Day 1: We will start the course by laying out the path for the NGS Analysis course and providing a general overview of bioinformatics and NGS data. The remainder of the day will be dedicated to learning Unix, including navigating filesystems and performing basic operations.

Day 2: The second day will start with an introduction to computational clusters and O2, describing what the cluster environment is and how we benefit from using it. Following the introduction to O2, we will discuss best practices in Research Data Management (RDM) and spend some time applying them to organize our project before jumping into the workflow. We will then introduce how Illumina sequencing works and what goes into preparing libraries for transcriptomics sequencing experiments. Finally, we will learn how to assess and interpret the quality of sequencing (FASTQ) files, and discuss the error profiles you can expect to see with Illumina sequencers and the main sources of those errors.

Lessons

See the schedule for links to the lessons.

Learning Objectives

Demonstrate how to access the Unix shell.
Demonstrate how to interact with the directory structure using the command line interface.
Use commands and structures within Unix to work more efficiently.
Demonstrate how to use permissions and work with environment variables in Unix.
Recognize the advantages and appropriate uses of high-performance computing.
Describe the structure of a high-performance computing cluster.
Run and manage jobs on the HMS high-performance compute cluster, Orchestra2 (O2), using best practices.
Demonstrate the concepts of multithreading and parallelization on the O2 cluster.
Describe best practices for RNA-seq project organization and data management.
Define "metadata" and explain why it is essential.
Explain experimental design and sample preparation considerations for an RNA-seq project.
Describe the process of "sequencing by synthesis" using the Illumina technology.
Describe the main types of errors that arise in Illumina short-read sequencing.
Evaluate quality metrics generated by FastQC and troubleshoot issues with sequencing data.
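
As a small illustration of the last objective, the sketch below decodes the quality line of a single FASTQ record into per-base Phred scores. It is not part of the course materials: the record, the read name, and the function names are hypothetical, and it assumes the Phred+33 encoding used by current Illumina instruments.

```python
# Minimal sketch: decode a FASTQ quality string into Phred scores.
# Assumes Phred+33 encoding; the record below is a made-up example.

def phred_scores(quality_line, offset=33):
    """Return the Phred quality score for each base in a quality string."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(score):
    """Convert a Phred score Q to the probability of a wrong base call: 10^(-Q/10)."""
    return 10 ** (-score / 10)


# A FASTQ record has four lines: header, sequence, separator, quality.
record = [
    "@read1_hypothetical",
    "GATTACA",
    "+",
    "IIII#5I",
]

scores = phred_scores(record[3])
for base, q in zip(record[1], scores):
    print(f"{base}\tQ{q}\tP(error)={error_probability(q):.4g}")
```

A score of Q40 corresponds to an error probability of 1 in 10,000, while Q2 means the base call is very likely wrong. FastQC summarizes the distribution of these scores across all reads and positions.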
