Session II: Introduction to Unix / O2 and NGS Data Analysis

Description

Session II will start with a discussion of the RNA-seq workflow, highlighting different quantification strategies. We will then perform additional quality control assessment of STAR-aligned data using Qualimap to identify any issues related to contamination or biases in our data. Then, we will use Salmon to perform a more accurate quasi-alignment and quantification. We will finish the session by introducing R, which we will need for the differential expression analysis of our count data.

Day 1: We will start the first day by learning about the various types of tools for alignment and quantification. Then we will align the raw sequence data using STAR, followed by quality assessment with Qualimap. Students will determine whether the data show any worrisome contamination or biases before moving on to quasi-alignment and quantification with Salmon. MultiQC will be used to assess the quality of the data at all steps in the analysis; it is also useful for detecting bias and contamination issues present in the data. We will end the day with an introduction to R and RStudio.

Day 2: During the second day of this session, we will dive into R, a software environment for statistical computing and graphics. Within R, we will explore basic data structures, data types, data inspection and extraction, reading and writing files, data wrangling, and visualization methods using the ggplot2 package.

Lessons

Click here for the schedule with links to the lessons.

Learning Objectives

  • Describe and list the steps in a workflow to analyze RNA sequencing data.
  • Describe the data formats commonly used within the RNA-seq workflow.
  • Examine the quality of mapped data.
  • Use a series of command line tools to execute an RNA-seq workflow from raw sequence data to gene expression counts.
  • Describe considerations for working with datasets of large sizes.
  • Use the various components of RStudio.
  • Export data tables and plots for use outside of the R environment.
  • Employ variables and functions in R.
  • Modify default behavior of functions using arguments in R.
  • Demonstrate how to install external packages to extend R’s functionality.
  • Identify different R-specific and external sources of help to (1) troubleshoot errors and (2) get more information about functions and packages.
  • Describe the various data types used in R.
  • Construct data structures to store data.
  • Plot graphs using base-R functions and external packages, such as ggplot2.
  • Demonstrate how to subset, merge, and create datasets in R.