GitHub - jiadong324/ComparStra-Parser at 5fa200d16f0e55fafb969f65e6d336debee70bc6

Amis

Evaluating the impact of assemblers, aligners, sequencers and read length on read-based and assembly-based structural variant (SV) detection.

Analysis workflow

Overview

The main parts of the comparison are:

1. Using six long-read datasets (details), we assessed the impact of dataset, aligner and assembler on detection variability.
2. On each dataset, 20 read-based callsets and four assembly-based callsets were compared to assess the impact of aligner and assembler.
3. Based on the analysis in step 2, we built high-confidence insertion and deletion (insdel) callsets for the read-based and assembly-based approaches, and then compared these high-confidence insdel callsets.
4. We benchmarked 20 read-based and eight assembly-based detection pipelines against the well-curated HG002 SVs released by GIAB.
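Every comparison above depends on deciding when two calls describe the same SV. The repository lists Jasmine for this kind of merging; the snippet below is only a hypothetical, simplified illustration of breakpoint-distance plus size-similarity matching for insdels. It is not Jasmine's algorithm, and the thresholds (`max_dist`, `min_size_sim`) are assumptions, not the values used in the study.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    chrom: str
    pos: int      # 1-based start position, as in the VCF POS column
    svtype: str   # "INS" or "DEL"
    svlen: int    # absolute SV size in bp (assumed > 0)


def is_match(a: SV, b: SV, max_dist: int = 1000, min_size_sim: float = 0.7) -> bool:
    """Hypothetical rule: same chromosome and type, nearby breakpoints, similar size."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    size_sim = min(a.svlen, b.svlen) / max(a.svlen, b.svlen)
    return size_sim >= min_size_sim


def compare_callsets(read_calls: List[SV], asm_calls: List[SV]) -> Tuple[int, int, int]:
    """Greedy one-to-one matching; returns (shared, read-only, assembly-only) counts."""
    used = set()  # indices of assembly calls already paired with a read call
    shared = 0
    for r in read_calls:
        for j, a in enumerate(asm_calls):
            if j not in used and is_match(r, a):
                used.add(j)
                shared += 1
                break
    return shared, len(read_calls) - shared, len(asm_calls) - shared


if __name__ == "__main__":
    read = [SV("1", 1000, "DEL", 500), SV("1", 50000, "INS", 300), SV("2", 700, "DEL", 80)]
    asm = [SV("1", 1200, "DEL", 450), SV("2", 5000, "DEL", 80)]
    print(compare_callsets(read, asm))  # (1, 2, 1)
```

Greedy first-fit matching keeps the sketch short; a real merger would typically pick the best candidate rather than the first acceptable one.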

SV detection

Please check the wiki page for more details about SV detection and benchmarking.

NOTE: The raw VCF files of all callers across the six datasets are available for reproducing the figures (reproduce_data).
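To get a quick feel for the raw callsets, you can tally SV types per VCF. This is a minimal stdlib sketch (a hypothetical helper, not part of the repository), assuming each caller writes the standard `SVTYPE` key in the INFO column. The repository itself lists pysam for VCF handling; plain-text parsing is used here only to avoid the dependency.

```python
import gzip
from collections import Counter


def count_svtypes(vcf_path: str) -> Counter:
    """Count records per SVTYPE INFO value in a plain or gzipped VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):  # skip meta-information and header lines
                continue
            info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
            # Flags such as IMPRECISE have no "=", so they are skipped.
            fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
            counts[fields.get("SVTYPE", "NA")] += 1
    return counts
```

Records without an `SVTYPE` key are counted under `"NA"`, which makes callers that encode SVs differently easy to spot.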

Analysis environment

Required tools and packages

## Tools
Jasmine=1.1.4
Samtools=1.9

## Python packages
python=3.6
pandas=1.1.5
numpy=1.19.5
seaborn=0.11.1
pysam=0.15.3
matplotlib_venn=0.11.7
intervaltree=3.1.0
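Before running anything, it can help to confirm that the installed package versions match the pins above. This is a hypothetical checker (not part of the repository); it uses `importlib.metadata` on Python 3.8+ and falls back to setuptools' `pkg_resources` on the Python 3.6 environment used here. The `REQUIRED` dict is copied from the list above, using pip distribution names.

```python
try:
    from importlib.metadata import version, PackageNotFoundError  # Python >= 3.8
except ImportError:  # Python 3.6/3.7: fall back to setuptools' pkg_resources
    from pkg_resources import get_distribution, DistributionNotFound as PackageNotFoundError

    def version(name):
        return get_distribution(name).version

# Pinned versions from the package list above (pip distribution names).
REQUIRED = {
    "pandas": "1.1.5",
    "numpy": "1.19.5",
    "seaborn": "0.11.1",
    "pysam": "0.15.3",
    "matplotlib-venn": "0.11.7",
    "intervaltree": "3.1.0",
}


def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(required, installed):
    """Return {package: (required, installed_or_None)} for every package that differs."""
    return {
        name: (want, installed.get(name))
        for name, want in required.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    for pkg, (want, have) in find_mismatches(REQUIRED, installed_versions(REQUIRED)).items():
        print(f"{pkg}: expected {want}, found {have}")
```

Note that the list above pins matplotlib_venn 0.11.7 while the install command below uses 0.11.9, so expect this check to report that package unless you align the two.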

Create environment for data analysis

## Create a python environment
conda create -n py36 python=3.6
conda activate py36

## Install required packages
pip install numpy==1.19.5
pip install pandas==1.1.5
pip install seaborn==0.11.1
pip install matplotlib-venn==0.11.9
pip install pysam==0.15.3
pip install intervaltree==3.1.0

## Install Jasmine
conda config --add channels bioconda
conda config --add channels conda-forge
conda install jasminesv
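After installation, a quick PATH check catches missing command-line tools early. A minimal sketch using `shutil.which`; the executable names `samtools` and `jasmine` are assumptions about how the tools were installed (if you point Constant.py at absolute paths instead, check those paths directly).

```python
import shutil
from typing import Iterable, List


def find_missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them if your installation differs.
    missing = find_missing_tools(["samtools", "jasmine"])
    print("All tools found" if not missing else "Missing: " + ", ".join(missing))
```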

Required files

Hg19 reference genome hs37d5.fa.

Please refer to CAMPHOR for how the original repeat files were processed. The files listed below are used in the analysis (download hg19_repeats and unzip it).

Simple repeat file including STR and VNTR (simplerepeat.bed.gz).
Segmental duplication file (seg_dup.bed.gz).
RepeatMasker file, including LINE, SINE, etc. (rmsk.bed.gz).
Hg19 excluded regions (grch37.exclude_regions_cen.bed).
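These annotation files are BED intervals, and a typical use is asking whether an SV falls inside a repeat or an excluded region. The repository lists intervaltree for this; the stdlib sketch below is a hypothetical alternative, not the repository's code. It sorts intervals by start and keeps a running maximum of ends, so "does this region overlap anything?" becomes one binary search. Python's `gzip` module can read bgzip-compressed files because BGZF is gzip-compatible.

```python
import bisect
import gzip
from collections import defaultdict
from itertools import accumulate


def load_bed(path):
    """Load a plain or (b)gzipped BED file into per-chromosome sorted intervals.

    Returns {chrom: (starts, prefix_max_ends)}; BED coordinates are 0-based, half-open.
    """
    by_chrom = defaultdict(list)
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            by_chrom[chrom].append((int(start), int(end)))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        # prefix_max_ends[i] = largest end among the first i + 1 intervals (sorted by start)
        prefix_max_ends = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, prefix_max_ends)
    return index


def overlaps(index, chrom, start, end):
    """True if the half-open region [start, end) overlaps any interval on chrom."""
    if chrom not in index:
        return False
    starts, prefix_max_ends = index[chrom]
    # Only intervals that begin before `end` can overlap the query.
    i = bisect.bisect_left(starts, end)
    return i > 0 and prefix_max_ends[i - 1] > start
```

This answers only a yes/no overlap query; if you need the overlapping intervals themselves (for example, to report the repeat class from rmsk.bed.gz), an interval tree such as intervaltree is the better fit.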

NOTE: Unzip reproduce_data.zip; the default working directory is ./reproduce_data/. Then specify the following variables in ./Helpers/Constant.py.

WORKDIR = '/path/to/reproduce_data'
FIGDIR = '/path/to/reproduce_data/Figures'

HG19REF = '/path/to/hs37d5.fa'
EXREGIONS = '/path/to/hg19_repeats/grch37.exclude_regions_cen.bed'
SIMREP = '/path/to/hg19_repeats/simplerepeat.bed.gz'
RMSK = '/path/to/hg19_repeats/rmsk.bed.gz'
SD = '/path/to/hg19_repeats/seg_dup.bed.gz'

SAMTOOLS = '/path/to/samtools'
JASMINE = '/path/to/jasmine'
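A typo in any of these paths usually surfaces only midway through a figure script. A hypothetical pre-flight check (not part of the repository) can validate them up front; the setting names are taken from the list above, and the function accepts any object with those attributes, such as the imported `Constant` module.

```python
import os
from typing import Dict, Iterable, Optional

# Setting names from ./Helpers/Constant.py, as listed above.
PATH_SETTINGS = ["WORKDIR", "FIGDIR", "HG19REF", "EXREGIONS", "SIMREP",
                 "RMSK", "SD", "SAMTOOLS", "JASMINE"]


def missing_paths(constants, names: Iterable[str] = PATH_SETTINGS) -> Dict[str, Optional[str]]:
    """Map each setting that is undefined or points to a non-existent path to its value."""
    problems = {}
    for name in names:
        path = getattr(constants, name, None)
        if path is None or not os.path.exists(path):
            problems[name] = path
    return problems
```

For example, from the repository root: `from Helpers import Constant; print(missing_paths(Constant))`. An empty dict means every configured path exists.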

Reproducing results

NOTE: Please run the scripts in the order listed below.

Figure 2

## Figure 2a
python ./Figure2/Figure2a.py
## Figure 2b and 2c
python ./Figure2/Figure2bc.py
## Figure 2d, 2e, 2f and 2g
python ./Figure2/Figure2defg.py

Figure 3

## Figure 3a, 3b and 3c
python ./Figure3/Figure3abc.py
## Figure 3d, 3e and 3f
python ./Figure3/Figure3def.py

Figure 4

## Figure 4a, 4b, 4c and 4d
python ./Figure4/Figure4.py

Figure 5

## Figure 5a, 5b, 5c, 5d, 5e and 5f
python ./Figure5/Figure5.py

Extended Data Figures

## Extended Data Fig 1
python ./SuppFig/FigS1.py

## Extended Data Fig 2
python ./SuppFig/FigS2.py

## Extended Data Fig 3
python ./SuppFig/FigS3.py

## Extended Data Fig 4
python ./SuppFig/FigS4.py

## Extended Data Fig 5
python ./SuppFig/FigS5.py

## Extended Data Fig 6
python ./SuppFig/FigS6.py

## Extended Data Fig 7
python ./SuppFig/FigS7.py
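Because the scripts must run in order, a small driver can chain them. This is a hypothetical sketch (not part of the repository), assuming it is run from the repository root with the py36 environment active; `check=True` makes it stop at the first failing script, so later figures never run on incomplete intermediate results.

```python
import subprocess
import sys
from typing import List

# Script order copied from the lists above: main figures, then Extended Data figures.
SCRIPTS = [
    "./Figure2/Figure2a.py",
    "./Figure2/Figure2bc.py",
    "./Figure2/Figure2defg.py",
    "./Figure3/Figure3abc.py",
    "./Figure3/Figure3def.py",
    "./Figure4/Figure4.py",
    "./Figure5/Figure5.py",
] + [f"./SuppFig/FigS{i}.py" for i in range(1, 8)]


def run_all(scripts: List[str] = SCRIPTS, dry_run: bool = False) -> List[List[str]]:
    """Run each script in order with the current interpreter; stop at the first failure.

    Returns the commands that were (or, with dry_run=True, would have been) executed.
    """
    commands = [[sys.executable, script] for script in scripts]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Call `run_all(dry_run=True)` first to print the planned commands, then `run_all()` to execute them.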
