Analysis tools for SNARE-seq2 (Scalable Dual-omic Profiling with Single-nucleus Chromatin Accessibility and mRNA Expression Sequencing 2
Install all dependencies before running. This software have only been tested on a Debian based Linux system but should work on most Linux systems.
The scripts uses Perl, Python3, and R.
- samtools (v1.10 or newer): https://github.com/samtools/samtools/releases/download/1.10/samtools-1.10.tar.bz2
- pysam:
pip install Cython \ pip install pysam
- cutadapt:
python3 -m pip install --user --upgrade cutadapt
- deindexer: https://github.com/ws6/deindexer
- dropEst: https://dropest.readthedocs.io/en/latest/setup
- STAR: https://github.com/alexdobin/STAR
- Minimap2: https://github.com/lh3/minimap2
- SnapTools: https://github.com/r3fang/snaptool9
- R packages: httr, broom, igraph, png, rlang, crayon, digest, assertthat, glue, purr, backports, dplyr, tidytext, dropestr, DropletUtils, Seurat, Rcpp, RcppEigen, Rinside, Matrix, optparse, rmarkdown, withr
Download scripts from github:
git clone http://github.com/dinhdiep/snare_seq2
There are two main scripts: SeqToFrag.sh
, SeqToRDS.sh
.
RNA count matrices require the separated dT and N6 barcodes to be merged. The R script "create_seurat_kneeplot_2.r" script is provided for this purpose.
Input | Explanation of values |
---|---|
d | the path to the raw fastq folder |
f | the fastq id matching the pattern <fastq.id>R1fastq.gz, <fastq.id>R2fastq.gz, <fastq.id>R3fastq.gz |
l | the library ids for each sample separated by "," (ie sample1,sample2,sample3). Must correspond to the samples in the samples list folder. |
s | samples list folder that contains the whitelist dT/N6/chromatin barcoded wells for each sample in Round 1 plate |
a | the path to the minimap2 software folder |
g | the path to the genome reference fasta file |
c | the path to the genome reference sequence sizes (ie. *fai file) |
n | the genome name used by snaptools (hg38, hg19, mm10, etc). |
p | path to preseq if sequencing saturation info needed (optional) |
Outputs: BED formated fragment files, snap files from snaptools, alignment BAM files
Output folders: by_samples_fastq, 01_snaretag, 02_alignment (BAM), 03_snap (count matrices)
Input | Explanation of values |
---|---|
d | the path to the raw fastq folder |
f | the fastq id matching the pattern <fastq.id>R1fastq.gz, <fastq.id>R2fastq.gz |
s | samples list folder that contains the whitelist dT/N6/chromatin barcoded wells for each sample in Round 1 plate |
p | the path to the dropest software folder |
a | the path to the STAR software folder |
i | the path to the STAR index files |
g | the genes.gtf - the genes annotation file |
c | the path to cutadapt software folder, (ie ~/.local/bin for python pip installation) |
Outputs: Matrix (.mtx), genes.tsv, barcodes.tsv, R data file with count matrix and dropest diagnostic statistics, alignment BAM files
Output folders: by_samples_fastq, 01_droptag, 02_alignment (BAM), 03_dropEst (count matrices)
Run create_seurat_kneeplot_2.r 03_dropEst
to generate Seurat rds file with dT and N6 barcodes merged.
Navigate to the "example_run" directory. Modify run_atac_rna.sh
shell script to have the correct paths for the variables softwares_dir
(shared location of deindexer, dropEst, STAR, minimap2, and snaptools) and ref_dir
(location of all reference folders - see below).
Go to https://support.10xgenomics.com/single-cell-atac/software/downloads/latest and follow instructions to download GRCh38 (or mm10) reference genome to the ref_dir
directory.
Go to https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest and follow instructions to download GRCh38 (or mm10) reference genome to the ref_dir
directory.
Go to https://drive.google.com/drive/folders/1wh6KTd7VqtudZlhEE56JUJ5kd_-_5Irw?usp=sharing to download example fastq files and copy to "example_run/raw_fastq"
To analyse the example fastq files, run: sh run_atac_rna.sh