Pipeline for processing CEL-seq and minibulk data from raw reads to count matrix.

CELSeq Pipeline

A Snakemake pipeline for processing data generated using the CEL-Seq protocol. It takes BCL or FASTQ input files and generates a single-cell experiment object using scPipe. It also performs QC using FastQ Screen and FastQC, collated in a MultiQC report. In principle, the pipeline can be used for a range of single-cell protocols.

Installation

The only prerequisite is snakemake. To install snakemake, you will need to install a Conda-based Python3 distribution. For this, Mambaforge is recommended. Once mamba is installed, snakemake can be installed like so:

mamba create -c conda-forge -c bioconda -n snakemake snakemake

Now activate the snakemake environment (you'll have to do this every time you want to run the pipeline):

conda activate snakemake

Now clone the repository:

git clone https://github.com/WEHISCORE/CELSeq-pipeline.git
cd CELSeq-pipeline

Testing

If you would like to test the pipeline, first download and prepare the test data:

(cd .test && ./download_test_data.sh)
mkdir -p fastq && ln -s $PWD/.test/*fastq.gz fastq

You'll now have to generate a STAR index for the test genome:

mamba install -c conda-forge -c bioconda star=2.7.8a
STAR --runMode genomeGenerate --genomeDir .test/ERCC92-STAR-index --genomeFastaFiles .test/ERCC92.fa

And a Bowtie 2 index for FastQ Screen:

mamba install -c conda-forge -c bioconda bowtie2=2.4.2
bowtie2-build .test/ERCC92.fa .test/ERCC-bowtie-index/ERCC
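
Before pointing the configs at these indexes, it may be worth confirming that both builds actually wrote their files. The following is a minimal sketch, assuming a POSIX shell. check_indexes is a hypothetical helper, not part of the pipeline, and it only looks for the core files that STAR and bowtie2-build normally write for a small genome:

```shell
# check_indexes STAR_DIR BT2_PREFIX
# Hypothetical helper (not part of the pipeline): report missing or empty index files.
check_indexes() {
    star_dir="$1"
    bt2_prefix="$2"
    rc=0
    # Core files written by STAR --runMode genomeGenerate
    for f in Genome SA SAindex genomeParameters.txt; do
        [ -s "$star_dir/$f" ] || { echo "missing STAR file: $star_dir/$f"; rc=1; }
    done
    # Files written by bowtie2-build for a small (non-"large") index
    for ext in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        [ -s "$bt2_prefix.$ext" ] || { echo "missing Bowtie 2 file: $bt2_prefix.$ext"; rc=1; }
    done
    return "$rc"
}
```

Call it with the --genomeDir you passed to STAR and the output prefix you passed to bowtie2-build, for example check_indexes path/to/ERCC92-STAR-index .test/ERCC-bowtie-index/ERCC. It returns non-zero and lists each file it could not find.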

Make sure your config/config.yaml and config/fastq_screen.conf reflect these paths (you can comment out other indexes in the FastQ Screen config). Now run as follows:

snakemake --use-conda --conda-frontend mamba --cores 1

Configuration

The configuration file is found under config/config.yaml and the config file for FastQ Screen is found under config/fastq_screen.conf. Please go through these settings carefully. The main settings to consider are:

  • process_from_bcl -- set this to True only if converting from BCL files. If so, make sure the demultiplexing argument bcl2fastq is set properly (under params).
  • sample_sheet -- this is the sample sheet for bcl2fastq conversion. Please check the bcl2fastq documentation for more info. You can skip this if you're using fastq files.
  • barcode_file -- a comma-separated file with an ID column (matching your well/cell IDs) and the corresponding barcode, in the following format:
ID,Cell_Barcode
S1,ATATATAT
S2,GCGCGCGC
  • gtf and star_index under ref -- make sure the chromosome names match for these and that you've generated an index for STAR-2.7.8, as this is the version used by the pipeline.
  • read_structure -- ensure barcode_in_r1 is set to TRUE if your barcodes are in R1 (which is standard for CEL-Seq). WEHI's modified CEL-Seq protocol uses a barcode size of 7 (barcode_len_2 default), so set this to 8 if using a standard version of the protocol.
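
A malformed barcode file is easy to miss until late in a run, so a quick check before starting may save time. The sketch below assumes a POSIX shell with awk. check_barcodes is a hypothetical helper, not part of the pipeline, and the rules it enforces (exact header, A/C/G/T only, one fixed length, no duplicate IDs or barcodes) are assumptions based on the format shown above:

```shell
# check_barcodes FILE LEN
# Hypothetical helper (not part of the pipeline): sanity-check a barcode CSV.
check_barcodes() {
    awk -F',' -v len="$2" '
        { sub(/\r$/, "") }                       # tolerate Windows line endings
        NR == 1 {
            if ($0 != "ID,Cell_Barcode") { print "unexpected header: " $0; bad = 1 }
            next
        }
        NF != 2           { print "line " NR ": expected 2 columns"; bad = 1; next }
        $2 !~ /^[ACGT]+$/ { print "line " NR ": non-ACGT barcode " $2; bad = 1 }
        length($2) != len { print "line " NR ": barcode length " length($2) ", expected " len; bad = 1 }
        seen_id[$1]++     { print "line " NR ": duplicate ID " $1; bad = 1 }
        seen_bc[$2]++     { print "line " NR ": duplicate barcode " $2; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Pass the path to your barcode file and the barcode length you have configured under read_structure, for example check_barcodes path/to/barcodes.csv 8. The function prints one line per problem and returns non-zero if it finds any.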

If you are running from BCL, put your BCL files under the bcl_input directory. If you are running from FASTQ files, put them all under a fastq directory in the location you run the pipeline from, and make sure they are named fastq/{sample}_R1.fastq.gz and fastq/{sample}_R2.fastq.gz.
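
When running from FASTQ files, a sample with only one of its two reads present is a likely source of confusing errors. The sketch below assumes a POSIX shell. list_unpaired is a hypothetical helper, not part of the pipeline, and it only checks the {sample}_R1.fastq.gz / {sample}_R2.fastq.gz naming described above:

```shell
# list_unpaired [DIR]
# Hypothetical helper (not part of the pipeline): report samples missing a mate file.
# DIR defaults to "fastq".
list_unpaired() {
    dir="${1:-fastq}"
    rc=0
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                 # glob matched nothing (or broken link)
        sample="${r1%_R1.fastq.gz}"
        if [ ! -e "${sample}_R2.fastq.gz" ]; then
            echo "missing R2 for $(basename "$sample")"
            rc=1
        fi
    done
    for r2 in "$dir"/*_R2.fastq.gz; do
        [ -e "$r2" ] || continue
        sample="${r2%_R2.fastq.gz}"
        if [ ! -e "${sample}_R1.fastq.gz" ]; then
            echo "missing R1 for $(basename "$sample")"
            rc=1
        fi
    done
    return "$rc"
}
```

Run list_unpaired from the directory where you launch the pipeline. It prints nothing and returns zero when every sample has both reads.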

Running

Run the pipeline as follows:

conda activate snakemake
snakemake --use-conda --conda-frontend mamba --cores 1

If you want to submit your jobs to the cluster using SLURM, use the following to run the pipeline:

conda activate snakemake
snakemake --use-conda --conda-frontend mamba --profile slurm --jobs 8 --cores 24

The pipeline will generate all results under a results directory. The final output will be under results/sc_demultiplex/{sample}/sce.rds.
