CELSeq Pipeline

A snakemake pipeline for processing data generated using the CEL-Seq protocol. It takes BCL or fastq input files and generates a single-cell experiment object using scPipe. It also performs QC using FastQ Screen and FastQC, collated in a MultiQC report. In principle, the pipeline can be used for a range of single-cell protocols.

Installation

The only prerequisite is snakemake. To install snakemake, you will need to install a Conda-based Python3 distribution. For this, Mambaforge is recommended. Once mamba is installed, snakemake can be installed like so:

mamba create -c conda-forge -c bioconda -n snakemake snakemake

Now activate the snakemake environment (you'll have to do this every time you want to run the pipeline):

conda activate snakemake

Now clone the repository:

git clone https://github.com/WEHISCORE/CELSeq-pipeline.git
cd CELSeq-pipeline

Testing

If you would like to test the pipeline, first download and prepare the test data:

(cd .test && ./download_test_data.sh)
mkdir -p fastq && ln -s $PWD/.test/*fastq.gz fastq

You'll now have to generate a STAR index for the test genome:

mamba install -c conda-forge -c bioconda star=2.7.8a
STAR --runMode genomeGenerate --genomeDir .test/ERCC92-STAR-index --genomeFastaFiles .test/ERCC92.fa

And a Bowtie index for FastQ Screen:

mamba install -c conda-forge -c bioconda bowtie2=2.4.2
bowtie2-build .test/ERCC92.fa .test/ERCC-bowtie-index/ERCC

Make sure your config/config.yaml and config/fastq_screen.conf reflect these paths (you can comment out other indexes in the FastQ Screen config). Now run as follows:

snakemake --use-conda --conda-frontend mamba --cores 1
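
If the FastQ Screen step cannot find its index, one thing worth checking is whether the DATABASE entries in config/fastq_screen.conf point at an existing Bowtie2 index. The sketch below is a hypothetical helper, not part of the pipeline. It assumes whitespace-separated DATABASE lines (name, then index basename), paths without spaces, and that a Bowtie2 index can be recognised by its .1.bt2 or .1.bt2l file.

```python
from pathlib import Path


def missing_screen_indexes(conf_text):
    """Return (name, basename) pairs whose Bowtie2 index files are absent."""
    missing = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Skip comments, blank lines and settings other than DATABASE.
        if len(fields) < 3 or fields[0] != "DATABASE":
            continue
        name, base = fields[1], fields[2]
        # Assumption: the first index file is enough to show the index exists.
        if not any(Path(base + ext).exists() for ext in (".1.bt2", ".1.bt2l")):
            missing.append((name, base))
    return missing


if __name__ == "__main__":
    conf = Path("config/fastq_screen.conf")
    if conf.is_file():
        for name, base in missing_screen_indexes(conf.read_text()):
            print(f"No Bowtie2 index found for {name} at {base}")
```

Run from the repository root, this prints one line per database whose index basename does not resolve, and nothing if all of them do.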

Configuration

The pipeline configuration file is config/config.yaml, and the config file for FastQ Screen is config/fastq_screen.conf. Please go through these settings carefully. The main settings to consider are:

process_from_bcl -- set this to True only if converting from BCL files. If so, make sure the demultiplexing argument bcl2fastq is set properly (under params).
sample_sheet -- this is the sample sheet for bcl2fastq conversion. Please check the bcl2fastq documentation for more info. You can skip this if you're using fastq files.
barcode_file -- a comma-separated file with an ID column (matching your well/cell IDs) and the corresponding barcode, in the following format:

ID,Cell_Barcode
S1,ATATATAT
S2,GCGCGCGC

gtf and star_index under ref -- make sure the chromosome names match for these and that you've generated an index for STAR-2.7.8, as this is the version used by the pipeline.
read_structure -- ensure barcode_in_r1 is set to TRUE if your barcodes are in R1 (which is standard for CEL-Seq). WEHI's modified CEL-Seq protocol uses a barcode size of 7 (barcode_len_2 default), so set this to 8 if using a standard version of the protocol.
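
A malformed barcode file is easy to miss, so it can be worth checking it before a run. The following is a minimal sketch of such a check: check_barcodes is a hypothetical helper (not part of the pipeline or scPipe), and it assumes barcodes contain only A, C, G and T and should all have the same length.

```python
import csv
import io


def check_barcodes(handle, expected_len=None):
    """Return a list of problems found in a barcode CSV (empty if none)."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != ["ID", "Cell_Barcode"]:
        return [f"unexpected header: {reader.fieldnames}"]
    problems, seen_ids, seen_barcodes = [], set(), set()
    for row in reader:
        line_no = reader.line_num
        cell_id = (row["ID"] or "").strip()
        barcode = (row["Cell_Barcode"] or "").strip()
        if not cell_id or cell_id in seen_ids:
            problems.append(f"line {line_no}: missing or duplicate ID {cell_id!r}")
        if barcode in seen_barcodes:
            problems.append(f"line {line_no}: duplicate barcode {barcode!r}")
        if not barcode or set(barcode) - set("ACGT"):
            problems.append(f"line {line_no}: barcode {barcode!r} is not plain ACGT")
        if expected_len is None:
            expected_len = len(barcode)  # fall back to the first barcode's length
        elif len(barcode) != expected_len:
            problems.append(
                f"line {line_no}: barcode length {len(barcode)} != {expected_len}"
            )
        seen_ids.add(cell_id)
        seen_barcodes.add(barcode)
    return problems


# The example file shown above has 8-base barcodes and passes cleanly.
example = io.StringIO("ID,Cell_Barcode\nS1,ATATATAT\nS2,GCGCGCGC\n")
print(check_barcodes(example, expected_len=8))  # prints []
```

To check a real file, open it and pass the handle along with the barcode length you set under read_structure, for example check_barcodes(open(path, newline=""), expected_len=7), where path is whatever you set barcode_file to.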

If you are running from BCL, put your BCL files under the bcl_input directory. If you are running from fastq files, put them all under a fastq directory in the location where you run the pipeline, and make sure they are named fastq/{sample}_R1.fastq.gz and fastq/{sample}_R2.fastq.gz.
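
When starting from fastq files, a quick way to confirm the naming is to list which samples have both mates present. This is a sketch only: find_samples is a hypothetical helper (not part of the pipeline) that relies solely on the _R1.fastq.gz / _R2.fastq.gz naming described above.

```python
from pathlib import Path

SUFFIXES = ("_R1.fastq.gz", "_R2.fastq.gz")


def find_samples(fastq_dir="fastq"):
    """Map each sample name to True if both its R1 and R2 files exist."""
    fastq_dir = Path(fastq_dir)
    samples = set()
    for path in fastq_dir.glob("*.fastq.gz"):
        for suffix in SUFFIXES:
            if path.name.endswith(suffix):
                samples.add(path.name[: -len(suffix)])
    # exists() follows symlinks, so a dangling link counts as missing.
    return {
        sample: all((fastq_dir / f"{sample}{s}").exists() for s in SUFFIXES)
        for sample in sorted(samples)
    }


if __name__ == "__main__":
    for sample, paired in find_samples().items():
        print(sample, "ok" if paired else "missing mate")
```

Files in the fastq directory that match neither suffix are ignored by this check, so a sample missing from the output is a hint that its files are named differently.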

Running

Run the pipeline as follows:

conda activate snakemake
snakemake --use-conda --conda-frontend mamba --cores 1

If you want to submit your jobs to the cluster using SLURM, use the following to run the pipeline:

conda activate snakemake
snakemake --use-conda --conda-frontend mamba --profile slurm --jobs 8 --cores 24

The pipeline will generate all results under a results directory. The final output will be under results/sc_demultiplex/{sample}/sce.rds.
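
After a run, you may want to confirm that every sample produced its final object. The sketch below assumes only the results/sc_demultiplex/{sample}/sce.rds layout described above; missing_outputs and the sample names are hypothetical.

```python
from pathlib import Path


def missing_outputs(samples, results_dir="results"):
    """Return the samples whose sce.rds file has not been written."""
    base = Path(results_dir) / "sc_demultiplex"
    return [s for s in samples if not (base / s / "sce.rds").is_file()]


if __name__ == "__main__":
    # Hypothetical sample names; replace with your own.
    for sample in missing_outputs(["sampleA", "sampleB"]):
        print(f"No sce.rds found for {sample}")
```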
