The addgene-bioinformatics repository demonstrates a two-step workflow for assembling plasmid next-generation sequencing data using Toil.
Download the Docker Desktop package, then install as usual. The Docker daemon should be started by default.
Docker will need to mount the directory Toil creates for a job. When using Docker Desktop on macOS, this mount is accomplished in "Preferences > Resources > File Sharing" by adding "/var/folders" to the list.
Additionally, Docker must be allocated enough resources to run a job. Set these in "Preferences > Resources > Advanced": at least 2 CPUs and at least 8.0 GB of memory.
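To confirm that the running daemon meets the minimums above, the values Docker reports can be checked from Python. This is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes the `NCPU` and `MemTotal` (bytes) fields of `docker info --format '{{json .}}'`. The memory Docker reports may be slightly lower than the value set in the preferences, so the thresholds are parameters rather than fixed values.

```python
import json
import subprocess

GIB = 1024 ** 3  # bytes per GiB


def docker_info():
    """Return the parsed output of `docker info` (requires a running daemon)."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .}}"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def resource_shortfalls(info, min_cpus=2, min_mem_gib=8.0):
    """List the unmet minimums; an empty list means the allocation is enough."""
    problems = []
    cpus = info.get("NCPU", 0)
    mem_gib = info.get("MemTotal", 0) / GIB
    if cpus < min_cpus:
        problems.append(f"CPUs: {cpus} < {min_cpus}")
    if mem_gib < min_mem_gib:
        problems.append(f"memory: {mem_gib:.2f} GiB < {min_mem_gib:.2f} GiB")
    return problems
```

Calling `resource_shortfalls(docker_info())` with Docker running returns an empty list when both minimums are met.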
$ git clone git@github.com:addgene/addgene-bioinformatics.git
Create a virtual environment (using a Python 3 version), optionally update the Python dependencies (the pip-tools and pip-compile steps), and install the requirements:
$ mkvirtualenv addgene-bioinformatics
$ pip install pip-tools
$ pip-compile -U
$ pip install -r requirements.txt
$ cd containers
$ ./build.sh
$ cd src/python/jobs
$ python JobsTest.py
$ python src/python/jobs/SpadesJob.py sjfs
$ python src/python/jobs/ApcJob.py ajfs
$ python src/python/jobs/WellAssemblyJob.py wajfs
$ python src/python/jobs/PlateAssemblyJob.py pajfs
$ python src/python/jobs/WellAssemblyJob.py -s s3 -d addgene-sequencing-data/2018/FASTQ -l A11935X_sW0148 -w A01 wajfs
$ python src/python/jobs/PlateAssemblyJob.py -s s3 -d addgene-sequencing-data/2018/FASTQ -l A11935X_sW0148 pajfs
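The well and plate invocations above differ only in their options, so a small helper can assemble them when scripting several runs. This is a sketch only: `assembly_command` and its parameter names are hypothetical, and the meanings assigned to `-s`, `-d`, `-l`, and `-w` (source scheme, data path, plate, well) are an assumed reading of the commands above, not taken from the job scripts themselves.

```python
def assembly_command(script, job_store, source_scheme=None, data_path=None,
                     plate_spec=None, well_spec=None):
    """Build the argument list for one of the job scripts.

    Options left as None are omitted, so the script's defaults apply.
    """
    cmd = ["python", f"src/python/jobs/{script}"]
    # Flag letters are copied from the commands above; their meanings are assumed.
    for flag, value in (("-s", source_scheme), ("-d", data_path),
                        ("-l", plate_spec), ("-w", well_spec)):
        if value is not None:
            cmd += [flag, value]
    cmd.append(job_store)  # the job store name always comes last
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.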
The following assumes the instructions for preparing your AWS environment have been completed.
$ toil launch-cluster --zone us-east-1a --keyPairName id_rsa --leaderNodeType t2.medium assembly-cluster
$ src/sh/make-archives-for-leader.sh
$ toil rsync-cluster --zone us-east-1a assembly-cluster python.tar.gz :/root
$ toil rsync-cluster --zone us-east-1a assembly-cluster miscellaneous.tar.gz :/root
$ toil ssh-cluster --zone us-east-1a assembly-cluster
# cd
# tar -zxvf python.tar.gz
# tar -zxvf miscellaneous.tar.gz
Log in to the cluster leader, then run the default plate assembly job on the cluster leader only, using either a local or an S3 file store:
$ toil ssh-cluster --zone us-east-1a assembly-cluster
# cd
# python PlateAssemblyJob.py --data-path miscellaneous --plate-spec A11967B_sW0154 pajfs
# python PlateAssemblyJob.py --data-path miscellaneous --plate-spec A11967B_sW0154 aws:us-east-1:pajfs
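The two commands above differ only in their final argument: a bare name such as `pajfs` selects a store on the local file system, while `aws:us-east-1:pajfs` names an AWS region and a store in it. The sketch below splits the two forms used here; `parse_job_store` is a hypothetical helper for illustration, not Toil's own parser, and it deliberately ignores every other locator form Toil accepts.

```python
def parse_job_store(locator):
    """Split a store locator into (kind, region, name).

    Handles only the two forms used in this README:
    a bare name (local file store) and "aws:<region>:<name>".
    """
    parts = locator.split(":")
    if len(parts) == 1:
        return ("file", None, locator)
    if len(parts) == 3 and parts[0] == "aws":
        return ("aws", parts[1], parts[2])
    raise ValueError(f"unsupported locator in this sketch: {locator!r}")
```

For example, `parse_job_store("aws:us-east-1:pajfs")` yields the kind `aws`, the region `us-east-1`, and the name `pajfs`.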
Log in to the cluster leader, then run a well or sample plate assembly job on the cluster leader only, with data imported from S3:
$ toil ssh-cluster --zone us-east-1a assembly-cluster
# cd
# python WellAssemblyJob.py -s s3 -d addgene-sequencing-data/2018/FASTQ -l A11935X_sW0148 -w A01 wajfs
# python PlateAssemblyJob.py -s s3 -d addgene-sequencing-data/2018/FASTQ -l A11935X_sW0148 pajfs
Log in to the cluster leader, then run the default or a larger plate assembly job using auto-scaling with an S3 file store:
$ toil ssh-cluster --zone us-east-1a assembly-cluster
# cd
# python PlateAssemblyJob.py --data-path miscellaneous --plate-spec A11967B_sW0154 --provisioner aws --nodeTypes c3.large --maxNodes 2 --batchSystem mesos aws:us-east-1:pajfs
# python PlateAssemblyJob.py --data-path miscellaneous --plate-spec A11967A_sW0154 --provisioner aws --nodeTypes c3.large --maxNodes 2 --batchSystem mesos aws:us-east-1:pajfs
$ toil destroy-cluster --zone us-east-1a assembly-cluster
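Because a forgotten cluster keeps accruing AWS charges, it can help to script the lifecycle so that the destroy step always runs. The following is a sketch under stated assumptions: `toil_cluster_command` and `with_cluster` are hypothetical helpers, and they use only the `toil` subcommands and options that appear in the commands above. The interactive `ssh-cluster` sessions are not covered.

```python
import subprocess


def toil_cluster_command(subcommand, cluster, zone, options=(), arguments=()):
    """Build a `toil <subcommand>` argument list in the order used above."""
    return ["toil", subcommand, "--zone", zone, *options, cluster, *arguments]


def with_cluster(cluster, zone, launch_options, steps, run=None):
    """Launch the cluster, run each step, then destroy the cluster.

    The destroy command runs even if one of the steps fails.
    `run` is injectable so the sequence can be inspected without AWS access.
    """
    if run is None:
        def run(cmd):
            subprocess.run(cmd, check=True)
    run(toil_cluster_command("launch-cluster", cluster, zone, options=launch_options))
    try:
        for step in steps:
            run(step)
    finally:
        run(toil_cluster_command("destroy-cluster", cluster, zone))
```

Passing `run=print` lists the commands in order without executing them, which is a cheap way to review the sequence before launching anything.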