GitHub - ChuChuChaddy/ncgas_denovo_transcriptome_pipeline: Development of a semi-automated transcriptome assembly pipeline for NCGAS

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
SOAP		SOAP
TransAbyss		TransAbyss
Trinity		Trinity
Velvet		Velvet
final_assemblies		final_assemblies
input_files		input_files
LICENSE		LICENSE
README		README

Repository files navigation

Welcome to NCGAS's de Novo Transcriptome Workflow v4.0!
-note: this version has added annotation with Trinotate. This is currently incompatible with previous versions of complete assemblies - but stay tuned, a fix for this is in the works!

Getting Started:
Note: Any step that is followed by a "b", etc. (RunVelvet2b.sh or Step 2b) can be run at the same time as the matching number (RunVelvet.sh or Step2).

Step 0:
From the root directory (i.e. Project_Carbonate or ncgas_denovo_transcriptome_pipeline):
You need to put your email as the point of contact for all the scripts. This can be done en masse with the following command (replace youremail@wherever.com with your actual email) from the Project directory:
for f in */Run*; do sed -i 's/YOUREMAILHERE/youremail@wherever.com/g' $f; done

You will also need to map your project directory to the run files. To do this, run the following from the Project directory:
for f in */*; do p=`pwd`; sed -i "s|PWDHERE|$p|g" $f ; done

Step 1:
Put all your reads into input_files
Read the README in input_files to get instructions for combining reads properly into input files.
You can do this with symlink (use command "man ln" if you are unfamiliar with this command).

Then run the normalization command - this will normalize your data and make it take less time/resources without loss of information.
Command: qsub RunTrinity.normalize.sh

Step 2: SOAP
Run RunSOAP1.sh and RunSOAP1b.sh at the same time.
Command: qsub RunSOAP1*
When they finish, run ./Combine.sh
Command: ./Combine.sh

Step 2b: Velvet
Run RunVelvet1.sh and RunVelvet1b.sh at the same time.
Command: qsub RunVelvet1*
When BOTH above are complete, run RunVelvet2.sh and RunVelvet2b.sh at the same time.
Command: qsub RunVelvet2*
When BOTH above are complete, run RunVelvet3.sh and RunVelvet3b.sh at the same time. When they finish, run ./Combine.sh (no need to submit to queue).
Command: qsub RunVelvet3*
When they finish, run ./Combine.sh
Command: ./Combine.sh

Step 2c: TransAbyss
Run RunTransAb1.sh and RunTransAb1b.sh at the same time.
Command: qsub RunTransAbyss1*
When they finish, run ./Combine.sh
Command: ./Combine.sh

Step 2d: Trinity
Run RunTrinity.sh, there is no combine script for this assembler.
Command: qsub RunTrinity.sh

Step 3: Combine all outputs
The outputs for each combined set will be placed automatically in final_assembly.
Run ./Combine.sh FIRST to get one input for Evigenes
Run RunEviGene.sh
Command: ./Combine.sh; qsub RunEviGene.sh

OUTPUT:
In final_assemblies, you will see the following directories:
okayset - where the good files are
dropset - where dropped files are

within okayset, you will set two sets of files:
okay.fa/aa/cds - these are the highest quality transcripts
anything labeled "complete" is a full cds
okalt.fa/aa/cds - these are the alternative versions of the transcripts in the okay file (alleles, isoforms, etc).
anything labeled "complete" is a full cds.

SEE http://arthropods.eugenes.org/EvidentialGene/trassembly.html for documentaiton!

Step 4: Prep for annotation
Once you have completed Step 3, you will see files in the final_Assemblies/annotation folder. See the README in that folder for instructions on how to continue preparing your files for annotation through Trinotate!