A cluster-guided short-read de novo transcriptome assembly pipeline, with a focus on isoform recovery.
You can install ClusTrAsT in three different ways:
- By adding the cloned/downloaded folder to your PATH environment variable
- By moving the files clustrast, srClust and srClust2 to a folder in your PATH variable
- By simply running the scripts locally in the cloned/downloaded folder
If you do NOT add any of the scripts to your PATH variable, you MUST have srClust and srClust2 in the same folder as clustrast.
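As a minimal sketch of the first option, the folder can be prepended to PATH for the current shell session. The clone location `$HOME/ClusTrAsT` is only a hypothetical example; replace it with wherever your copy actually lives.

```shell
# Hypothetical clone location; change this to your own folder.
CLUSTRAST_DIR="${CLUSTRAST_DIR:-$HOME/ClusTrAsT}"

# Prepend the folder to PATH unless it is already listed.
case ":$PATH:" in
    *":$CLUSTRAST_DIR:"*) ;;
    *) PATH="$CLUSTRAST_DIR:$PATH" ;;
esac
export PATH

# Report where the shell now finds the three scripts (if at all).
for script in clustrast srClust srClust2; do
    command -v "$script" || echo "$script not found on PATH" >&2
done
```

To keep the setting across sessions, the same lines can be placed in your shell start-up file (for example ~/.bashrc when using bash).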
You must also make sure that all of the following dependencies are available in your PATH variable:
- shannon_cpp
- minimap2
- cut
- awk
- sed
- python3
- pysam
- isONclust (if no clusterfile is given)
- Trans-ABySS (if no base-assembly is given)
- gzip (if the input files are compressed)
Note: Installing isONclust will also take care of the python3 and pysam dependencies.
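Before a long run, it may be worth checking that the required tools can actually be found. The sketch below is one possible way to do this; the helper name `missing_tools` is made up for illustration. It covers only the always-required entries from the list above. The conditional ones (isONclust, Trans-ABySS, gzip) can be appended when they apply, using whatever executable names your installations provide.

```shell
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Always-required command-line tools, as listed above.
missing_tools shannon_cpp minimap2 cut awk sed python3

# pysam is a Python module rather than an executable, so it is checked by import.
python3 -c 'import pysam' 2>/dev/null || echo "pysam (python module)"
```

If nothing is printed, all checked dependencies were found.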
usage: clustrast -1 FASTX_LEFT_PAIRED_END_READS -2 FASTX_RIGHT_PAIRED_END_READS -o OUTPUT_DIR [-p THREADS] [-u|--uniqify] [-g FASTX_GUIDING_CONTIGS] [-b FASTX_BASE_ASSEMBLY] [-t TEMPORARY_DIRECTORY] [-c CLUSTER_FILE] [--secondary-alignments N_SECONDARY_ALIGNMENTS] [--old-style-sr-clustering]
options:
# Mandatory options:
-1 Short paired-end read file, left end, in FASTx format, compressed or uncompressed
-2 Short paired-end read file, right end, in FASTx format, compressed or uncompressed
-o Output directory
# Non-mandatory options:
-p Maximum number of threads/processes to use when multitasking is possible (one by default)
-t Directory for temporary storage of files (the output directory is used by default)
-b Base assembly in FASTx format (by default, a new one is created with Trans-ABySS)
-g Guiding contigs in FASTx format (by default, the base assembly is used for this)
-c Cluster file, in which each line is in the format "CLUSTER_ID GUIDING_CONTIG_NAME" (generated by isONclust by default)
-u Uniquify transcripts (off by default)
# Other options:
--secondary-alignments The number of secondary alignments for minimap2 to output (100000 by default).
--old-style-sr-clustering Use the slower srClust (used in the paper) instead of the faster srClust2 for the short read clustering.
Note: ALL paths entered as arguments must be absolute paths.
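If your files are referred to by relative paths, they need to be converted first. The helper below is a small sketch of one way to do this in a POSIX shell; the function name `abs_path` is made up for illustration and is not part of ClusTrAsT. It only requires that the parent directory exists, so it also works for an output directory that has not been created yet.

```shell
# Print an absolute version of the given path.
# The parent directory must exist; the final component need not.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;   # already absolute, print unchanged
        *)  _abs_dir=$(CDPATH= cd -- "$(dirname "$1")" && pwd) || return 1
            printf '%s/%s\n' "$_abs_dir" "$(basename "$1")" ;;
    esac
}

# Example: resolve a file name relative to the current directory.
abs_path sr_left.fq.gz
```

With the helper defined, a call could then be written as:
$ clustrast -1 "$(abs_path sr_left.fq.gz)" -2 "$(abs_path sr_right.fq.gz)" -o "$(abs_path output_dir)"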
To run ClusTrAsT with 15 threads on a paired-end dataset and a set of CCS reads as guiding contigs:
$ clustrast -1 ~/sr_left.fq.gz -2 ~/sr_right.fq.gz -p 15 -o ~/output_dir -g ~/ccs.fq
Here, ClusTrAsT will use Trans-ABySS for the base assembly, isONclust for clustering the guiding contigs, minimap2+srClust for clustering the short reads according to the guiding contig clusters, and shannon_cpp for cluster-wise assembly.
To run ClusTrAsT the same way but without guiding contigs:
$ clustrast -1 ~/sr_left.fq.gz -2 ~/sr_right.fq.gz -p 15 -o ~/output_dir
Since no guiding contigs were given, ClusTrAsT will use the base assembly for this purpose in this case.
To use an assembly of your own as base assembly:
$ clustrast -1 ~/sr_left.fq.gz -2 ~/sr_right.fq.gz -p 15 -o ~/output_dir -b ~/earlier_assembly.fa
ClusTrAsT will not make a base assembly with Trans-ABySS in this case, and it will also use earlier_assembly.fa as guiding contigs.
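For the -c option, the options list describes each line of the cluster file as "CLUSTER_ID GUIDING_CONTIG_NAME". A quick sanity check of such a file could look like the sketch below. It assumes that the two fields are separated by whitespace (space or tab) and that contig names contain no whitespace. The exact separator is not stated here, and the file name, contig names and the helper `check_cluster_file` are made up for illustration.

```shell
# Write a tiny, made-up cluster file for demonstration purposes.
cluster_file="${TMPDIR:-/tmp}/example_clusters.txt"
printf '%s\n' '0 contig_A' '0 contig_B' '1 contig_C' > "$cluster_file"

# Succeed only if every line has exactly two whitespace-separated fields.
check_cluster_file() {
    awk 'NF != 2 { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1 }
         END { exit bad + 0 }' "$1"
}

# If the file is well formed, count the contigs assigned to each cluster ID.
check_cluster_file "$cluster_file" &&
    awk '{ n[$1]++ } END { for (id in n) print id, n[id] }' "$cluster_file" | sort
```

A file that passes the check could then be supplied together with the guiding contigs it refers to (hypothetical file names):
$ clustrast -1 ~/sr_left.fq.gz -2 ~/sr_right.fq.gz -p 15 -o ~/output_dir -g ~/ccs.fq -c ~/ccs_clusters.txt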
Karl Johan Westrin, Warren W. Kretzschmar & Olof Emanuelsson: ClusTrast: a short read de novo transcript isoform assembler guided by clustered contigs. BMC Bioinformatics, 25(1):54, Feb 2024.