A cluster-guided short-read de novo transcriptome assembly pipeline, with a focus on isoform recovery.
You can install ClusTrAsT in one of three ways:
- By adding the cloned/downloaded folder to your PATH environment variable
- By moving the files to a folder in your PATH
- By simply running the scripts locally in the cloned/downloaded folder
If you do NOT add any of the scripts to your PATH, you MUST have srClust in the same folder as clustrast.
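As a minimal sketch of the first option, assuming the repository was cloned to a hypothetical $HOME/ClusTrAsT folder, the following bash lines add that folder to PATH for the current shell session:

```bash
# Hypothetical install location; replace with wherever you cloned ClusTrAsT.
CLUSTRAST_DIR="$HOME/ClusTrAsT"

# Prepend the folder to PATH for the current shell session only.
# Add the same export line to ~/.bashrc (or your shell's startup file)
# to make the change permanent.
export PATH="$CLUSTRAST_DIR:$PATH"
```

Afterwards, running command -v clustrast should print the location of the script.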
You must also make sure that all of the following dependencies are available in your PATH:
- shannon_cpp
- minimap2
- cut
- awk
- sed
- python3
- pysam
- isONclust (if no clusterfile is given)
- Trans-ABySS (if no base-assembly is given)
- gzip (if the input files are compressed)
Note: Installing isONclust will also take care of the python3 and pysam dependencies.
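A quick way to see which of the always-required dependencies are missing is a small bash function like the sketch below. It is not part of ClusTrAsT, and the name missing_deps is hypothetical. Because pysam is a Python module rather than a program, it is checked by importing it with python3. The optional tools (isONclust, Trans-ABySS, gzip) are left out, since they are only needed in some modes.

```bash
# Hypothetical helper (not shipped with ClusTrAsT): print the name of every
# always-required dependency that cannot be found, one per line.
missing_deps() {
  local tool
  for tool in shannon_cpp minimap2 cut awk sed python3; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
  # pysam is a Python module, so check that python3 can import it.
  python3 -c 'import pysam' >/dev/null 2>&1 || echo pysam
}

missing_deps
```

Empty output means every always-required dependency was found.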
# Mandatory options:
-1 Short paired-end read file, left end, in FASTx format (compressed or uncompressed)
-2 Short paired-end read file, right end, in FASTx format (compressed or uncompressed)
-o Output directory
# Non-mandatory options:
-p Maximum number of threads/processes to use when multitasking is possible (one by default)
-t Directory for temporary storage of files (the output directory is used by default)
-b Base assembly in FASTx format (by default, a new one is created with Trans-ABySS)
-g Guiding contigs (by default, the base assembly is used for this)
-c Cluster file, in which each line is in the format "CLUSTER_ID GUIDING_CONTIG_NAME" (generated by isONclust by default)
-u Uniqueify transcripts (off by default)
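For illustration, a cluster file for -c could look like the made-up example below; the cluster IDs, contig names, and the file name clusters.txt are hypothetical. The hypothetical helper cluster_sizes counts how many guiding contigs each cluster holds, which is a cheap sanity check before passing your own file to ClusTrAsT.

```bash
# A small, made-up cluster file: one "CLUSTER_ID GUIDING_CONTIG_NAME" per line.
cat > clusters.txt <<'EOF'
0 contig_17
0 contig_42
1 contig_3
2 contig_8
2 contig_9
2 contig_11
EOF

# Hypothetical helper: print "CLUSTER_ID NUMBER_OF_CONTIGS" for each cluster,
# sorted by cluster ID. awk splits fields on any whitespace, so spaces and
# tabs both work as separators.
cluster_sizes() {
  awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' "$1" | sort -n
}

cluster_sizes clusters.txt
# 0 2
# 1 1
# 2 3
```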
# Other options:
--secondary-alignments The number of secondary alignments for minimap2 to output (100000 by default).
--old-style-sr-clustering Use the slower srClust (used in paper) instead of the faster srClust2 for the short read clustering.
Note: ALL paths must be absolute paths when entered as arguments.
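If your files are referenced by relative paths, one option is to expand them before calling clustrast. The sketch below uses a hypothetical pure-bash helper, abspath, which prefixes relative paths with the current directory and leaves absolute ones untouched. It does not resolve .. or symbolic links, which is harmless here because the result is still an absolute path.

```bash
# Hypothetical helper: turn a relative path into an absolute one.
# The file does not need to exist, and no external tools are required.
abspath() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;            # already absolute: keep as-is
    *)  printf '%s/%s\n' "$PWD" "$1" ;;  # relative: prefix the working directory
  esac
}

abspath reads/sr_left.fq.gz   # prints $PWD/reads/sr_left.fq.gz
abspath /data/sr_right.fq.gz  # prints /data/sr_right.fq.gz
```

A call with relative inputs could then look like:
$ clustrast -1 "$(abspath sr_left.fq.gz)" -2 "$(abspath sr_right.fq.gz)" -p 15 -o "$(abspath output_dir)"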
To run ClusTrAsT with 15 threads on a paired-end dataset, using a set of CCS reads as guiding contigs:
$ clustrast -1 ~/sr_left.fq.gz -2 ~/sr_right.fq.gz -p 15 -o ~/output_dir -g ~/ccs.fq
Here, ClusTrAsT will use Trans-ABySS for the base assembly, isONclust to cluster the guiding contigs, minimap2 and srClust2 (or srClust with --old-style-sr-clustering) to cluster the short reads according to the guiding-contig clusters, and shannon_cpp for cluster-wise assembly.
To run ClusTrAsT the same way but without guiding contigs:
$ clustrast -1 ~/sr_left.fq.gz -2 ~/sr_right.fq.gz -p 15 -o ~/output_dir
Since no guiding contigs were given, ClusTrAsT will use the base assembly for this purpose in this case.
To use an assembly of your own as base assembly:
$ clustrast -1 ~/sr_left.fq.gz -2 ~/sr_right.fq.gz -p 15 -o ~/output_dir -b ~/earlier_assembly.fa
In this case, ClusTrAsT will not create a base assembly with Trans-ABySS. It will use earlier_assembly.fa instead and, since no guiding contigs were given, also use it as guiding contigs.
Karl Johan Westrin, Warren W. Kretzschmar & Olof Emanuelsson: ClusTrast: a short read de novo transcript isoform assembler guided by clustered contigs. BMC Bioinformatics, 25(1):54, Feb 2024.