A cluster-guided short read de novo transcriptome assembly pipeline.

karljohanw/clustrast

ClusTrAsT

A cluster-guided short read de novo transcriptome assembly pipeline, with focus on isoform recovery.

Installation

You can install ClusTrAsT in one of three ways:

  1. Add the cloned/downloaded folder to your PATH environment variable.
  2. Move the files clustrast, srClust and srClust2 to a folder that is already in your PATH.
  3. Run the scripts locally from the cloned/downloaded folder. If you do NOT add the scripts to your PATH, you MUST keep srClust and srClust2 in the same folder as clustrast.
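
As a minimal sketch of the first option, assuming the repository was cloned to the hypothetical location `$HOME/clustrast` (adjust this to wherever your copy lives), the folder can be put on PATH for the current session like this:

```shell
# Hypothetical clone location; change this to your own path.
CLUSTRAST_DIR="$HOME/clustrast"

# Prepend the folder to PATH for the current shell session.
export PATH="$CLUSTRAST_DIR:$PATH"

# Report which of the three scripts are now found via PATH.
for script in clustrast srClust srClust2; do
    if command -v "$script" >/dev/null 2>&1; then
        echo "found: $script"
    else
        echo "not found: $script"
    fi
done
```

To make the change permanent, the `export` line would typically go into your shell start-up file.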

You must also make sure that all of the following dependencies are available in your PATH:

Dependencies

Mandatory

  • shannon_cpp
  • minimap2
  • cut
  • awk
  • sed
  • python3
    • pysam

Mandatory for some input

  • isONclust (if no clusterfile is given)
  • Trans-ABySS (if no base-assembly is given)
  • gzip (if the input files are compressed)

Note: Installing isONclust will also take care of the python3 and pysam dependencies.
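
One way to verify the dependency list before a run is a small check like the sketch below. It is not part of ClusTrAsT; `missing_tools` is a hypothetical helper, and the executable names `isONclust` and `transabyss` are assumptions about how those tools are installed on your system.

```shell
# Print the names of the given commands that are not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Mandatory tools, named as in the list above.
missing_tools shannon_cpp minimap2 cut awk sed python3

# pysam is a Python module, so test it by importing it.
python3 -c 'import pysam' 2>/dev/null || echo "pysam (python3 module)"

# Conditionally required tools. The executable names isONclust and
# transabyss are assumptions; check your own installation.
missing_tools isONclust transabyss gzip
```

Every name printed is something that still needs to be installed or added to PATH; no output means everything was found.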

All options

usage: clustrast -1 FASTX_LEFT_PAIRED_END_READS -2 FASTX_RIGHT_PAIRED_END_READS -o OUTPUT_DIR [-p THREADS] [-u|--uniqify] [-g FASTX_GUIDING_CONTIGS] [-b FASTX_BASE_ASSEMBLY] [-t TEMPORARY_DIRECTORY] [-c CLUSTER_FILE] [--secondary-alignments N_SECONDARY_ALIGNMENTS] [--old-style-sr-clustering]
options:
  # Mandatory options:
  -1      Short paired-end read file, left end, in FASTx format, compressed or uncompressed
  -2      Short paired-end read file, right end, in FASTx format, compressed or uncompressed
  -o      Output directory

  # Non-mandatory options:
  -p      Maximum number of threads/processes to use when multitasking is possible (one by default)
  -t      Directory for temporary storage of files (the output directory is used by default)
  -b      Base assembly in FASTx format (by default, a new such is created with Trans-ABySS)
  -g      Guiding contigs (by default, the base assembly will be used for this)
  -c      Cluster file, in which each line is in the format "CLUSTER_ID GUIDING_CONTIG_NAME" (generated by isONclust by default)
  -u      Uniquify transcripts (off by default)

  # Other options:
  --secondary-alignments        The number of secondary alignments for minimap2 to output (100000 by default).
  --old-style-sr-clustering     Use the slower srClust (used in paper) instead of the faster srClust2 for the short read clustering.
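
If you supply your own cluster file with -c, a quick sanity check of the two-column format can catch problems early. The following is a sketch only: `check_cluster_file` is a hypothetical helper, the example IDs and contig names are made up, and it assumes the two fields are separated by whitespace.

```shell
# Succeed only if every line of the file has exactly two fields.
check_cluster_file() {
    awk 'NF != 2 { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1 }
         END { exit bad + 0 }' "$1"
}

# Build a tiny example file with made-up cluster IDs and contig names.
example=$(mktemp)
printf '0 contig_a\n0 contig_b\n1 contig_c\n' > "$example"

check_cluster_file "$example" && echo "cluster file looks well-formed"

# Count how many guiding contigs each cluster contains.
awk '{ n[$1]++ } END { for (id in n) print "cluster", id, "has", n[id], "contig(s)" }' "$example"

rm -f "$example"
```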

Note: ALL paths given as arguments must be absolute paths.
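
If your files are referred to by relative paths, they can be converted before being passed on. The sketch below uses a hypothetical helper `abspath` built from standard shell tools; it assumes the directory containing the file already exists, and the file names are placeholders.

```shell
# Turn a possibly relative path into an absolute one.
abspath() {
    # Resolve the directory part with cd/pwd, then re-attach the file name.
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    # Strip a trailing slash so that files directly under / get a single slash.
    printf '%s/%s\n' "${dir%/}" "$(basename "$1")"
}

# Example with placeholder file names in the current directory.
abspath sr_left.fq.gz
abspath sr_right.fq.gz

# The results could then be passed on, for example:
# clustrast -1 "$(abspath sr_left.fq.gz)" -2 "$(abspath sr_right.fq.gz)" -o "$(abspath output_dir)"
```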

Example usage

To run ClusTrAsT with 15 threads on a paired-end dataset and a set of CCS reads as guiding contigs:

$ clustrast -1 ~/sr_left.fq.gz -2 ~/sr_right.fq.gz -p 15 -o ~/output_dir -g ~/ccs.fq

Here, ClusTrAsT will use Trans-ABySS for the base assembly, isONclust for clustering the guiding contigs, minimap2 + srClust2 for clustering the short reads (SR) according to the guiding contig clusters, and shannon_cpp for cluster-wise assembly.

To run ClusTrAsT the same way but without guiding contigs:

$ clustrast -1 ~/sr_left.fq.gz -2 ~/sr_right.fq.gz -p 15 -o ~/output_dir

Since no guiding contigs are given, ClusTrAsT will use the base assembly as guiding contigs.

To use an assembly of your own as base assembly:

$ clustrast -1 ~/sr_left.fq.gz -2 ~/sr_right.fq.gz -p 15 -o ~/output_dir -b ~/earlier_assembly.fa

In this case, ClusTrAsT will not create a base assembly with Trans-ABySS. Instead, it will use earlier_assembly.fa as the base assembly and, since no guiding contigs are given, as guiding contigs too.

Citation

Karl Johan Westrin, Warren W. Kretzschmar & Olof Emanuelsson: ClusTrast: a short read de novo transcript isoform assembler guided by clustered contigs. BMC Bioinformatics, 25(1):54, Feb 2024.
