Protocol for transcriptome assembly by the TransBorrow algorithm

Review

Biol Methods Protoc. 2023 Nov 1;8(1):bpad028. doi: 10.1093/biomethods/bpad028. eCollection 2023.

Authors

Dengyi Zhao¹, Juntao Liu¹, Ting Yu²

Affiliations

¹ School of Mathematics and Statistics, Shandong University, Weihai 264209, China.
² Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China.

PMID: 38023349
PMCID: PMC10640700
DOI: 10.1093/biomethods/bpad028

Abstract

High-throughput RNA-seq enables comprehensive analysis of the transcriptome for a wide range of purposes. However, this technology typically generates massive numbers of relatively short sequencing reads. Consequently, fast, accurate, and flexible tools are needed to assemble raw RNA-seq data into full-length transcripts and to quantify their expression levels. In this protocol, we report TransBorrow, a novel transcriptome assembler designed specifically for short RNA-seq reads. TransBorrow is used in conjunction with a splice-aware aligner (e.g. HISAT2 or STAR) and other transcriptome assemblers (e.g. StringTie, Cufflinks, and Scallop). The protocol covers all necessary steps, from downloading and processing raw sequencing data to assembling full-length transcripts and quantifying their expression abundances. The execution time of the protocol varies with the size of the dataset and the computing platform.

Keywords: RNA-seq data; splice variants; transcriptome assembly.
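
The upstream steps described above (splice-aware alignment, then assembly by another tool) can be sketched as command lines. This is a minimal sketch under several assumptions: HISAT2 is the aligner, samtools sorts the alignments, and StringTie is the only additional assembler. The index name `grch38_index` and the mate file `SRR7807492_2.fastq` are hypothetical. The TransBorrow call itself is left out on purpose, because its options are documented only in its own help output (Figure 2), and strand-specific libraries would need extra strandedness options that are not shown here.

```python
from typing import List, Optional


def build_upstream_commands(sample: str, index: str, read1: str,
                            read2: Optional[str] = None,
                            threads: int = 8) -> List[List[str]]:
    """Return argument lists for alignment, sorting, and one reference assembly.

    Nothing is executed here; each list can be passed to
    subprocess.run(cmd, check=True). The TransBorrow step is omitted.
    """
    sam = f"{sample}_genome.sam"
    bam = f"{sample}.sorted.bam"

    # Splice-aware alignment; --dta tailors HISAT2 output for transcript assemblers.
    align = ["hisat2", "--dta", "-p", str(threads), "-x", index]
    if read2 is None:
        align += ["-U", read1]                 # single-end library
    else:
        align += ["-1", read1, "-2", read2]    # paired-end library
    align += ["-S", sam]

    # StringTie expects coordinate-sorted alignments, so sort the SAM into BAM.
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, sam]

    # One additional assembler; its GTF is intended as input for TransBorrow.
    stringtie = ["stringtie", bam, "-p", str(threads),
                 "-o", f"{sample}_StringTie.gtf"]

    return [align, sort, stringtie]


if __name__ == "__main__":
    # Hypothetical index and mate-file names, for illustration only.
    for cmd in build_upstream_commands("SRR7807492", "grch38_index",
                                       "SRR7807492_1.fastq",
                                       "SRR7807492_2.fastq"):
        print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps the sketch free of quoting problems and lets each step be inspected or logged before it is run.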


Figures

**Figure 1.**
Workflow of transcriptome assembly of RNA-seq experiments with TransBorrow.

**Figure 2.**
The help output of TransBorrow.

**Figure 3.**
A snapshot of 20 lines in the SRR7807492_1.fastq file, which stores the sequencing reads. Each group of four lines describes one sequencing read. The first line begins with a “@” character followed by a sequence identifier and an optional description. The second line contains the raw sequence letters. The third line begins with a “+” character, optionally followed by the same sequence identifier. The fourth line records the quality value of each base in the sequence.
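
The four-line record layout in Figure 3 maps directly onto a small reader. This is a minimal sketch, assuming each record occupies exactly four lines (no wrapped sequence lines) and that quality characters use Phred+33 encoding, as in current Illumina output; the encoding is not stated in the caption. The record in the demo is made up.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    identifier: str  # header text after "@": sequence ID plus optional description
    sequence: str    # raw sequence letters (line 2)
    quality: str     # quality string (line 4), one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record for each group of four lines."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file reached on a record boundary
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed or truncated FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode ASCII quality characters, assuming Phred+33 encoding by default."""
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    demo = io.StringIO("@read1 length=4\nACGT\n+\nII#I\n")
    for rec in read_fastq(demo):
        print(rec.identifier, rec.sequence, phred_scores(rec.quality))
        # prints: read1 length=4 ACGT [40, 40, 2, 40]
```

Reading line by line from a file handle keeps memory use constant, which matters for FASTQ files holding tens of millions of reads.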

**Figure 4.**
A snapshot of the first several lines in the SRR7807492_genome.sam file, which stores the alignments of RNA-seq reads to the reference genome. Lines starting with “@” are header lines. Each subsequent line records the alignment of one read (e.g. query name, flag, reference sequence name, start position, CIGAR string, mate reference sequence name, mate start position, insert size, and base sequence of the read).
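
A splice-aware aligner marks a spliced read with an `N` (skipped region) operation in its CIGAR string, so intron positions can be read directly from the alignment lines shown in Figure 4. The sketch below follows the column order of the SAM specification, parses only the first ten mandatory columns, and ignores optional tags. It is an illustration of the format, not a description of how TransBorrow reads alignments; for large files a dedicated library such as pysam would be the usual choice.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


class SamRecord(NamedTuple):
    qname: str   # query (read) name
    flag: int    # bitwise FLAG
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # mate reference sequence name
    pnext: int   # mate start position
    tlen: int    # observed template (insert) length
    seq: str     # read bases

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)  # bit 0x10: SEQ is reverse-complemented

    def introns(self) -> List[Tuple[int, int]]:
        """1-based inclusive reference intervals skipped by 'N' operations."""
        if self.cigar == "*":
            return []
        pos, spans = self.pos, []
        for length, op in CIGAR_OP.findall(self.cigar):
            n = int(length)
            if op == "N":
                spans.append((pos, pos + n - 1))
            if op in REF_CONSUMING:
                pos += n
        return spans


def parse_sam(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Skip '@' header lines and parse the first ten mandatory columns."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        f = line.rstrip("\r\n").split("\t")
        if len(f) < 11:
            raise ValueError("SAM alignment line has fewer than 11 columns")
        yield SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                        f[5], f[6], int(f[7]), int(f[8]), f[9])
```

For example, a read aligned at position 100 with CIGAR `10M50N10M` covers bases 100 to 109 and 160 to 169, and its `introns()` call returns `[(110, 159)]`.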

**Figure 5.**
A snapshot of several lines in the SRR7807492_TransBorrow.gtf file, which stores the assembled transcripts. Each line gives the chromosome name, the source of the annotation (TransBorrow), the feature type (“transcript” or “exon”), the start and end positions, the score or confidence level (usually 1000), the strand (“+” for the positive strand, “-” for the negative strand, and “.” when no strand information is available), and attributes including gene_id (the gene index), transcript_id (the transcript index), exon_number (the exon index), cov (estimated coverage), and TPM (transcripts per million).
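
The per-transcript abundances in Figure 5 live in the ninth (attribute) column of the “transcript” lines. This minimal sketch collects coverage and TPM for each transcript_id. It assumes the attribute keys are spelled `cov` and `TPM` as listed in the caption and that attribute values contain no semicolons; the GTF lines in the usage check are made up.

```python
from typing import Dict, Iterable


def parse_attributes(text: str) -> Dict[str, str]:
    """Split a GTF attribute column such as 'gene_id "g1"; cov "2.5";' into a dict."""
    attrs = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')  # accept quoted or unquoted values
    return attrs


def transcript_abundance(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Map transcript_id -> {'cov': ..., 'TPM': ...} using 'transcript' lines only."""
    result = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            raise ValueError("GTF line does not have 9 tab-separated columns")
        if cols[2] != "transcript":
            continue  # exon lines carry no transcript-level abundance
        attrs = parse_attributes(cols[8])
        result[attrs["transcript_id"]] = {
            key: float(attrs[key]) for key in ("cov", "TPM") if key in attrs
        }
    return result
```

Keying on transcript_id rather than line order means exon lines can appear anywhere in the file without affecting the result.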

**Figure 6.**
A snapshot of the SRR7807492_TransBorrow.stats file, which stores the evaluation results of the transcripts assembled by TransBorrow for the dataset SRR7807492. According to the output, TransBorrow produced 44,300 candidate transcripts, of which 18,370 correctly matched known annotated transcripts, giving a sensitivity (recall) of 20.4% at the locus level and a precision of 41.5% at the transcript level.
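
Taking precision as the fraction of candidate transcripts that match an annotated transcript, the reported value can be checked from the two counts in Figure 6:

$$
\text{precision} = \frac{\text{matched transcripts}}{\text{candidate transcripts}} = \frac{18{,}370}{44{,}300} \approx 0.415 = 41.5\%
$$

The 20.4% recall cannot be reproduced in the same way here, because the number of annotated reference transcripts is not reported in this section.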

**Figure 7.**
Performance comparisons of the assemblers on the SRR7807492 dataset (paired-end and nonstranded). (A) The number of correctly assembled transcripts by the assemblers. (B) Assembly accuracy of the assemblers in terms of precision and recall. (C) F-scores of the assemblers.

**Figure 8.**
Performance comparisons of the assemblers on the ERR3639851 dataset (single-end and nonstranded). (A) The number of correctly assembled transcripts by the assemblers. (B) Assembly accuracy of the assemblers in terms of precision and recall. (C) F-scores of the assemblers.

**Figure 9.**
Performance comparisons of the assemblers on the SRR10611964 dataset (paired-end and strand-specific). (A) The number of correctly assembled transcripts by the assemblers. (B) Assembly accuracy of the assemblers in terms of precision and recall. (C) F-scores of the assemblers.
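
Figures 7 to 9 report precision, recall, and F-scores, but the F-score is not defined in this section. Assuming it is the usual balanced F-score (F1), the harmonic mean of precision and recall, all three metrics can be computed from match counts as sketched below. The counts in the demo are arbitrary and are not results from any of the datasets.

```python
from typing import Tuple


def precision_recall_f1(matched: int, predicted: int,
                        reference: int) -> Tuple[float, float, float]:
    """Precision = matched/predicted, recall = matched/reference, F1 = harmonic mean."""
    precision = matched / predicted if predicted else 0.0
    recall = matched / reference if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Arbitrary counts: 50 matches among 100 predictions, 200 reference transcripts.
    p, r, f = precision_recall_f1(50, 100, 200)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
    # prints: precision=0.500 recall=0.250 F1=0.333
```

The harmonic mean penalizes imbalance: an assembler with high precision but low recall (or the reverse) still receives a low F1, which is why it is a convenient single number for ranking assemblers.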


