Optimal spliced alignments of short sequence reads

Bioinformatics. 2008 Aug 15;24(16):i174-80.

Authors

Fabio De Bona¹, Stephan Ossowski, Korbinian Schneeberger, Gunnar Rätsch

Affiliation

¹ Friedrich Miescher Laboratory, Max Planck Society, Spemannstr 39, 72076 Tübingen, Germany.

PMID: 18689821
DOI: 10.1093/bioinformatics/btn300

Free article

Abstract

Motivation: Next-generation sequencing technologies open exciting new possibilities for genome and transcriptome sequencing. While reads produced by these technologies are relatively short and error-prone compared to the Sanger method, their throughput is several orders of magnitude higher. To utilize such reads for transcriptome sequencing and gene structure identification, one needs to be able to accurately align the sequence reads over intron boundaries. This represents a significant challenge given their short length and inherently high error rate.

Results: We present a novel approach, called QPALMA, for computing accurate spliced alignments, which takes advantage of the reads' quality information as well as computational splice site predictions. Our method uses a training set of spliced reads with quality information and known alignments. Its parameters are estimated with a large-margin approach, similar to support vector machines, so as to maximize alignment accuracy. In computational experiments, we illustrate that both the quality information and the splice site predictions help to improve the alignment quality. Finally, to facilitate mapping of the massive amounts of sequencing data typically generated by the new technologies, we have combined our method with a fast mapping pipeline based on enhanced suffix arrays. Our algorithms were optimized and tested using reads produced with the Illumina Genome Analyzer for the model plant Arabidopsis thaliana.
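
To make the idea of combining base quality with splice site predictions concrete, the following is a minimal Python sketch, not the QPALMA implementation. It assumes a single intron, no insertions or deletions, a fixed start position, and a hand-picked linear quality weighting; QPALMA instead learns its scoring parameters from training data. The names `base_score`, `best_single_intron_alignment`, `donor` and `acceptor` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def base_score(read_base: str, genome_base: str, quality: int) -> float:
    """Quality-weighted match/mismatch score (assumed form, not learned).

    High-quality bases contribute strongly in either direction; a mismatch
    at a low-quality base is penalized only weakly.
    """
    weight = min(quality, 40) / 40.0
    return weight if read_base == genome_base else -weight


def best_single_intron_alignment(
    read: str,
    quals: List[int],
    genome: str,
    donor: List[float],
    acceptor: List[float],
    start: int = 0,
    min_intron: int = 2,
) -> Optional[Tuple[float, int, int]]:
    """Exhaustively search for the best alignment with exactly one intron.

    donor[i]    : assumed splice site score for an intron starting at genome[i]
    acceptor[j] : assumed splice site score for a second exon starting at genome[j]

    Returns (score, k, j): the first k read bases align at genome[start:start+k],
    the remaining bases align at genome[j:j+len(read)-k]. Returns None if the
    read cannot be placed.
    """
    n = len(read)
    best: Optional[Tuple[float, int, int]] = None
    for k in range(1, n):  # k = number of read bases in the first exon
        if start + k > len(genome):
            break
        left = sum(base_score(read[i], genome[start + i], quals[i]) for i in range(k))
        intron_start = start + k
        last_j = len(genome) - (n - k)  # last start that still fits the second exon
        for j in range(intron_start + min_intron, last_j + 1):
            right = sum(
                base_score(read[k + i], genome[j + i], quals[k + i])
                for i in range(n - k)
            )
            total = left + donor[intron_start] + right + acceptor[j]
            if best is None or total > best[0]:
                best = (total, k, j)
    return best
```

Calling `best_single_intron_alignment` with a read that spans two exons, per-base qualities, and per-position donor and acceptor scores returns the split point and the start of the second exon with the highest combined score. A practical aligner would use dynamic programming with gaps rather than this exhaustive search, which is kept only for readability.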

Availability: Datasets for training and evaluation, additional results, and a stand-alone alignment tool implemented in C++ and Python are available at http://www.fml.mpg.de/raetsch/projects/qpalma.
