Microindel detection in short-read sequence data
- PMID: 20144947
- DOI: 10.1093/bioinformatics/btq027
Abstract
Motivation: Several recent studies have demonstrated the effectiveness of resequencing and single nucleotide variant (SNV) detection by deep short-read sequencing platforms. While several reliable algorithms are available for automated SNV detection, the automated detection of microindels in deep short-read data presents a new bioinformatics challenge.
Results: We systematically analyzed how the short-read mapping tools MAQ, Bowtie, Burrows-Wheeler Alignment tool (BWA), Novoalign and RazerS perform on simulated datasets that contain indels, and evaluated how indels affect error rates in SNV detection. We implemented a simple algorithm to compute the equivalent indel region (EIR), which can be used to process the alignments produced by the mapping tools in order to perform indel calling. Using simulated data that contain indels, we demonstrate that indel detection works well on short-read data: the detection rate for microindels (<4 bp) is >90%. Our study provides insights into systematic errors in SNV detection based on ungapped short sequence read alignments. Gapped alignments of short sequence reads can be used to reduce this error and to detect microindels in simulated short-read data. A comparison with microindels automatically identified on the ABI Sanger and Roche 454 platforms indicates that microindel detection from short sequence reads identifies both overlapping and distinct indels.
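The equivalent indel region can be illustrated with a short Python sketch. It assumes that the EIR of an indel is the maximal reference span within which the indel can be shifted without changing the resulting sequence (as happens inside tandem repeats such as AGAGAG). The function names `deletion_eir` and `insertion_eir` are hypothetical, and this is not the authors' implementation. Coordinates are 0-based and intervals are half-open.

```python
def deletion_eir(ref: str, start: int, length: int) -> tuple[int, int]:
    """Hypothetical helper: half-open span [left, right) of `ref` within which
    deleting `length` bases yields the same sequence as deleting
    ref[start:start + length]."""
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie within the reference")
    left = start
    # Moving the deletion one base left removes ref[left - 1] instead of
    # ref[left - 1 + length]; the result is unchanged only if they are equal.
    while left > 0 and ref[left - 1] == ref[left - 1 + length]:
        left -= 1
    right = start + length
    # Symmetrically, extend right while the next base equals the base
    # that would no longer be deleted.
    while right < len(ref) and ref[right] == ref[right - length]:
        right += 1
    return left, right


def insertion_eir(ref: str, pos: int, ins: str) -> tuple[int, int]:
    """Hypothetical helper: half-open span [left, right) of `ref` such that
    inserting a rotation of `ins` before any position in left..right yields
    the same sequence as inserting `ins` before ref[pos]."""
    if not ins or pos < 0 or pos > len(ref):
        raise ValueError("insertion must be non-empty and within the reference")
    left, seq = pos, ins
    # Shift left while the preceding reference base equals the last inserted
    # base; the inserted sequence rotates right by one each step.
    while left > 0 and ref[left - 1] == seq[-1]:
        seq = seq[-1] + seq[:-1]
        left -= 1
    right, seq = pos, ins
    # Shift right while the next reference base equals the first inserted
    # base; the inserted sequence rotates left by one each step.
    while right < len(ref) and ref[right] == seq[0]:
        seq = seq[1:] + seq[0]
        right += 1
    return left, right


if __name__ == "__main__":
    ref = "ACAGAGAGT"
    # Deleting any "AG" inside the AGAGAG repeat gives ACAGAGT.
    print(deletion_eir(ref, 2, 2))       # (2, 8)
    print(insertion_eir(ref, 8, "AG"))   # (2, 8)
```

Because a mapper may report the same deletion at any position inside a repeat, comparing EIRs rather than raw alignment positions is one plausible way to recognize differently placed calls as the same event.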
Contact: peter.krawitz@googlemail.com; peter.robinson@charite.de
Supplementary information: Supplementary data are available at Bioinformatics online.
Similar articles
- A universal algorithm for de novo decrypting of heterozygous indel sequences: a tool for personalized medicine. Clin Chim Acta. 2008 Mar;389(1-2):7-13. doi: 10.1016/j.cca.2007.11.011. Epub 2007 Nov 23. PMID: 18078814
- Analysis of high-throughput sequencing data. Methods Mol Biol. 2011;678:1-11. doi: 10.1007/978-1-60761-682-5_1. PMID: 20931368
- Correction of sequencing errors in a mixed set of reads. Bioinformatics. 2010 May 15;26(10):1284-90. doi: 10.1093/bioinformatics/btq151. Epub 2010 Apr 8. PMID: 20378555
- De novo sequencing of plant genomes using second-generation technologies. Brief Bioinform. 2009 Nov;10(6):609-18. doi: 10.1093/bib/bbp039. PMID: 19933209. Review.
- Performance optimization in DNA short-read alignment. Bioinformatics. 2022 Apr 12;38(8):2081-2087. doi: 10.1093/bioinformatics/btac066. PMID: 35139149. Review.
Cited by
- The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process. Nucleic Acids Res. 2012 Mar;40(6):2426-31. doi: 10.1093/nar/gkr1073. Epub 2011 Nov 29. PMID: 22127862
- Performance evaluation of indel calling tools using real short-read data. Hum Genomics. 2015 Aug 19;9(1):20. doi: 10.1186/s40246-015-0042-2. PMID: 26286629
- Computational methodology for ChIP-seq analysis. Quant Biol. 2013 Mar 1;1(1):54-70. doi: 10.1007/s40484-013-0006-2. PMID: 25741452
- Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med. 2013 Mar 27;5(3):28. doi: 10.1186/gm432. eCollection 2013. PMID: 23537139
- Phenotypic and genome-wide analysis of an antibiotic-resistant small colony variant (SCV) of Pseudomonas aeruginosa. PLoS One. 2011;6(12):e29276. doi: 10.1371/journal.pone.0029276. Epub 2011 Dec 15. PMID: 22195037