A new generation of homology search tools based on probabilistic inference
PMID: 20180275
Abstract
Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST's programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST's speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.
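The distinction the abstract draws between scoring a single optimal alignment and summing over alignment uncertainty can be illustrated with a toy example. The sketch below is not HMMER's implementation: it assumes a hypothetical two-state HMM (states `M` and `B`, a two-letter alphabet, and made-up probabilities) and a uniform null model, all chosen only for illustration. It computes the best-path (Viterbi-style) score and the summed (Forward-style) score with the same recursion, then expresses each as a log-odds score in bits.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-probabilities."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def score(seq, start, trans, emit, combine):
    """Log-likelihood of seq under a toy HMM.

    combine=max scores only the single best state path (Viterbi-style);
    combine=logsumexp sums over all state paths (Forward-style).
    """
    states = list(start)
    prev = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for x in seq[1:]:
        prev = {
            t: combine([prev[s] + math.log(trans[s][t]) for s in states])
            + math.log(emit[t][x])
            for t in states
        }
    return combine(list(prev.values()))


# Hypothetical toy parameters (assumptions, not taken from HMMER).
START = {"M": 0.5, "B": 0.5}
TRANS = {"M": {"M": 0.9, "B": 0.1}, "B": {"M": 0.1, "B": 0.9}}
EMIT = {"M": {"A": 0.9, "C": 0.1}, "B": {"A": 0.5, "C": 0.5}}


def log_odds_bits(seq, combine):
    """Log-odds score in bits against a uniform two-letter null model."""
    null = len(seq) * math.log(0.5)
    return (score(seq, START, TRANS, EMIT, combine) - null) / math.log(2)


viterbi_bits = log_odds_bits("AAAA", max)
forward_bits = log_odds_bits("AAAA", logsumexp)
print(f"best-path score: {viterbi_bits:.3f} bits")
print(f"summed score:    {forward_bits:.3f} bits")
```

Because the summed score adds the probability of every path, including the best one, it can never be lower than the best-path score. The gap between the two grows when many alternative alignments carry comparable probability.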