Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre
- PMID: 17876813
- DOI: 10.1002/prot.21688
Abstract
Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. 
However, there is a paucity of information as to why the ensemble methods are superior, and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.
(c) 2007 Wiley-Liss, Inc.
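The abstract reports results at a 95% precision cut-off and attributes the ensemble's advantage to filtering out false positive matches. The sketch below illustrates both ideas on toy data. It is a minimal, hypothetical illustration, not Phyre's procedure. The abstract does not specify the 3D-clustering method, so simple score averaging across methods stands in as the consensus. The tuple layout, the function names `coverage_at_precision` and `consensus`, and every score are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical prediction record: (query_id, template_id, score, is_correct).
Prediction = Tuple[str, str, float, bool]


def coverage_at_precision(preds: List[Prediction],
                          min_precision: float = 0.95) -> Tuple[int, int]:
    """Rank predictions by score and keep the longest top-ranked prefix whose
    precision is >= min_precision. Return (correct query-template pairs,
    distinct queries with at least one correct hit) inside that prefix."""
    ranked = sorted(preds, key=lambda p: p[2], reverse=True)
    best_k, correct = 0, 0
    for k, (_, _, _, ok) in enumerate(ranked, start=1):
        correct += ok  # bool counts as 0 or 1
        if correct / k >= min_precision:
            best_k = k
    top = ranked[:best_k]
    correct_pairs = sum(1 for p in top if p[3])
    annotated_queries = len({p[0] for p in top if p[3]})
    return correct_pairs, annotated_queries


def consensus(method_preds: List[List[Prediction]]) -> List[Prediction]:
    """Stand-in ensemble (assumption, not Phyre's clustering): average each
    (query, template) score over all methods. A method that did not report a
    pair contributes 0, so hits seen by only one method are down-weighted."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    truth: Dict[Tuple[str, str], bool] = {}
    for preds in method_preds:
        for q, t, s, ok in preds:
            totals[(q, t)] += s
            truth[(q, t)] = ok
    n = len(method_preds)
    return [(q, t, s / n, truth[(q, t)]) for (q, t), s in totals.items()]


# Toy data: each method has one confident false positive the other lacks.
method_a = [
    ("q1", "t1", 0.90, True),
    ("q2", "t9", 0.85, False),  # spurious hit reported only by method A
    ("q2", "t2", 0.60, True),
    ("q3", "t3", 0.50, True),
]
method_b = [
    ("q1", "t1", 0.80, True),
    ("q3", "t8", 0.88, False),  # spurious hit reported only by method B
    ("q2", "t2", 0.70, True),
    ("q3", "t3", 0.65, True),
]

print(coverage_at_precision(method_a))                         # → (1, 1)
print(coverage_at_precision(method_b))                         # → (0, 0)
print(coverage_at_precision(consensus([method_a, method_b])))  # → (3, 3)
```

In this toy case, neither method alone can report more than one correct pair at 95% precision, because each method's spurious hit outranks its true matches. Averaging halves the score of any pair supported by only one method, which pushes both false positives below the true matches. The cut-off is applied by rank, so tied scores are not treated specially; a fuller evaluation would threshold on score values.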
Similar articles
- Evaluation of PSI-BLAST alignment accuracy in comparison to structural alignments. Protein Sci. 2000 Nov;9(11):2278-84. doi: 10.1110/ps.9.11.2278. PMID: 11152139. Free PMC article.
- Efficient recognition of protein fold at low sequence identity by conservative application of Psi-BLAST: validation. J Mol Recognit. 2005 Mar-Apr;18(2):139-49. doi: 10.1002/jmr.721. PMID: 15558595.
- Benchmarking PSI-BLAST in genome annotation. J Mol Biol. 1999 Nov 12;293(5):1257-71. doi: 10.1006/jmbi.1999.3233. PMID: 10547299.
- Sequence comparison and protein structure prediction. Curr Opin Struct Biol. 2006 Jun;16(3):374-84. doi: 10.1016/j.sbi.2006.05.006. Epub 2006 May 19. PMID: 16713709. Review.
- Protein folding: from the levinthal paradox to structure prediction. J Mol Biol. 1999 Oct 22;293(2):283-93. doi: 10.1006/jmbi.1999.3006. PMID: 10550209. Review.
Cited by
- Lactococcal abortive infection protein AbiV interacts directly with the phage protein SaV and prevents translation of phage proteins. Appl Environ Microbiol. 2010 Nov;76(21):7085-92. doi: 10.1128/AEM.00093-10. Epub 2010 Sep 17. PMID: 20851990. Free PMC article.
- Comparative analysis of plant genomes allows the definition of the "Phytolongins": a novel non-SNARE longin domain protein family. BMC Genomics. 2009 Nov 4;10:510. doi: 10.1186/1471-2164-10-510. PMID: 19889231. Free PMC article.
- Incorporation of local structural preference potential improves fold recognition. PLoS One. 2011 Feb 18;6(2):e17215. doi: 10.1371/journal.pone.0017215. PMID: 21365008. Free PMC article.
- Localizing the membrane binding region of Group VIA Ca2+-independent phospholipase A2 using peptide amide hydrogen/deuterium exchange mass spectrometry. J Biol Chem. 2009 Aug 28;284(35):23652-61. doi: 10.1074/jbc.M109.021857. Epub 2009 Jun 25. PMID: 19556238. Free PMC article.
- Genetic analysis of four Pakistani families with achromatopsia and a novel S4 motif mutation of CNGA3. Jpn J Ophthalmol. 2011 Nov;55(6):676-80. doi: 10.1007/s10384-011-0070-y. Epub 2011 Sep 13. PMID: 21912902.