Modeling protein evolution with several amino acid replacement matrices depending on site rates
- PMID: 22491036
- DOI: 10.1093/molbev/mss112
Abstract
Most protein substitution models use a single amino acid replacement matrix summarizing the biochemical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors that influence the substitution patterns. In this paper, we investigate the use of different substitution matrices for different site evolutionary rates. The variability of evolutionary rates is one of the most apparent heterogeneity factors among sites, and there is no reason to assume that the substitution patterns remain identical regardless of the evolutionary rate. We first introduce LG4M, which is composed of four matrices, each corresponding to one of the four discrete gamma rate categories. These matrices differ in their amino acid equilibrium distributions and in their exchangeabilities, unlike the standard gamma model, in which only the global rate differs from one category to another. Next, we present LG4X, which also uses four different matrices but leaves aside the gamma distribution and follows a distribution-free scheme for the site rates. All these matrices are estimated from a very large alignment database, and our two models are tested using a large sample of independent alignments. Detailed analysis of the resulting matrices and models shows the complexity of amino acid substitutions and the advantage of flexible models such as LG4M and LG4X. Both significantly outperform single-matrix models, providing gains of dozens to hundreds of log-likelihood units for most data sets. LG4X obtains substantial gains compared with LG4M, thanks to its distribution-free scheme for site rates. Since LG4M and LG4X display such advantages but require the same memory space and have running times comparable to standard models, we believe that LG4M and LG4X are relevant alternatives to single replacement matrices. Our models, data, and software are available from http://www.atgc-montpellier.fr/models/lg4x.
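The core idea of LG4M and LG4X is that each rate category carries its own matrix, so a site's likelihood is a weighted mixture over categories of (rate, matrix) pairs rather than over rates alone. The sketch below illustrates this on a two-taxon tree. It is not the authors' implementation: as a simplifying assumption, it uses a 4-state alphabet and F81-type matrices (which differ only in their equilibrium distributions, not in exchangeabilities) in place of the estimated 20 x 20 LG4 matrices. All names (`f81_transition`, `normalize_rates`, `pair_site_likelihood`) and all numeric values are hypothetical. In an LG4M-like setting, the weights would be fixed at 1/4 and the rates taken from a discrete gamma; in an LG4X-like, distribution-free setting, both weights and rates are free, with the rates rescaled so that the mean rate is 1.

```python
import math


def f81_transition(pi, rate, t):
    """Transition matrix P(t) for an F81-type model with equilibrium pi.

    The generator is normalized to one expected substitution per unit time,
    then scaled by the category rate (toy stand-in for an LG4 matrix).
    """
    beta = 1.0 / (1.0 - sum(p * p for p in pi))  # normalization constant
    decay = math.exp(-beta * rate * t)
    n = len(pi)
    return [[decay * (i == j) + (1.0 - decay) * pi[j] for j in range(n)]
            for i in range(n)]


def normalize_rates(weights, rates):
    """Rescale free category rates so the weighted mean rate equals 1."""
    mean = sum(w * r for w, r in zip(weights, rates))
    return [r / mean for r in rates]


def pair_site_likelihood(x, y, t, weights, rates, pis):
    """Likelihood of states x and y at one site on a two-taxon tree.

    Each category k has its own weight, rate and equilibrium distribution
    (hence its own matrix); the site likelihood mixes over categories.
    """
    total = 0.0
    for w, r, pi in zip(weights, rates, pis):
        P = f81_transition(pi, r, t)
        total += w * pi[x] * P[x][y]
    return total


# Hypothetical distribution-free categories (weights and rates both free).
weights = [0.1, 0.2, 0.3, 0.4]
rates = normalize_rates(weights, [0.05, 0.5, 1.2, 2.5])
pis = [
    [0.4, 0.3, 0.2, 0.1],
    [0.25, 0.25, 0.25, 0.25],
    [0.1, 0.2, 0.3, 0.4],
    [0.3, 0.3, 0.2, 0.2],
]
print(pair_site_likelihood(0, 1, 0.3, weights, rates, pis))
```

If all four equilibrium distributions (and, in a full model, exchangeabilities) were identical, the mixture would reduce to the standard single-matrix rate-heterogeneity model, where only the rate changes between categories.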
Similar articles
- Empirical models for substitution in ribosomal RNA. Mol Biol Evol. 2004 Mar;21(3):419-27. doi: 10.1093/molbev/msh029. Epub 2003 Dec 5. PMID: 14660689
- Phylogenetic mixture models for proteins. Philos Trans R Soc Lond B Biol Sci. 2008 Dec 27;363(1512):3965-76. doi: 10.1098/rstb.2008.0180. PMID: 18852096. Free PMC article.
- A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank. BMC Evol Biol. 2006 May 31;6:43. doi: 10.1186/1471-2148-6-43. PMID: 16737532. Free PMC article.
- Substitution scoring matrices for proteins - An overview. Protein Sci. 2020 Nov;29(11):2150-2163. doi: 10.1002/pro.3954. Epub 2020 Oct 12. PMID: 32954566. Free PMC article. Review.
- Shifts in amino acid preferences as proteins evolve: A synthesis of experimental and theoretical work. Protein Sci. 2021 Oct;30(10):2009-2028. doi: 10.1002/pro.4161. Epub 2021 Aug 12. PMID: 34322924. Free PMC article. Review.
Cited by
- Reconstruction of cyclooxygenase evolution in animals suggests variable, lineage-specific duplications, and homologs with low sequence identity. J Mol Evol. 2015 Apr;80(3-4):193-208. doi: 10.1007/s00239-015-9670-3. Epub 2015 Mar 11. PMID: 25758350
- Ant backbone phylogeny resolved by modelling compositional heterogeneity among sites in genomic data. Commun Biol. 2024 Jan 17;7(1):106. doi: 10.1038/s42003-024-05793-7. PMID: 38233456. Free PMC article.
- Testing Phylogenetic Stability with Variable Taxon Sampling. Methods Mol Biol. 2022;2569:167-188. doi: 10.1007/978-1-0716-2691-7_8. PMID: 36083448
- Crystal Structures of the Catalytic Domain of Arabidopsis thaliana Starch Synthase IV, of Granule Bound Starch Synthase From CLg1 and of Granule Bound Starch Synthase I of Cyanophora paradoxa Illustrate Substrate Recognition in Starch Synthases. Front Plant Sci. 2018 Aug 3;9:1138. doi: 10.3389/fpls.2018.01138. eCollection 2018. PMID: 30123236. Free PMC article.
- Leishmania guyanensis M4147 as a new LRV1-bearing model parasite: Phosphatidate phosphatase 2-like protein controls cell cycle progression and intracellular lipid content. PLoS Negl Trop Dis. 2022 Jun 24;16(6):e0010510. doi: 10.1371/journal.pntd.0010510. eCollection 2022 Jun. PMID: 35749562. Free PMC article.