Abstract
Our understanding of the origins, functions, and structures of biological sequences depends strongly on our ability to decipher the mechanisms of molecular evolution. These complex processes can be described by comparing homologous sequences in a phylogenetic framework. Moreover, phylogenetic inference provides sound statistical tools for revealing the main features of molecular evolution from the analysis of actual sequences. This chapter focuses on phylogenetic tree estimation under the maximum likelihood (ML) principle. Phylogenies inferred under this probabilistic criterion are usually reliable, and important biological hypotheses can be tested by comparing different models. Estimating ML phylogenies is computationally demanding, and careful examination of the results is warranted. We describe PhyML, a program that implements recent ML phylogenetic methods and algorithms, and illustrate its strengths and pitfalls through the analysis of a real data set. PhyML v3.0 is available from http://atgc_montpellier.fr/phyml/.
Acknowledgements
This work was supported by the “MITOSYS” grant from ANR. The chapter itself is the contribution 2007–08 of the Institut des Sciences de l'Evolution (UMR5554-CNRS).
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Guindon, S., Delsuc, F., Dufayard, JF., Gascuel, O. (2009). Estimating Maximum Likelihood Phylogenies with PhyML. In: Posada, D. (ed.) Bioinformatics for DNA Sequence Analysis. Methods in Molecular Biology, vol 537. Humana Press. https://doi.org/10.1007/978-1-59745-251-9_6
DOI: https://doi.org/10.1007/978-1-59745-251-9_6
Publisher Name: Humana Press
Print ISBN: 978-1-58829-910-9
Online ISBN: 978-1-59745-251-9
eBook Packages: Springer Protocols