Summary
Penny et al. have written that “The most fundamental criterion for a scientific method is that the data must, in principle, be able to reject the model. Hardly any [phylogenetic] tree-reconstruction methods meet this simple requirement.” The ability to reject models is of great importance because the results of all phylogenetic analyses depend on their underlying models: to have confidence in the inferences, it is necessary to have confidence in the models. In this paper, a test statistic suggested by Cox is employed to test the adequacy of some statistical models of DNA sequence evolution used in the phylogenetic inference method introduced by Felsenstein. Monte Carlo simulations are used to assess significance levels. The resulting statistical tests provide an objective and very general assessment of all the components of a DNA substitution model; more specific versions of the test are devised to test individual components of a model. In all cases, the new analyses have the additional advantage that values of phylogenetic parameters do not have to be assumed in order to perform the tests.
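The idea of assessing a significance level by Monte Carlo simulation can be illustrated with a minimal sketch. It is not the test developed in the paper: it assumes a toy null model (independent sites with equal base frequencies, no tree and no substitution process), and the function names `log_lik_ratio` and `monte_carlo_p_value` are hypothetical. The statistic compares the unconstrained multinomial log-likelihood with the log-likelihood under the null model, and its null distribution is approximated by simulating data sets from the null model.

```python
import math
import random
from collections import Counter

BASES = "ACGT"


def log_lik_ratio(seq):
    """Return 2 * (unconstrained multinomial lnL - equal-frequency lnL)."""
    n = len(seq)
    counts = Counter(seq)
    # Unconstrained model: each base frequency is estimated by its
    # observed proportion.
    unconstrained = sum(c * math.log(c / n) for c in counts.values())
    # Toy null model (an assumption of this sketch): every base has
    # probability 1/4 at every site.
    null = n * math.log(0.25)
    return 2.0 * (unconstrained - null)


def monte_carlo_p_value(observed_seq, n_sims=999, seed=0):
    """Estimate a p-value by simulating sequences under the null model."""
    rng = random.Random(seed)
    observed = log_lik_ratio(observed_seq)
    n = len(observed_seq)
    exceed = 0
    for _ in range(n_sims):
        # Simulate a data set of the same length from the null model.
        simulated = rng.choices(BASES, k=n)
        if log_lik_ratio(simulated) >= observed:
            exceed += 1
    # The observed data set is counted as one of the n_sims + 1 samples.
    return (exceed + 1) / (n_sims + 1)


if __name__ == "__main__":
    skewed = "A" * 70 + "C" * 10 + "G" * 10 + "T" * 10
    balanced = "ACGT" * 25
    print(monte_carlo_p_value(skewed))    # small: the null model is rejected
    print(monte_carlo_p_value(balanced))  # large: no evidence against it
```

The toy null model has no free parameters, so nothing is re-estimated for each simulated data set. With a model that does have free parameters, those parameters would be estimated from the observed data before simulating, and estimated again for every simulated data set before computing its statistic.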
References
Atkinson AC (1970) A method for discriminating between models. J R Statist Soc B 32:323–345
Avery PJ (1987) The analysis of intron data and their use in the detection of short signals. J Mol Evol 26:335–340
Bailey WJ, Fitch DHA, Tagle DA, Czelusniak J (1991) Molecular evolution of the ψη-globin gene locus: gibbon phylogeny and the hominoid slowdown. Mol Biol Evol 8:155–184
Bartlett MS (1963) The spectral analysis of point processes. J R Statist Soc B 25:264–296
Bishop MJ, Friday AE (1985) Evolutionary trees from nucleic acid and protein sequences. Proc R Soc Lond B 226:271–302
Bross ID (1990) How to eradicate fraudulent statistical methods: statisticians must do science. Biometrics 46:1213–1225
Bulmer M (1987) A statistical analysis of nucleotide sequences in introns and exons in human genes. Mol Biol Evol 4:395–405
Bulmer M (1989) Estimating the variability of substitution rates. Genetics 123:615–619
Cavender JA (1989) Mechanized derivation of linear invariants. Mol Biol Evol 6:301–316
Churchill GA (1989) Stochastic models for heterogeneous DNA sequences. Bull Math Biol 51:79–94
Cox DR (1961) Tests of separate families of hypotheses. Proceedings of the 4th Berkeley Symposium (University of California Press) 1:105–123
Cox DR (1962) Further results on tests of separate families of hypotheses. J R Statist Soc B 24:406–424
Cox DR, Miller HD (1977) The theory of stochastic processes. Chapman and Hall, London, pp 146–198
Dams E, Hendriks L, Van de Peer Y, Neefs JM, Smits G, Vanderbempt I, de Wachter R (1988) Compilation of small subunit RNA subsequences. Nucl Acids Res 16:r87–r174
Edwards AWF (1972) Likelihood. Cambridge University Press, Cambridge, pp 31, 70–102
Efron B (1982) The jackknife, the bootstrap and other resampling plans. Soc Ind Appl Math CBMS-Natl Sci Found Monogr 38
Efron B, Gong G (1983) A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Statistician 37:36–48
Efron B, Tibshirani R (1986) Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat Sci 1:54–77
Felsenstein J (1973) Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst Zool 22:240–249
Felsenstein J (1978) The number of evolutionary trees. Syst Zool 27:27–33
Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376
Felsenstein J (1983) Statistical inference of phylogenies. J R Statist Soc A 146:246–272
Felsenstein J (1988) Phylogenies from molecular sequences: inference and reliability. Ann Rev Genet 22:521–565
Felsenstein J (1991a) Counting phylogenetic invariants in some simple cases. J Theor Biol 152:357–376
Felsenstein J (1991b) PHYLIP (Phylogenetic Inference Package) version 3.4, documentation. University of Washington, Seattle
Gillespie JH (1986) Rates of molecular evolution. Ann Rev Ecol Syst 17:637–665
Gillespie JH (1989) Lineage effects and the index of dispersion of molecular evolution. Mol Biol Evol 6:636–647
Goldman N (1990) Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson process model of DNA substitution and to parsimony analyses. Syst Zool 39:345–361
Goldman N (1991) Statistical estimation of phylogenetic trees. PhD Thesis, University of Cambridge, Cambridge, pp 70–73
Hall P, Wilson SR (1991) Two guidelines for bootstrap hypothesis testing. Biometrics 47:757–762
Hasegawa M, Horai S (1991) Time of the deepest root for polymorphism in human mitochondrial DNA. J Mol Evol 32:37–42
Hasegawa M, Iida Y, Yano T, Takaiwa F, Iwabuchi M (1985a) Phylogenetic relationships among eukaryotic kingdoms inferred from ribosomal RNA sequences. J Mol Evol 22:32–38
Hasegawa M, Kishino H, Yano T (1985b) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174
Hasegawa M, Kishino H, Yano T (1987) Man's place in Hominoidea as inferred from molecular clocks of DNA. J Mol Evol 26:132–147
Hasegawa M, Kishino H, Yano T (1988) Phylogenetic inference from DNA sequence data. In: Matusita K (ed) Statistical theory and data analysis II. Elsevier, Holland, pp 1–13
Hasegawa M, Kishino H, Yano T (1989) Estimation of branching dates among primates by molecular clocks of nuclear DNA which slowed down in Hominoidea. J Hum Evol 18:461–476
Hasegawa M, Kishino H, Hayasaka K, Horai S (1990) Mitochondrial DNA evolution in primates: transition rate has been extremely low in lemur. J Mol Evol 31:113–121
Hasegawa M, Yano T, Kishino H (1984) A new molecular clock of mitochondrial DNA and the evolution of hominoids. Proc Jpn Acad B 60:95–98
Holmes EC, Pesole G, Saccone C (1989) Stochastic models of molecular evolution and the estimation of phylogeny and rates of nucleotide substitution in the hominoid primates. J Hum Evol 18:775–794
Hope ACA (1968) A simplified Monte Carlo significance test procedure. J R Statist Soc B 30:582–598
Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism, vol 3. Academic Press, New York, pp 21–132
Kendall M, Stuart A (1979) The advanced theory of statistics, vol 2. 4th ed. Charles Griffin, London, pp 240–252
Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge, pp 65–89
Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29:170–179
Kishino H, Hasegawa M (1990) Converting distance to time: application to human evolution. Meth Enz 183:550–570
Koop BF, Goodman M, Xu P, Chan K, Slightom JL (1986) Primate eta-globin DNA sequences and man's place among the great apes. Nature 319:234–238
Lake JA (1987) A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol 4:167–191
Lake JA (1988) Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences. Nature 331:184–186
Lanave C, Preparata G, Saccone C, Serio G (1984) A new method for calculating evolutionary substitution rates. J Mol Evol 20:86–93
Langley CH, Fitch WM (1974) An examination of the constancy of the rate of molecular evolution. J Mol Evol 3:161–177
Li W-H, Gojobori T, Nei M (1981) Pseudogenes as a paradigm of neutral evolution. Nature 292:237–239
Lindgren BW (1976) Statistical theory. 3rd ed. Macmillan, New York, pp 307–308, 331, 424
Lindsey JK (1974a) Comparison of probability distributions. J R Statist Soc B 36:38–44
Lindsey JK (1974b) Construction and comparison of statistical models. J R Statist Soc B 36:418–425
Lockhart PJ, Penny D, Hendy MD, Howe CJ, Beanland TJ, Larkum AD (1992) Controversy on chloroplast origins. FEBS Lett 301:127–131
Loh W-Y (1985) A new method for testing separate families of hypotheses. J Am Stat Assoc 80:362–368
Maeda N, Wu CI, Bliska J, Reneke J (1988) Molecular evolution of intergenic DNA in higher primates: pattern of DNA changes, molecular clock, and evolution of repetitive sequences. Mol Biol Evol 5:1–20
Marriott FHC (1979) Barnard's Monte Carlo tests: how many simulations? Appl Statist 28:75–77
McCullagh P, Nelder JA (1989) Generalized linear models. 2nd ed. Chapman and Hall, London, pp 119, 174
Navidi WC, Churchill GA, von Haeseler A (1991) Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants. Mol Biol Evol 8:128–143
Oliver JL, Marín A, Medina J-R (1989) SDSE: a software package to simulate the evolution of a pair of DNA sequences. CABIOS 5:47–50
Penny D (1982) Towards a basis for classification: the incompleteness of distance measures, incompatibility analysis and phenetic classification. J Theor Biol 96:129–142
Penny D, Hendy MD (1986) Estimating the reliability of evolutionary trees. Mol Biol Evol 3:403–417
Penny D, Hendy MD, Steel MA (1992) Progress with methods for constructing evolutionary trees. TREE 7:73–79
Pesole G, Bozzetti MP, Lanave C, Preparata G, Saccone C (1991) Glutamine synthetase gene evolution: a good molecular clock. Proc Natl Acad Sci USA 88:522–526
Ripley BD (1987) Stochastic simulation. John Wiley and Sons, New York, pp 171–174, 176
Ritland K, Clegg MT (1987) Evolutionary analysis of plant DNA sequences. Am Nat 130:S74–S100
Rodríguez F, Oliver JL, Marín A, Medina JR (1990) The general stochastic model of nucleotide substitution. J Theor Biol 142:485–501
Silvey SD (1975) Statistical inference. Chapman and Hall, London, pp 108–114
Thorne JL, Kishino H, Felsenstein J (1991) An evolutionary model for maximum likelihood alignment of DNA sequences. J Mol Evol 33:114–124 and Erratum, J Mol Evol (1992) 34:91
Thorne JL, Kishino H, Felsenstein J (1992) Inching toward reality: an improved likelihood model of sequence evolution. J Mol Evol 34:3–16
Williams DA (1970) Discrimination between regression models to determine the pattern of enzyme synthesis in synchronous cell cultures. Biometrics 26:23–32
Wilson AC, Carlson SS, White TJ (1977) Biochemical evolution. Ann Rev Biochem 46:573–639
Cite this article
Goldman, N. Statistical tests of models of DNA substitution. J Mol Evol 36, 182–198 (1993). https://doi.org/10.1007/BF00166252
DOI: https://doi.org/10.1007/BF00166252