Mol Biol Evol. 2011 Oct;28(10):2731-9.
doi: 10.1093/molbev/msr121. Epub 2011 May 4.

MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods

Koichiro Tamura et al. Mol Biol Evol. 2011 Oct.

Abstract

Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven, making it easier to use for both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net.

Figures

FIG. 1.
Evaluating the fit of substitution models in MEGA5. (A) The “Models” menu in the “Action Bar” provides access to the facility. (B) An “Analysis Preferences” dialog box provides the user with an array of choices, including the choice of tree to use and the method to treat missing data and alignment gaps. In addition to the “Complete Deletion” and “Pairwise Deletion” options, MEGA5 now includes a “Partial Deletion” option that enables users to exclude positions if they have less than a desired percentage (x%) of site coverage, that is, no more than (100−x)% of the sequences at a site are allowed to have an alignment gap, missing datum, or ambiguous base/amino acid. For protein-coding nucleotide sequences, users can choose to analyze nucleotide or translated amino acid substitutions, with a choice of codon positions in the former. (C) The list of evaluated substitution models along with their relative fits, number of parameters (branch lengths + model parameters), and estimates of evolutionary parameters for Drosophila Adh sequence data, which are available in the Examples directory of the MEGA5 installation. The note below the table provides a brief description of the results (e.g., ranking of models by BIC), data subset selected, and the analysis option chosen. This figure is available in color online and in black and white in print.
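The site-coverage rule described for the “Partial Deletion” option can be illustrated with a short sketch. This is not MEGA5's code: the function name `partial_deletion`, the `UNAMBIGUOUS` character set, and the toy alignment are assumptions made here for illustration, and the sketch handles nucleotide data only.

```python
# Characters counted as unambiguous nucleotides. Everything else (gaps, "?",
# "N", other ambiguity codes) is treated as missing. This set is an
# assumption for illustration, not MEGA5's definition.
UNAMBIGUOUS = set("ACGT")


def partial_deletion(sequences, min_coverage_percent):
    """Keep alignment columns whose site coverage is at least the cutoff (x%)."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")

    kept_columns = []
    for col in range(length):
        # Count sequences with an unambiguous base at this site.
        covered = sum(1 for seq in sequences if seq[col].upper() in UNAMBIGUOUS)
        coverage_percent = 100.0 * covered / len(sequences)
        if coverage_percent >= min_coverage_percent:
            kept_columns.append(col)

    # Rebuild each sequence from the retained columns only.
    return ["".join(seq[col] for col in kept_columns) for seq in sequences]


if __name__ == "__main__":
    toy_alignment = ["ACGT", "A-GT", "A-G?", "ACNT"]  # hypothetical data
    print(partial_deletion(toy_alignment, 75))
```

In this sketch a cutoff of 100 keeps only sites with no gaps or ambiguities in any sequence, which corresponds to the behavior described for complete deletion.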
FIG. 2.
Comparison of the best-fit model identified by using automatically generated and true trees for 1,792 computer simulated 66-sequence data sets. (A) The percentage of datasets for which the use of an automatically generated tree produces the same best-fit model as does the use of the true tree. Results are shown from datasets simulated with four different values of the gamma parameter (α) for rate variation among sites. (B) The estimates of α when using the automatically generated trees (filled bars) and the true tree (open bars). The average α and ±1 standard deviation are depicted on each bar; 10 discrete Gamma categories were used. (C) The relationship of true and estimated transition–transversion ratio, R, when using automatically generated trees for data simulated with α = 0.25. The value of R becomes 0.5 when the transition–transversion rate ratio, κ, is 1.0 in Kimura's two-parameter model. The slope of the linear regression was 1.005, with the intercept passing through the origin (r² = 0.98). Using the true tree, slope and r² values were 1.007 and 0.98, respectively. The absolute average difference between the two sets of estimates was 0.2% (maximum difference = 5.2%). Similar results were obtained for data simulated with α = 0.5, 1.0, and 2.0.
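The relationship between R and κ mentioned in panel (C) can be made concrete with a small sketch. Under Kimura's two-parameter model, each base has one possible transition and two possible transversions at equal base frequencies, so R = κ/2, which agrees with the caption's statement that R = 0.5 when κ = 1.0. The function names below are hypothetical, and the zero-intercept least-squares formula is a standard one that is assumed here to match the regression described; the caption does not give the procedure.

```python
def k2p_ratio_from_kappa(kappa):
    """Transition-transversion ratio R under Kimura's two-parameter model.

    One transition type and two transversion types per base give R = kappa / 2.
    """
    return kappa / 2.0


def slope_through_origin(true_values, estimated_values):
    """Least-squares slope of a regression line forced through the origin."""
    sum_xx = sum(x * x for x in true_values)
    if sum_xx == 0:
        raise ValueError("true values must not all be zero")
    sum_xy = sum(x * y for x, y in zip(true_values, estimated_values))
    return sum_xy / sum_xx


if __name__ == "__main__":
    print(k2p_ratio_from_kappa(1.0))
    # Hypothetical values, used only to show the call.
    print(slope_through_origin([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```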
FIG. 3.
Comparison of the computational speed of ML heuristic searches. (A) Average time taken to complete MEGA5 (NNI and CNI), RaxML7 (G and MIX), and PhyML3 (NNI and SPR) heuristic searches for 1,792 simulated data sets containing 66 sequences each. Bars are shown with ±1 standard deviation. Three data sets were excluded from PhyML3 calculations, as the NNI search failed. (B, C) Scatter plots showing the time taken to search for the ML tree for alignments that contain 20–200 and 200–765 sequences of 2,000 base pairs. The power trend fits are indicated for PhyML3 and MEGA5 (r² > 0.98 in all cases). For direct comparisons, all analyses were conducted by using 4 discrete categories for the Gamma distribution and a GTR model of nucleotide substitution (see Materials and Methods for simulation procedures, analysis descriptions, and computer hardware used). G, GTRGAMMA with four discrete Gamma categories; MIX, mixed method of using CAT and GAMMA models.
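A power trend of the kind mentioned for panels (B, C) has the form time = a × n^b, where n is the number of sequences. The caption does not say how the fits were computed; the sketch below assumes the common approach of ordinary least squares on log-transformed values. The function name `fit_power_trend` and the synthetic timings are hypothetical and are not measurements from the paper.

```python
import math


def fit_power_trend(sequence_counts, times):
    """Fit time = a * n**b by least squares on log(n) versus log(time)."""
    if len(sequence_counts) != len(times) or len(sequence_counts) < 2:
        raise ValueError("need at least two paired observations")

    xs = [math.log(n) for n in sequence_counts]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    if sum_xx == 0:
        raise ValueError("sequence counts must not all be equal")
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sum_xy / sum_xx                  # exponent of the power trend
    a = math.exp(mean_y - b * mean_x)    # multiplicative constant
    return a, b


if __name__ == "__main__":
    # Synthetic timings generated from a known curve, for illustration only.
    counts = [20, 50, 100, 200]
    timings = [2.0 * n ** 1.5 for n in counts]
    print(fit_power_trend(counts, timings))
```

The fitted exponent b summarizes how quickly search time grows with the number of sequences, which is what makes the trend lines of different programs comparable.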
FIG. 4.
Accuracies of heuristic ML trees produced by MEGA5, RaxML7, and PhyML3 programs. Shown are the proportions of interior branches (tree partitions) inferred correctly, along with ±1 standard deviation, for simulated data sets containing (A) 66 sequences and (B) 765 sequences. G, GTRGAMMA with four discrete Gamma categories; MIX, mixed method of using CAT and GAMMA models.
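The accuracy measure in this figure, the proportion of interior branches (tree partitions) inferred correctly, can be sketched as a comparison of the leaf bipartitions induced by each tree. The representation of trees as nested tuples of taxon names, the function names, and the five-taxon example are all assumptions made for illustration; the programs compared in the figure are not implemented this way.

```python
def interior_splits(tree):
    """Return the non-trivial leaf bipartitions of a tree given as nested tuples.

    Each split is stored as the side that does NOT contain a fixed anchor
    leaf, so the same branch gets the same key however the tree is rooted.
    """
    splits = set()

    def collect_leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(collect_leaves(child) for child in node))

    all_leaves = collect_leaves(tree)
    anchor = min(all_leaves)

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Interior branches separate at least two leaves on each side.
        if 2 <= len(below) <= len(all_leaves) - 2:
            splits.add(below if anchor not in below else all_leaves - below)
        return below

    visit(tree)
    return splits


def proportion_correct(true_tree, inferred_tree):
    """Fraction of the true tree's interior branches found in the inferred tree."""
    true_splits = interior_splits(true_tree)
    if not true_splits:
        raise ValueError("true tree has no interior branches")
    return len(true_splits & interior_splits(inferred_tree)) / len(true_splits)


if __name__ == "__main__":
    # Hypothetical five-taxon trees.
    true_tree = ((("A", "B"), "C"), ("D", "E"))
    inferred_tree = ((("A", "C"), "B"), ("D", "E"))
    print(proportion_correct(true_tree, inferred_tree))
```

In the example, the branch separating D and E from the rest is recovered, while the branch grouping A with B is not, so one of the two interior branches is correct.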
FIG. 5.
Position-specific inferred ancestral states in a primate opsin phylogeny and the posterior probabilities of alternative amino acids at that position. See MEGA5 Examples directory for the data file and Nei and Kumar (2000, p. 212–213) for a description of the data. This figure is available in color online and in black and white in print.
FIG. 6.
The MEGA5 “Action Bar” and associated action menus. This figure is available in color online and in black and white in print.
