Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2013 Jun 25:1:e89.
doi: 10.7717/peerj.89. Print 2013.

PTree: pattern-based, stochastic search for maximum parsimony phylogenies

Affiliations

PTree: pattern-based, stochastic search for maximum parsimony phylogenies

Ivan Gregor et al. PeerJ. .

Abstract

Phylogenetic reconstruction is vital to analyzing the evolutionary relationship of genes within and across populations of different species. Nowadays, with next generation sequencing technologies producing sets comprising thousands of sequences, robust identification of the tree topology, which is optimal according to standard criteria such as maximum parsimony, maximum likelihood or posterior probability, with phylogenetic inference methods is a computationally very demanding task. Here, we describe a stochastic search method for a maximum parsimony tree, implemented in a software package we named PTree. Our method is based on a new pattern-based technique that enables us to infer intermediate sequences efficiently where the incorporation of these sequences in the current tree topology yields a phylogenetic tree with a lower cost. Evaluation across multiple datasets showed that our method is comparable to the algorithms implemented in PAUP* or TNT, which are widely used by the bioinformatics community, in terms of topological accuracy and runtime. We show that our method can process large-scale datasets of 1,000-8,000 sequences. We believe that our novel pattern-based method enriches the current set of tools and methods for phylogenetic tree inference. The software is available under: http://algbio.cs.uni-duesseldorf.de/webapps/wa-download/.

Keywords: Local search; Maximum parsimony; Phylogeny reconstruction; Stochastic search.

PubMed Disclaimer

Figures

Figure 1
Figure 1. Local tree topology.
A is an internal node, V0 its parent, V1VN are its children, and Si represents the corresponding substitution sets.
Figure 2
Figure 2. Repeated substitution pattern example.
(A) depicts a local tree topology where node A is an internal node that represents DNA sequence (A C A C A), V0 its parent, V1V3 its children, and S0S3 are corresponding substitution sets. The repeated substitution pattern is found at node A since the intersection of at least two substitution sets is non-empty, namely: S0S1 = S0S3 = S1S3 = {C2T} = Y1. Thus, new candidate intermediate node I1 (A T A C A) originates from A (A C A C A) by applying mutations in Y1 = {C2T}. Note that the arrow that goes from node A to its parent V0 does not represent the direction in which the local tree topology is rooted but it serves only for the purpose of the definition of the repeated substitution pattern. (B) depicts an expected local tree topology after intermediate node I1 is added to the tree topology. After I1 is added, the cost of the local tree topology (i.e., the number of substitutions) decreases from 7 to 5.
Figure 3
Figure 3. Topological accuracy.
Comparison of topological accuracies of selected tree building methods as a function of sequence divergence using 5,000 datasets of 40 taxa with 500 bases each from Guindon & Gascuel (2003). PTree was run with 50 iterations and Jukes–Cantor correction enabled. PAUP* was run with the following settings: Neighbor Joining (NJ), maximum parsimony (MP; with the tree bisection and reconnection (TBR) branch swapping option used in the heuristic search), and maximum likelihood (ML; with the nearest neighbor interchange (NNI) branch swapping option used in the heuristic search). TNT was run with the subtree pruning and regrafting (SPR) branch swapping option used in the heuristic search.

Similar articles

References

    1. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Research. 2011;39:32–37. doi: 10.1093/nar/gkq1079. - DOI - PMC - PubMed
    1. Carroll H, Teichert AR, Krein J, Sundberg K, Snell Q, Clement M. An open source phylogenetic search and alignment package. International Journal of Bioinformatics Research and Applications. 2009;5:349–364. doi: 10.1504/IJBRA.2009.026424. - DOI - PubMed
    1. EuResist 2010. HIV reverse transcriptase and polymerase sequences. Available at http://www.euresist.org/web/guest (accessed 12 April 2010)
    1. Felsenstein J. Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology. 1978;27:401–410. doi: 10.2307/2412923. - DOI
    1. Felsenstein J. Inferring phylogenies. 2nd ed. Sunderland: Sinauer Associates; 2003.

Grants and funding

I.G., L.S. and A.C.M were funded by the Max-Planck Society and Heinrich-Heine University Düsseldorf. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

LinkOut - more resources