PLoS Comput Biol. 2018 Feb 5;14(2):e1005958.
doi: 10.1371/journal.pcbi.1005958. eCollection 2018 Feb.

A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination

Caitlin Collins et al. PLoS Comput Biol.

Abstract

Genome-Wide Association Studies (GWAS) in microbial organisms have the potential to vastly improve the way we understand, manage, and treat infectious diseases. Yet, microbial GWAS methods established thus far remain insufficiently able to capitalise on the growing wealth of bacterial and viral genetic sequence data. Facing clonal population structure and homologous recombination, existing GWAS methods struggle to achieve both the precision necessary to reject spurious findings and the power required to detect associations in microbes. In this paper, we introduce a novel phylogenetic approach that has been tailor-made for microbial GWAS, which is applicable to organisms ranging from purely clonal to frequently recombining, and to both binary and continuous phenotypes. Our approach is robust to the confounding effects of both population structure and recombination, while maintaining high statistical power to detect associations. Thorough testing via application to simulated data provides strong support for the power and specificity of our approach and demonstrates the advantages offered over alternative cluster-based and dimension-reduction methods. Two applications to Neisseria meningitidis illustrate the versatility and potential of our method, confirming previously-identified penicillin resistance loci and resulting in the identification of both well-characterised and novel drivers of invasive disease. Our method is implemented as an open-source R package called treeWAS which is freely available at https://github.com/caitiecollins/treeWAS.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Evolutionary scenarios detected by treeWAS scores.
The three complementary tests of association in treeWAS assign high scores to different patterns of association, examples of which are illustrated above. Each panel displays the phenotype (left) and the genotype of one associated locus (right), with binary states plotted along the tips of the phylogenetic tree (N = 40) and reconstructed ancestral states indicated along the branches of the tree (blue = 0, red = 1, grey = substitution). A: Score 1 aims to detect association among terminal nodes and assigns a relatively high value of 0.7 to this terminal configuration of phenotypic and genotypic states. B: Score 2 measures association by counting how many branches contain a substitution in both genotype and phenotype, assigning this pattern a score of 5. C: Score 3 is designed to find associations maintained loosely across the phylogenetic tree and assigns this scenario a value of 10.
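To make the descriptions of Score 1 and Score 2 concrete, here is a minimal Python sketch. It is an illustration only, not the treeWAS implementation (treeWAS is an R package), and it rests on assumptions: `terminal_score` assumes Score 1 can be read as the absolute difference between agreeing and disagreeing tips, divided by N, which is one reading consistent with the value of 0.7 at N = 40 reported in the caption (34 agreeing tips, 6 disagreeing). `simultaneous_score` follows the caption literally and counts branches on which both states change. The function names and the edge-tuple layout are hypothetical.

```python
from typing import Sequence, Tuple

# One branch: (phenotype at ancestor, phenotype at descendant,
#              genotype at ancestor,  genotype at descendant).
# This layout is an assumption made for the sketch.
Edge = Tuple[int, int, int, int]


def terminal_score(phen: Sequence[int], geno: Sequence[int]) -> float:
    """Assumed form of a terminal association score for binary tip states:
    |agreeing tips - disagreeing tips| / N, which lies in [0, 1]."""
    n = len(phen)
    if n == 0 or n != len(geno):
        raise ValueError("phen and geno must be non-empty and equal in length")
    agree = sum(1 for p, g in zip(phen, geno) if p == g)
    disagree = n - agree
    return abs(agree - disagree) / n


def simultaneous_score(edges: Sequence[Edge]) -> int:
    """Count branches carrying a substitution in both phenotype and genotype,
    as described for Score 2 in the caption."""
    return sum(1 for pa, pd, ga, gd in edges if pa != pd and ga != gd)


# Example: 40 tips, genotype matches phenotype at 34 tips and differs at 6.
phen = [1] * 20 + [0] * 20
geno = list(phen)
for i in range(6):
    geno[i] = 1 - geno[i]
score1 = terminal_score(phen, geno)

# Example: 5 branches change in both traits; the others change in at most one.
edges = [(0, 1, 0, 1)] * 3 + [(1, 0, 0, 1)] * 2 + [(0, 0, 0, 1), (0, 1, 1, 1)]
score2 = simultaneous_score(edges)
```

Under these assumptions the first example gives a terminal score of 28/40 and the second a simultaneous count of 5. Ancestral states would in practice come from a reconstruction step, which this sketch takes as given.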
Fig 2. Performance by association test.
The performance on simulated datasets for the six comparator GWAS methods and treeWAS, alongside its three association tests individually, is summarised along the four metrics of evaluation. Box plots display the median and interquartile range, red diamonds indicate the mean, and each dot represents the result for one of the 80 simulated datasets. A: False Positive Rate. B: Sensitivity. C: Positive Predictive Value. D: F1 Score.
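The four evaluation metrics named in this caption have standard definitions in terms of true and false positives and negatives. The following Python sketch computes them from confusion counts. The function name is hypothetical, the counts in the example are invented for illustration and are not results from the paper, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
from typing import Dict


def evaluation_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard classification metrics from confusion counts.
    Undefined ratios (zero denominator) are reported as 0.0 by assumption."""

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    fpr = ratio(fp, fp + tn)            # False Positive Rate
    sensitivity = ratio(tp, tp + fn)    # Sensitivity (recall)
    ppv = ratio(tp, tp + fp)            # Positive Predictive Value (precision)
    f1 = ratio(2 * ppv * sensitivity, ppv + sensitivity)  # harmonic mean
    return {"FPR": fpr, "Sensitivity": sensitivity, "PPV": ppv, "F1": f1}


# Illustrative counts only: 10 truly associated loci among 1000 tested,
# 8 of them detected, plus 2 false detections.
metrics = evaluation_metrics(tp=8, fp=2, tn=988, fn=2)
```

A method that flags very few loci can score well on False Positive Rate and PPV while scoring poorly on Sensitivity, which is why the F1 Score is reported alongside them as a single summary.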
Fig 3. Performance by recombination rate.
Interquartile mean performance by GWAS method and recombination rate is plotted along four statistics. A: False Positive Rate. B: Sensitivity. C: Positive Predictive Value. D: F1 Score.
Fig 4. Invasive disease in the N. meningitidis accessory genome.
treeWAS identified 12 genes associated with invasive disease. A: At left, the clonal genealogy reconstructed with ClonalFrameML, and terminal phenotype (blue = carrier, red = invasive). At right, an alignment of the 12 significant genes (blue = gene absence, red = gene presence). B-D: Null distributions of simulated association scores for (B) Score 1, (C) Score 2, and (D) Score 3, with a significance threshold (red) above which real associated genes are indicated. E-G: Manhattan plots for (E) Score 1, (F) Score 2, and (G) Score 3 showing association score values for all genes, with a significance threshold (red) above which points indicate significant associations.
Fig 5. Invasive disease in N. meningitidis core SNPs.
treeWAS identified 7 SNPs associated with invasive disease. A: At left, the clonal genealogy reconstructed with ClonalFrameML, and terminal phenotype (blue = carrier, red = invasive). At right, an alignment of the 7 significant SNPs (blue = allele 0; red = allele 1). B-D: Null distributions of simulated association scores for (B) Score 1, (C) Score 2, and (D) Score 3, with a significance threshold (red) above which real associated SNPs are indicated. E-G: Manhattan plots for (E) Score 1, (F) Score 2, and (G) Score 3 showing association score values for all SNPs, with a significance threshold (red) above which points indicate significant associations.
