PLoS Comput Biol. 2019 Apr 22;15(4):e1006534.
doi: 10.1371/journal.pcbi.1006534. eCollection 2019 Apr.

A Bayesian model of acquisition and clearance of bacterial colonization incorporating within-host variation

Marko Järvenpää et al. PLoS Comput Biol.

Abstract

Bacterial populations that colonize a host can play important roles in host health, including serving as a reservoir that transmits to other hosts and from which invasive strains emerge, thus emphasizing the importance of understanding rates of acquisition and clearance of colonizing populations. Studies of colonization dynamics have been based on assessment of whether serial samples represent a single population or distinct colonization events. With the use of whole genome sequencing to determine genetic distance between isolates, a common solution to estimate acquisition and clearance rates has been to assume a fixed genetic distance threshold below which isolates are considered to represent the same strain. However, this approach is often inadequate to account for the diversity of the underlying within-host evolving population, the time intervals between consecutive measurements, and the uncertainty in the estimated acquisition and clearance rates. Here, we present a fully Bayesian model that provides probabilities of whether two strains should be considered the same, allowing us to determine bacterial clearance and acquisition from genomes sampled over time. Our method explicitly models the within-host variation using population genetic simulation, and the inference is done using a combination of Approximate Bayesian Computation (ABC) and Markov Chain Monte Carlo (MCMC). We validate the method with multiple carefully conducted simulations and demonstrate its use in practice by analyzing a collection of methicillin-resistant Staphylococcus aureus (MRSA) isolates from a large, recently completed longitudinal clinical study. An R-code implementation of the method is freely available at: https://github.com/mjarvenpaa/bacterial-colonization-model.
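
The contrast between a fixed distance threshold and a probabilistic call can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' R implementation: two Poisson distributions stand in for the same-strain and different-strain distance distributions, whereas the method described above derives the same-strain distribution from population genetic simulation and also accounts for the time interval between samples. The names `omega_s`, `mean_same`, `mean_diff` and all numeric values are assumptions chosen for illustration.

```python
import math


def poisson_log_pmf(k, mean):
    """Log of the Poisson probability mass at k (mean must be > 0)."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def same_strain_probability(d, omega_s, mean_same, mean_diff):
    """Posterior probability that distance d belongs to the same-strain
    component of a two-component mixture (Bayes' rule).

    omega_s is the assumed mixture weight of the same-strain component.
    The Poisson components are stand-ins, not the simulated distributions
    used by the method described in the abstract.
    """
    log_same = math.log(omega_s) + poisson_log_pmf(d, mean_same)
    log_diff = math.log(1.0 - omega_s) + poisson_log_pmf(d, mean_diff)
    # Subtract the larger log term before exponentiating to avoid underflow.
    top = max(log_same, log_diff)
    same = math.exp(log_same - top)
    diff = math.exp(log_diff - top)
    return same / (same + diff)


def threshold_call(d, threshold):
    """Fixed-threshold rule: distances below the threshold count as same strain."""
    return d < threshold


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for d in (0, 10, 53, 54, 300):
        p = same_strain_probability(d, omega_s=0.8, mean_same=5.0, mean_diff=200.0)
        print(d, threshold_call(d, threshold=40), round(p, 4))
```

The threshold rule returns a hard yes or no at every distance, while the mixture returns a probability that can sit between 0 and 1 for distances where the two components overlap, which is the kind of uncertainty the abstract argues a fixed threshold cannot express.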

Conflict of interest statement

The authors have declared that no relevant competing interests exist.

Figures

Fig 1
Fig 1. Illustration of a subset of the data used in the study.
Each row in panel A corresponds to one patient. Only the first 20 patients are shown. R0 is the initial hospital visit and V1, V2, etc. are the subsequent visits. Red colour refers to ST5 and blue to ST8. The numbers show the number of mutations di between consecutive isolates. The samples were obtained from nares swabs and represent single colonies. Dashed black colour highlights cases where the ST changed from ST5 to ST8. Panel B shows the frequencies of the observed distances di between consecutive samples. These distances were used for model fitting.
Fig 2
Fig 2. Overview of the modeling and data fitting steps.
In Phase 1 we update our prior information on the parameters (neff, μ) based on external data D0. In Phase 2 we estimate all the parameters of the (mixture) model using MCMC, the precomputed distance distributions pS, and the information obtained in Phase 1. The fitted model can then be used, for example, to obtain the same-strain probability for a new (future) measurement.
Fig 3
Fig 3. Outline of the ‘same strain’ and ‘different strain’ models.
Model pS on the left (panel A) represents the situation where the genomes denoted by si1 and si2 are of the same strain. Model pD on the right (panel B) shows the case where these genomes are of different strains. Time flows from left to right in the figures, the dots represent individual genomes, and the edges represent parent-offspring relationships.
Fig 4
Fig 4. Distributions of pairwise distances for populations simulated with different parameters.
The histograms show the estimated probability mass functions p^sim(di1|neff,μ) for selected parameter vectors (neff, μ). Increasing μ and/or neff tends to increase the distances. Each histogram represents the variability in a simulated population at a single time point, 6,000 generations after the beginning of the simulation.
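
As a rough illustration of how such distance distributions can be produced, the sketch below runs a plain Wright-Fisher style simulation with an infinite-sites mutation model and tabulates pairwise distances at the final generation. This is an assumption made for illustration and is not necessarily the simulator used in the paper; the function names are hypothetical, `n_eff` and `mu` mirror neff and μ, and `mu` is taken here to be the expected number of new mutations per genome per generation.

```python
import math
import random
from collections import Counter
from itertools import combinations


def poisson_draw(rng, mean):
    """Draw from a Poisson distribution with Knuth's method (small means only)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_population(n_eff, mu, generations, rng):
    """Evolve n_eff genomes for a number of discrete generations.

    Each genome is a frozenset of mutation ids; every new mutation gets a
    fresh id (infinite-sites assumption), so the distance between two
    genomes is the size of their symmetric difference.
    """
    population = [frozenset()] * n_eff
    next_id = 0
    for _ in range(generations):
        offspring = []
        for _ in range(n_eff):
            genome = rng.choice(population)  # pick a parent uniformly at random
            n_new = poisson_draw(rng, mu)
            if n_new:
                genome = genome | frozenset(range(next_id, next_id + n_new))
                next_id += n_new
            offspring.append(genome)
        population = offspring
    return population


def pairwise_distance_pmf(population):
    """Empirical probability mass function of distances over all genome pairs."""
    counts = Counter(len(a ^ b) for a, b in combinations(population, 2))
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())}


if __name__ == "__main__":
    # Deliberately small, hypothetical values so the sketch runs quickly.
    rng = random.Random(42)
    final_population = simulate_population(n_eff=30, mu=0.05, generations=300, rng=rng)
    print(pairwise_distance_pmf(final_population))
```

Rerunning the sketch over a grid of (`n_eff`, `mu`) values would give one empirical distribution per parameter vector, analogous in spirit to the histograms in the figure, although realistic parameter values would need a far more efficient simulator.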
Fig 5
Fig 5. ABC posterior distribution for (neff, μ).
The ABC posterior distribution, i.e. the updated prior, for the parameters (neff, μ), the effective population size and the mutation rate, given data D0. Panel A shows the result with the full data and panel B the corresponding result with only a subset of the data (see text for details).
Fig 6
Fig 6. Accuracy and consistency with synthetic data.
The first three panels show the estimated posterior distributions for parameters (neff, μ) of the mixture model using simulated data of different sizes N. The green diamond shows the true value used to generate the simulated data and the light grey dots denote the grid point locations needed for numerical computations. The bottom right panel shows the estimated vs. the true ωS parameter in a set of additional simulation experiments.
Fig 7
Fig 7. Results for the Project CLEAR MRSA data.
Contour plot of the same-strain probability for a distance d* and time interval t*, based on the fitted model. The coloured points denote the observations that were used to fit the model. Blue colour indicates a high same-strain probability. Distances greater than 50 are not shown and are classified as different strains with probability one. On the y-axis, 6,000 generations correspond to approximately one year.
Fig 8
Fig 8. Model validation using posterior predictive checking.
The histogram in the upper left corner shows the observed distance distribution in the Project CLEAR MRSA data; the other panels in the top two rows show the corresponding distances in replicate data sets simulated from the fitted model. The bottom two rows show the same histograms zoomed to the range [0, 50]. The replicate data sets look similar overall to the observed data, which supports the adequacy of the model. However, the number of zero distances is underestimated and the frequencies of small positive distances tend to be slightly overestimated.
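
The idea behind posterior predictive checking can be sketched as follows: draw parameter values from the posterior, simulate replicate data sets of the same size as the observed one, and compare a summary statistic with its observed value. The Python sketch below is a simplified illustration under stated assumptions, not the authors' procedure: only the mixture weight is drawn from a list of hypothetical posterior samples (`omega_draws`), the two component distributions are fixed dictionaries supplied by the caller, and the fraction of zero distances is used as the summary statistic because the caption singles it out.

```python
import random


def sample_mixture(n, omega_s, pmf_same, pmf_diff, rng):
    """Draw n distances from omega_s * pmf_same + (1 - omega_s) * pmf_diff.

    pmf_same and pmf_diff map a distance to its probability.
    """
    draws = []
    for _ in range(n):
        pmf = pmf_same if rng.random() < omega_s else pmf_diff
        values, weights = zip(*pmf.items())
        draws.append(rng.choices(values, weights=weights, k=1)[0])
    return draws


def zero_fraction(distances):
    """Summary statistic: share of distances equal to zero."""
    return sum(1 for d in distances if d == 0) / len(distances)


def posterior_predictive_zero_fractions(n_obs, omega_draws, pmf_same, pmf_diff,
                                        n_rep, seed=0):
    """Zero fraction in each of n_rep replicate data sets of size n_obs.

    Each replicate uses one mixture weight picked at random from omega_draws,
    which stands in for samples from the posterior of the weight.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_rep):
        omega_s = rng.choice(omega_draws)
        replicate = sample_mixture(n_obs, omega_s, pmf_same, pmf_diff, rng)
        stats.append(zero_fraction(replicate))
    return stats


def tail_probability(observed_stat, replicate_stats):
    """Share of replicates whose statistic is at least the observed value."""
    hits = sum(1 for s in replicate_stats if s >= observed_stat)
    return hits / len(replicate_stats)
```

A tail probability close to 0 for the zero fraction would indicate that the replicates contain fewer zero distances than the observed data, which is the pattern described in the caption.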
