Genetics. 2014 Jan;196(1):253-65.
doi: 10.1534/genetics.113.157172. Epub 2013 Oct 30.

Inference of the properties of the recombination process from whole bacterial genomes

M Azim Ansari et al. Genetics. 2014 Jan.

Abstract

Patterns of linkage disequilibrium, homoplasy, and incompatibility are difficult to interpret because they depend on several factors, including the recombination process and the population structure. Here we introduce a novel model-based framework to infer recombination properties from such summary statistics in bacterial genomes. The underlying model is sequentially Markovian, so data can be simulated very efficiently, and we use approximate Bayesian computation techniques to infer parameters. Because this approach does not require calculating the likelihood function, the model can easily be extended to investigate less-studied aspects of recombination. In particular, we extend our model to account for the bias in the recombination process whereby closely related bacteria recombine more often with one another. We show that this model provides a good fit to a data set of Bacillus cereus genomes and estimate several recombination properties, including the rate of bias in recombination. All the methods described in this article are implemented in a software package that is freely available for download at http://code.google.com/p/clonalorigin/.
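The central idea behind approximate Bayesian computation is that inference needs only the ability to simulate data, never the likelihood itself. The following is a minimal sketch of the simplest variant, ABC rejection sampling. It is not the paper's procedure: the figures report ABC-MCMC. The one-parameter simulator `simulate_summary` (the mean of exponential waiting times) is a hypothetical stand-in for a genome simulator, and the prior bounds, tolerance `epsilon` and sample sizes are illustrative assumptions.

```python
import random
import statistics


def simulate_summary(rate, n_obs, rng):
    """Hypothetical toy simulator: mean of n_obs exponential waiting times with the given rate."""
    return statistics.fmean(rng.expovariate(rate) for _ in range(n_obs))


def abc_rejection(observed, prior_low, prior_high, n_draws, epsilon, n_obs, seed=0):
    """ABC rejection sampling: no likelihood is evaluated, only simulations.

    Each parameter value is drawn from a uniform prior, data are simulated under it,
    and the draw is kept if its summary statistic lies within epsilon of the observed one.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(prior_low, prior_high)  # draw from the uniform prior
        simulated = simulate_summary(rate, n_obs, rng)  # simulate instead of computing a likelihood
        if abs(simulated - observed) < epsilon:
            accepted.append(rate)
    return accepted


if __name__ == "__main__":
    # Pretend the "observed" data came from rate 2.0 (hypothetical ground truth).
    observed = simulate_summary(2.0, 100, random.Random(42))
    posterior = abc_rejection(observed, 0.1, 5.0, 10_000, 0.02, 100)
    print(f"accepted {len(posterior)} draws, posterior mean {statistics.fmean(posterior):.2f}")
```

Swapping `simulate_summary` for a genome simulator and the scalar summary for a vector of LD and G4 values would leave the sampler itself unchanged. This is the kind of flexibility the abstract attributes to a likelihood-free approach.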

Keywords: bacteria; biased recombination; four-gamete test (G4); homologous recombination; homology-dependent recombination; homoplasy; linkage disequilibrium (LD).

Figures

Figure 1
Illustration of the recombination model. Consider three recombination events arriving at the same point, b1 = b2 = b3, and departing from points a1, a2, and a3 on the clonal genealogy. In the ClonalOrigin model (free recombination, Equation 2), these three departure points are equally likely because the sums of branch lengths between the times of each bi and ai are equal: L(a1, b1) = L(a2, b2) = L(a3, b3). The evolutionary distance between the donor and recipient cells for the three events is D(a1, b1) = 2d1, D(a2, b2) = 2d2, and D(a3, b3) = 2d3. In the biased recombination model (Equation 3), departure from a1 is more probable than from a2, which in turn is more probable than from a3, because the donor-recipient distance increases across the three events: D(a1, b1) < D(a2, b2) < D(a3, b3).
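The contrast between free and biased recombination in Figure 1 can be illustrated numerically. The following is a minimal sketch, assuming that every candidate departure point has the same branch-length sum L(a, b), so the free model weights the candidates equally. The biased weight `exp(-bias * D)` is a hypothetical decreasing function chosen for illustration; it is not the paper's Equation 3, which is not reproduced here. The depths d1 < d2 < d3 are made-up values.

```python
import math


def departure_probabilities(distances, bias=None):
    """Normalized probabilities of candidate departure points a_i for a fixed arrival point.

    Assumes every candidate has the same branch-length sum L(a_i, b_i), as in Figure 1,
    so under free recombination they are equally likely. In the biased case each
    candidate is weighted by exp(-bias * D), a hypothetical decreasing function of the
    donor-recipient distance D(a_i, b_i); this is an assumption, not the paper's Equation 3.
    """
    if bias is None:
        weights = [1.0 for _ in distances]
    else:
        weights = [math.exp(-bias * d) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    d = [0.1, 0.3, 0.6]  # hypothetical depths d1 < d2 < d3
    D = [2 * x for x in d]  # D(a_i, b_i) = 2 d_i, as in the caption
    print("free:  ", departure_probabilities(D))
    print("biased:", departure_probabilities(D, bias=2.0))
```

Any strictly decreasing weight would reproduce the ordering described in the caption. The exponential form is used here only because it is simple and always positive.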
Figure 2
LD and G4 plots for 13 Bacillus cereus whole genomes, as a function of the distance between pairs of sites. As distance increases, LD decreases and G4 increases, with both reaching a plateau at ∼1000 bp. The blue circles indicate the three values of LD and G4 that were used as summary statistics in the inference procedure.
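The distance-binned curves in Figure 2 can be computed from an alignment by looping over pairs of biallelic sites. The following is a minimal sketch under stated assumptions. LD is measured as r², and G4 is taken to be the fraction of site pairs in which all four two-site haplotypes occur (a failed four-gamete test). The exact definitions used in the paper may differ. Distances are differences in alignment column indices, and gaps or missing data are not handled. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def biallelic_sites(alignment):
    """Map alignment column index -> column, keeping only sites with exactly two alleles."""
    sites = {}
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        if len(set(column)) == 2:
            sites[pos] = column
    return sites


def pair_statistics(col_i, col_j):
    """Return (r^2, all_four_gametes) for two biallelic sites."""
    n = len(col_i)
    # Code each site as 1 for the allele carried by the first sequence, else 0.
    x = [1 if c == col_i[0] else 0 for c in col_i]
    y = [1 if c == col_j[0] else 0 for c in col_j]
    p, q = sum(x) / n, sum(y) / n
    p_xy = sum(a * b for a, b in zip(x, y)) / n
    d = p_xy - p * q  # linkage disequilibrium coefficient D
    r2 = d * d / (p * (1 - p) * q * (1 - q))
    # Four-gamete test: all four two-site haplotypes present means the pair is incompatible.
    four_gametes = len(set(zip(col_i, col_j))) == 4
    return r2, four_gametes


def ld_and_g4_by_distance(alignment, bin_width):
    """Mean r^2 and fraction of four-gamete pairs, binned by distance between sites."""
    sites = biallelic_sites(alignment)
    binned = defaultdict(list)
    for i, j in combinations(sorted(sites), 2):
        binned[(j - i) // bin_width].append(pair_statistics(sites[i], sites[j]))
    return {
        b: (sum(r2 for r2, _ in pairs) / len(pairs),
            sum(1 for _, g4 in pairs if g4) / len(pairs))
        for b, pairs in sorted(binned.items())
    }


if __name__ == "__main__":
    toy = ["AAG", "AAT", "TTG", "TTT"]  # hypothetical 4-sequence alignment
    print(ld_and_g4_by_distance(toy, bin_width=1))  # {1: (0.5, 0.5), 2: (0.0, 1.0)}
```

In the toy alignment, sites 0 and 1 are perfectly linked (r² = 1, two gametes), while site 2 is unlinked to both (r² = 0, four gametes). This mirrors in miniature how LD falls and G4 rises as pairs become separated by recombination.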
Figure 3
Relationship between model parameters and the summary statistics. For a given clonal genealogy (shown in Figure S1), the four model parameters were changed one at a time and the summary statistics were simulated. When unchanged, the parameters were ρs = 0.02, δ = 300, λ = 1.2, and θs = 0.05.
Figure 4
Estimated marginal posterior densities of the parameters for the simulated data set. The values used in simulation are shown in green and are ρs = 0.02, δ = 300, λ = 1.2, and θs = 0.05. The red lines show the uniform prior densities used for the model parameters and the blue histograms show the marginal posterior densities estimated using ABC-MCMC.
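The posteriors in Figure 4 were estimated with ABC-MCMC under uniform priors. The following is a minimal sketch of a generic ABC-MCMC sampler, assuming a flat box prior and a symmetric Gaussian random-walk proposal. The two-parameter normal simulator, the Euclidean distance between summaries, the tolerance, the step sizes and the starting point are all hypothetical choices. They are not the settings used for ρs, δ, λ, and θs in the paper.

```python
import math
import random
import statistics


def simulate_summaries(params, n_obs, rng):
    """Hypothetical toy simulator: (mean, stdev) of n_obs normal draws with (mu, sigma) = params."""
    mu, sigma = params
    xs = [rng.gauss(mu, sigma) for _ in range(n_obs)]
    m = sum(xs) / n_obs
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n_obs - 1))
    return (m, sd)


def abc_mcmc(observed, bounds, start, n_steps, epsilon, step_sizes, n_obs, seed=0):
    """Likelihood-free MCMC with a uniform (box) prior and a Gaussian random-walk proposal.

    Because the prior is flat inside its bounds and the proposal is symmetric, the
    Metropolis-Hastings ratio is 1 inside the support. A proposal is therefore accepted
    exactly when it lies within the bounds and its simulated summaries are within
    epsilon (Euclidean distance) of the observed ones; otherwise the chain stays put.
    """
    rng = random.Random(seed)
    current = list(start)
    chain = []
    for _ in range(n_steps):
        proposal = [c + rng.gauss(0.0, s) for c, s in zip(current, step_sizes)]
        if all(lo <= p <= hi for p, (lo, hi) in zip(proposal, bounds)):
            simulated = simulate_summaries(proposal, n_obs, rng)
            if math.dist(simulated, observed) < epsilon:
                current = proposal
        chain.append(tuple(current))
    return chain


if __name__ == "__main__":
    observed = simulate_summaries((1.0, 2.0), 100, random.Random(7))
    bounds = [(-5.0, 5.0), (0.1, 5.0)]  # uniform priors on mu and sigma
    # Start near the data (here: at the observed summaries) so early proposals can be accepted.
    chain = abc_mcmc(observed, bounds, observed, 5_000, 0.3, (0.2, 0.2), 100)
    kept = chain[1_000:]  # discard burn-in
    print("posterior means:", [round(statistics.fmean(v), 2) for v in zip(*kept)])
```

A smaller `epsilon` brings the approximate posterior closer to the true one but lowers the acceptance rate, so the chain mixes more slowly. Balancing this trade-off is the main tuning decision in samplers of this kind.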
Figure 5
Posterior distributions of model parameters for the B. cereus data set. The histograms show the marginal posterior distributions of each parameter whereas the scatter plots show their joint posterior distributions.
Figure 6
Prediction of the future effect of mutation and recombination on the genetic distance between pairs of B. cereus genomes. The top heat map shows the rate at which mutation will increase the distance between each pair of genomes (pairwise divergence). The bottom heat map shows the rate at which recombination will decrease these same distances (pairwise convergence). For closely related isolates, recombination increases rather than decreases their distance; such pairs are shown as having zero convergence. The rate of divergence due to mutation is an order of magnitude higher than the rate of convergence due to recombination. Thus, for these isolates, the overall short-term effect of mutation and recombination is divergence.
