Proc Natl Acad Sci U S A. 2017 Jul 11;114(28):E5616-E5624.
doi: 10.1073/pnas.1704925114. Epub 2017 Jun 26.

Disentangling the effects of selection and loss bias on gene dynamics

Jaime Iranzo et al. Proc Natl Acad Sci U S A.

Abstract

We combine mathematical modeling of genome evolution with comparative analysis of prokaryotic genomes to estimate the relative contributions of selection and intrinsic loss bias to the evolution of different functional classes of genes and mobile genetic elements (MGE). An exact solution for the dynamics of gene family size was obtained under a linear duplication-transfer-loss model with selection. With the exception of genes involved in information processing, particularly translation, which are maintained by strong selection, the average selection coefficient for most nonparasitic genes is low albeit positive, compatible with the observed positive correlation between genome size and effective population size. Free-living microbes evolve under stronger selection for gene retention than parasites. Different classes of MGE show a broad range of fitness effects, from nearly neutral transposons to prophages, which are actively eliminated by selection. Genes involved in antiparasite defense, on average, incur a fitness cost to the host that is at least as high as the cost of plasmids. This cost is probably due to the adverse effects of autoimmunity and curtailment of horizontal gene transfer caused by the defense systems, and to the selfish behavior of some of these systems, such as toxin-antitoxin and restriction-modification modules. Transposons follow biphasic dynamics, with bursts of gene proliferation followed by a decay in copy number that is quantitatively captured by the model. The horizontal gene transfer to loss ratio, but not the duplication to loss ratio, correlates with genome size, potentially explaining the increased abundance of neutral and costly elements in larger genomes.

Keywords: antiparasite defense; gene loss; horizontal gene transfer; mobile genetic elements; selection.
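The equilibrium quantities compared in Fig. 1B and 1C can be illustrated with a generic linear birth-death-immigration process. The sketch below assumes a per-copy duplication rate d, a per-copy effective loss rate lₑ (assumed here to absorb the effect of selection), and a constant per-genome gain rate h from horizontal transfer. It is a textbook model standing in for the authors' duplication-transfer-loss model with selection, not a reproduction of it, and the function names are hypothetical. Under these assumptions, when d < lₑ the stationary copy number is negative binomial with mean h/(lₑ - d), and a family is present in a fraction 1 - (1 - d/lₑ)^(h/d) of genomes; when d ≥ lₑ there is no equilibrium.

```python
import math

# Generic linear birth-death-immigration model (an assumption, not the paper's exact model):
#   duplication d per copy, effective loss l_e per copy, gain h per genome.


def equilibrium_mean(d: float, l_e: float, h: float) -> float:
    """Mean copy number per genome at equilibrium: h / (l_e - d)."""
    if d >= l_e:
        raise ValueError("no equilibrium when d >= l_e (the family proliferates)")
    return h / (l_e - d)


def presence_fraction(d: float, l_e: float, h: float) -> float:
    """Expected fraction of genomes carrying at least one copy, 1 - P(n = 0)."""
    if d >= l_e:
        raise ValueError("no equilibrium when d >= l_e (the family proliferates)")
    if d == 0:
        # Without duplication the stationary distribution is Poisson(h / l_e).
        return 1.0 - math.exp(-h / l_e)
    # Negative binomial with r = h/d, p = d/l_e: P(n = 0) = (1 - p) ** r.
    return 1.0 - (1.0 - d / l_e) ** (h / d)


def copy_number_pmf(n: int, d: float, l_e: float, h: float) -> float:
    """Stationary probability of exactly n copies (negative binomial, 0 < d < l_e)."""
    if not 0 < d < l_e:
        raise ValueError("requires 0 < d < l_e")
    r, p = h / d, d / l_e
    # log of the generalized binomial coefficient Gamma(n + r) / (Gamma(r) * n!)
    log_coef = math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
    return math.exp(log_coef + n * math.log(p) + r * math.log1p(-p))


# Illustrative rates in arbitrary units (not estimates from the paper).
print(equilibrium_mean(0.5, 1.0, 0.2))   # about 0.4 copies per genome
print(presence_fraction(0.5, 1.0, 0.2))  # about 0.242
```

As d approaches lₑ the mean diverges, which is one way to read the d/lₑ > 1 threshold used for proliferation bursts in Fig. 2.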

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Effective loss bias and mean abundances of gene families from different functional categories. (A) Distribution of the effective duplication/loss ratio d/lₑ. Black horizontal lines indicate the median of each category. Outliers are represented as circles. Designations of the functional categories (modified from ref. 8): C, energy production and conversion; D, cell division; E, amino acid metabolism and transport; F, nucleotide metabolism and transport; G, carbohydrate metabolism and transport; H, coenzyme metabolism; I, lipid metabolism; J, translation; K, transcription; L, replication and repair; M, membrane and cell wall structure and biogenesis; N, secretion and motility; O, posttranslational modification, protein turnover, and chaperone functions; P, inorganic ion transport and metabolism; Q, biosynthesis, transport, and catabolism of secondary metabolites; R, general functional prediction only (typically, prediction of biochemical activity); S, function unknown; T, signal transduction; U, intracellular trafficking and secretion; V, defense mechanisms; Tr, transposon; Pl, conjugative plasmid; and Ph, prophage or phage-related. Two extreme outliers, one from the transposons (transposase IS1595, d/lₑ = 1.4) and one from category V (multidrug efflux pump subunit AcrB, d/lₑ = 1.6), are not represented. (B) Comparison of the global (observed) mean copy number per family and the equilibrium copy number predicted by the model. Data points correspond to medians across functional categories (colors as in A; triangles are used to highlight genetic parasites). Error bars represent the 95% confidence interval for the median. The solid line corresponds to a perfect match between predictions and observations. Spearman’s correlation coefficients including and excluding parasites are ρ = 0.80 and 0.81, respectively (P < 10⁻⁴). (C) Fraction of genomes in which a family is present, compared with the expected fraction at equilibrium (Spearman’s ρ = 0.87 and 0.80, including and excluding parasites, P < 10⁻⁴). Data points and error bars as in B.
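Agreement between observed and predicted values in panels B and C is summarized with Spearman's ρ, which is the Pearson correlation of the ranks. A minimal standard-library sketch, assuming tied values receive their average rank (the convention used by `scipy.stats.spearmanr`); the input numbers are invented for illustration and are not the paper's data.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


observed = [1.2, 0.8, 3.5, 2.0]   # invented per-category medians
predicted = [1.0, 0.9, 2.1, 2.2]  # invented model predictions
print(spearman_rho(observed, predicted))  # approximately 0.8 (one swapped pair)
```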
Fig. 2.
Frequency and distribution of proliferation bursts in different functional categories of genes. (A) Orange (left axis) shows frequency of proliferation bursts, defined as the fraction of ATGC-COGs with effective duplication/loss ratio d/lₑ > 1, split by functional category. Gray (right axis) shows mean burst size for these ATGC-COGs. (B) Burst rates in different ATGCs and functional categories, relative to the rate of gene loss. Designations of functional categories are the same as in Fig. 1 and Table 1.
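The burst frequency in panel A is the fraction of ATGC-COGs in each functional category whose d/lₑ exceeds 1. A minimal sketch, assuming the per-family estimates are available as hypothetical (category, d/lₑ) pairs; the records are invented, and the mean burst size is not computed because the caption does not define it.

```python
from collections import defaultdict


def burst_frequency(records):
    """Fraction of families with d/l_e > 1, per functional category.

    records: iterable of (category, d_over_le) pairs, one per ATGC-COG
    (a hypothetical input layout, not the authors' data format).
    """
    totals = defaultdict(int)
    bursts = defaultdict(int)
    for category, ratio in records:
        totals[category] += 1
        if ratio > 1.0:  # duplication outpaces effective loss: no equilibrium
            bursts[category] += 1
    return {c: bursts[c] / totals[c] for c in totals}


# Invented example records using the category codes from Fig. 1.
example = [("J", 0.2), ("J", 0.4), ("Tr", 1.3), ("Tr", 0.9), ("Tr", 1.1), ("V", 0.7)]
print(burst_frequency(example))  # J: 0.0, Tr: 2/3, V: 0.0
```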
Fig. 3.
Correlations between the genome size and potentially relevant parameters of gene family dynamics and genome architecture. Each point represents an ATGC. (A) Total HGT to loss ratio for genes from neutral categories. (B) Duplication to loss ratio for genes from neutral categories (both duplication and loss rates are calculated per copy). (C) Number of ORFan families per genome, which is an independent proxy for h/l. (D) Fraction of ORFan families with more than one copy, which is proportional to d/l. In each panel, Spearman’s ρ and significant P values are shown; nonsignificant (n.s.) P values are greater than 0.2.
Fig. 4.
Comparison between the scaled selection coefficients (s/l) of different functional categories and their characteristic nonsynonymous to synonymous mutation ratios (dN/dS). To account for ATGC-related variation, the dN/dS ratios for all categories within an ATGC were converted into ranks. Circles represent the mean ranks averaged across ATGCs, and error bars represent the SEM. Colors are the same as in Fig. 1. The horizontal gray band shows the theoretical 95% CI for the means of a null model where all categories have similar dN/dS (points above/below this interval indicate that the dN/dS of a category is significantly higher/lower than the expectation under the null model). The trend line (red) was obtained by fitting a monotonic spline curve to the data.
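The rank transformation described for Fig. 4 can be sketched as follows: within each ATGC the categories are ranked by dN/dS, and each category's ranks are then averaged across ATGCs and reported with the standard error of the mean. The nested-dictionary input layout and the average-rank treatment of ties are assumptions, the numbers are invented, and the null-model band and monotonic spline are not reproduced.

```python
from collections import defaultdict
from statistics import mean, stdev


def rank_within(values):
    """1-based ranks of a {category: value} dict; ties share their average rank."""
    items = sorted(values.items(), key=lambda kv: kv[1])
    ranks = {}
    i = 0
    while i < len(items):
        j = i
        while j + 1 < len(items) and items[j + 1][1] == items[i][1]:
            j += 1
        for k in range(i, j + 1):
            ranks[items[k][0]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mean_rank_sem(dnds_by_atgc):
    """Mean within-ATGC rank and its SEM for each category.

    dnds_by_atgc: {atgc: {category: dN/dS}} (hypothetical layout).
    """
    collected = defaultdict(list)
    for per_category in dnds_by_atgc.values():
        for category, rank in rank_within(per_category).items():
            collected[category].append(rank)
    summary = {}
    for category, rs in collected.items():
        # SEM = sample standard deviation / sqrt(number of ATGCs)
        sem = stdev(rs) / len(rs) ** 0.5 if len(rs) > 1 else float("nan")
        summary[category] = (mean(rs), sem)
    return summary


data = {  # invented dN/dS values for two ATGCs
    "ATGC1": {"J": 0.05, "Tr": 0.15, "V": 0.20},
    "ATGC2": {"J": 0.04, "Tr": 0.30, "V": 0.12},
}
print(mean_rank_sem(data))
```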
Fig. 5.
Effective duplication to loss ratio (d/lₑ) in free-living (FL), facultative host-associated (FHA), and obligate intracellular parasitic (OP) microbes. The designations of functional classes on the x axis are the same as in Fig. 1 and Table 1. The shaded band indicates the 95% CI for the intrinsic d/l estimated from neutral categories and ORFans. Error bars denote the 95% CI for the median d/lₑ.
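The caption does not say how the 95% CI for the median was obtained. One common choice is a percentile bootstrap, which is the technique assumed in the sketch below; it is not necessarily the procedure used for this figure, and the d/lₑ values are invented.

```python
import random
from statistics import median


def bootstrap_median_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Sample median and a percentile-bootstrap CI for it.

    Resamples `values` with replacement n_boot times, takes the median of
    each resample, and returns the alpha/2 and 1 - alpha/2 empirical quantiles.
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(values)
    boot = sorted(median(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = boot[int((alpha / 2) * n_boot)]
    hi = boot[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return median(values), (lo, hi)


# Invented d/l_e values for one functional class in one lifestyle group.
d_over_le = [0.31, 0.42, 0.28, 0.55, 0.47, 0.39, 0.36, 0.61, 0.33, 0.44]
print(bootstrap_median_ci(d_over_le))
```

A percentile bootstrap makes no distributional assumption about d/lₑ, which suits skewed ratios, but with few genomes per group the interval is coarse because bootstrap medians can take only a limited set of values.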

References

    1. Koonin EV. The Logic of Chance: The Nature and Origin of Biological Evolution. FT Press; Upper Saddle River, NJ: 2011.
    2. Lynch M. The Origins of Genome Architecture. Sinauer Associates; Sunderland, MA: 2007.
    3. Koonin EV, Wolf YI. Evolution of microbes and viruses: A paradigm shift in evolutionary biology? Front Cell Infect Microbiol. 2012;2:119.
    4. Moran NA, Bennett GM. The tiniest tiny genomes. Annu Rev Microbiol. 2014;68:195–215.
    5. Han K, et al. Extraordinary expansion of a Sorangium cellulosum genome from an alkaline milieu. Sci Rep. 2013;3:2101.
