Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Review
. 2014 Oct:27:576-93.
doi: 10.1016/j.meegid.2014.03.025. Epub 2014 Apr 3.

Evolutionary genomics of Borrelia burgdorferi sensu lato: findings, hypotheses, and the rise of hybrids

Affiliations
Review

Evolutionary genomics of Borrelia burgdorferi sensu lato: findings, hypotheses, and the rise of hybrids

Wei-Gang Qiu et al. Infect Genet Evol. 2014 Oct.

Abstract

Borrelia burgdorferi sensu lato (B. burgdorferi s.l.), the group of bacterial species represented by Lyme disease pathogens, has one of the most complex and variable genomic architectures among prokaryotes. Showing frequent recombination within and limited gene flow among geographic populations, the B. burgdorferi s.l. genomes provide an excellent window into the processes of bacterial evolution at both within- and between-population levels. Comparative analyses of B. burgdorferi s.l. genomes revealed a highly dynamic plasmid composition but a conservative gene repertoire. Gene duplication and loss as well as sequence variations at loci encoding surface-localized lipoproteins (e.g., the PF54 genes) are strongly associated with adaptive differences between species. There are a great many conserved intergenic spacer sequences that are candidates for cis-regulatory elements and non-coding RNAs. Recombination among coexisting strains occurs at a rate approximately three times the mutation rate. The coexistence of a large number of genomic groups within local B. burgdorferi s.l. populations may be driven by immune-mediated diversifying selection targeting major antigen loci as well as by adaptation to multiple host species. Questions remain regarding the ecological causes (e.g., climate change, host movements, or new adaptations) of the ongoing range expansion of B. burgdorferi s.l. and on the genomic variations associated with its ecological and clinical variability. Anticipating an explosive growth of the number of B. burgdorferi s.l. genomes sampled from both within and among species, we propose genome-based methods to test adaptive mechanisms and to identify molecular bases of phenotypic variations. Genome sequencing is also necessary for monitoring a likely increase of genetic admixture of previously isolated species and populations in North America and elsewhere.

Keywords: Bacteria evolution; Lyme disease; Phylogenomics; Population genomics; Recombination.

PubMed Disclaimer

Figures

Figure 1
Figure 1. Genome sampling and phylogeny: Borrelia phylogeny
(Main Panel) A phylogeny of 23 sequenced Lyme Group (LD) Borrelia genomes. The tree is based on an alignment of a 6.3-kb region on the cp26 plasmid, encompassing genes b14, b16 oppAIV, b17 (guaA), and, b18 (guaB). Since it does not include ospC, this region is relatively free from recombination and therefore more informative for phylogenetic inference. The cp26 sequences were aligned by using MUGSY (version 1.2.1) (Angiuoli and Salzberg, 2011) in a LINUX environment, the alignment slice was extracted by using customized Perl scripts, and the tree was inferred with FastTree (version 2.1.7) (Enright et al., 2002). The tree is rooted at the midpoint, which is consistent with the root suggested by a tree including Relapsing-Fever Borrelia as outgroups (Inset). The latter tree is based on protein sequences at 24 single-copy conserved loci, including infB (encoding an initiation factor), lepA (encoding a GTP-binding protein), pheS (encoding phenylalanyl-tRNA synthetase subunit alpha), rplB/C/D/E/F/K/N/O/P (encoding 50S ribosomal proteins), and rpsB/C/E/G/H/I/J/K/L/M/Q/S (encoding 30S ribosomal proteins) (Wu and Eisen, 2008) (Norris and Tyagi, personal communications). The protein sequences were aligned by using MUSCLE (version 3.8.31) (Edgar, 2004), the resulting alignments were concatenated by using a customized Perl script, and the tree was inferred with FastTree (version 2.1.7) (Enright et al., 2002). The B. burgdorferi s.l. strains were chosen for whole-genome sequencing to allow for at least three levels of phylogenetic comparisons: (i) between eight B. burgdorferi s.l. species, with the goal of identifying species-specific variations; (ii) between fourteen clonal groups within a single species (B. burgdorferi sensu stricto), with the goal of identifying strain-specific variations; and (iii) between members of sister-group genomes, e.g., within SNP groups A–D (Mongodin et al., 2013), with the goal of identifying most recent genomic changes.
Figure 2
Figure 2. Phylogenomics: Gene duplications and losses on lp54
Parsimony analysis based on the chromosomal SNP tree (left panel, topological diagram) suggests a loss of ospB (dark green) in B. garinii, a bird-adapted species. The PF54 lipoprotein gene array, which consists of paralogs of the host-resistant gene bba68/cspA, evolves rapidly and in a species- and lineage-specific manner. Orthologs, each set of which is shaded in the same color, were determined by reconciliation of a tree of all PF54 homologs with the genome-based phylogeny, as described previously (Wywial et al., 2009). N40_a67a (yellow) orthologs appear to have lost in SNP Group A and in JD1 independently. The B31_a70 (blue, encoding a plasminogen contributing to disabling host complement system) (Koenigs et al., 2013) ortholog appears to be a recently duplicated copy of its neighboring gene a71 and is present only in SNP Group A, which is associated with disseminative Lyme disease (Dykhuizen et al., 2008 Hanincova et al., 2013). ORFs in black have no apparent orthologs in sequenced genomes. Far04_A74* is a merged version of Far04_A73 and Far04_A74 in GenBank. Orthology among PF54 genes provides an independent line of corroborating evidence for the SNP-based phylogeny (Figure 1).
Figure 3
Figure 3. Phylogenomics: positively selected genes
The between-species synonymous (KS) and non-synonymous (Ka) nucleotide substitution rates of genes on the cp26 plasmid (A), the lp54 plasmid (B), and the chromosome (C). In each panel, the blue line indicates the neutral expectation (Ks=Ka). ORFs containing codon sites under positive selection are colored in red and their molecular functions are discussed in Section 4.2. Other genes (colored in green) are predominantly under purifying selection. The lp54 plasmid contains a large proportion of ORFs evolving adaptively between B. burgdorferi s.l. species. The majority of ORFs on cp26 are under strong purifying selection especially the plasmid-partitioning genes b10, b11, b12, and b13, justifying the use of these genes as plasmid markers (Casjens et al., 2012). The high KS values of ORFs on cp26 reflect high effective recombination rates caused by strong balancing selection at ospC. Also shown are within-species synonymous (πS) and non-synonymous (πa) polymorphisms of genes on the cp26 and lp54 plasmids (D). The dashed line indicates the neutral expectation (πA= πS). Two lipoprotein genes on lp54 (a07/chpA1 and a24/dbpA) show highly elevated πS an indication of the roles of their gene products in interacting with the host. Similar inference could be made for two lipoprotein genes on cp26 (b08 and b19/ospC), which also show elevated πA values. Two genes flanking ospC (b18/guaA and b22) show high πS values apparently due to high effective recombination rates at ospC, but are under purifying selection since their πA values are close to normal. All substitution rates were obtained by using the CODEML program of the PAML software package (version 4.4c) (Yang, 2007) and a phylogeny based on genome-wide SNPs (Mongodin et al., 2013). Plots were made using R (R Core Team, 2013; Wickham, 2009). Fifty-eight genes on lp54 are a01, a03–05, a08–16, a18–21, a23-a25, a30-a34, a36-a57, a59-a62, a64–66, a73, a74, and a76. Twenty-six genes on cp26 are b01–14, b16-b19, and b22–29.
Figure 3
Figure 3. Phylogenomics: positively selected genes
The between-species synonymous (KS) and non-synonymous (Ka) nucleotide substitution rates of genes on the cp26 plasmid (A), the lp54 plasmid (B), and the chromosome (C). In each panel, the blue line indicates the neutral expectation (Ks=Ka). ORFs containing codon sites under positive selection are colored in red and their molecular functions are discussed in Section 4.2. Other genes (colored in green) are predominantly under purifying selection. The lp54 plasmid contains a large proportion of ORFs evolving adaptively between B. burgdorferi s.l. species. The majority of ORFs on cp26 are under strong purifying selection especially the plasmid-partitioning genes b10, b11, b12, and b13, justifying the use of these genes as plasmid markers (Casjens et al., 2012). The high KS values of ORFs on cp26 reflect high effective recombination rates caused by strong balancing selection at ospC. Also shown are within-species synonymous (πS) and non-synonymous (πa) polymorphisms of genes on the cp26 and lp54 plasmids (D). The dashed line indicates the neutral expectation (πA= πS). Two lipoprotein genes on lp54 (a07/chpA1 and a24/dbpA) show highly elevated πS an indication of the roles of their gene products in interacting with the host. Similar inference could be made for two lipoprotein genes on cp26 (b08 and b19/ospC), which also show elevated πA values. Two genes flanking ospC (b18/guaA and b22) show high πS values apparently due to high effective recombination rates at ospC, but are under purifying selection since their πA values are close to normal. All substitution rates were obtained by using the CODEML program of the PAML software package (version 4.4c) (Yang, 2007) and a phylogeny based on genome-wide SNPs (Mongodin et al., 2013). Plots were made using R (R Core Team, 2013; Wickham, 2009). Fifty-eight genes on lp54 are a01, a03–05, a08–16, a18–21, a23-a25, a30-a34, a36-a57, a59-a62, a64–66, a73, a74, and a76. Twenty-six genes on cp26 are b01–14, b16-b19, and b22–29.
Figure 4
Figure 4. Phylogenetic footprinting: Putative non-coding RNAs
Predicted secondary structures of six longest putative ncRNAs on cp26 and lp54 plasmids. Each putative ncRNA contains a highly conserved inverted repeat (IR) sequence. These conserved IRs were identified by running all-against-all BLAST (Camacho et al., 2009) among B31 IGS sequences. Only IRs that share 90% or more sequence identity among the eight B. burgdorferi s.l. species (represented by strains B31, SV1, DN127, PBi, PBr, A14S, PKo, and VS116) were retained. Each RNA structure was predicted by using the B31 sequence with the program RNAalifold (Bernhart et al., 2008) and plotted by using VARNA (Darty et al., 2009). Base substitutions (in red) represent variations between species. These predicted structures are supported by patterns of sequence variability, which is enriched within the loop regions and compensatory in some stem regions. The IRa21-a23 has previously been described (Delihas, 2009) and the other five putative ncRNA are new findings.
Figure 5
Figure 5. Population genomics: Selection-driven recombination hotspots
Recombination rates along the cp26 plasmid. Two recombination hotspots are observed, one centered on b08 (encoding an uncharacterized lipoprotein) and the other on ospC. As explained in the text (Section 6.1), these recombination hotspots are most likely due to preferential retainment of amino-acid variations by strong balancing selection and not necessarily due to intrinsic pro-recombination mechanisms. As such, genome-wide recombination analysis of within-population samples helps identify loci under diversifying selection. Recombination rates for all pairs of 951 SNPs on the cp26 plasmids among twelve US B. burgdorferi s.s. genomes (Table 1) were estimated by using LDhat (version 2.2) (McVean et al., 2002). The cp26 sequences were co-linearized according to the B31 sequence (all starting at b01 and ending at b29) and then aligned by using MUGSY (version 1.2.1) (Angiuoli and Salzberg, 2011) in a LINUX environment. The longest alignment was extracted by using a MUGSY-supplied Perl script. SNPs were identified by using the CONVERT program of LDhat. Pair-wise recombination rates were estimated by using the PAIRWISE program of LDhat, which were subsequently plotted in the R statistical computing environment (R Core Team, 2013) with the “summarise.pairwise” R function included in the LDhat package. Recombination hotspots were estimated by using the INTERVAL program of LDhat and plotted with the supplied R function “summarise.interval”.
Figure 6
Figure 6. Genomic phylogeography: A cross-species hybridization
(Top Panel) A codon alignment of the 5’ sequences at the main chromosomal locus 0082 (encoding an uncharacterized conserved protein). B31 sequence is used as the reference and an “.” in other sequences represents the same nucleotide as in the B31 sequence at the same aligned site. Neighboring codons are shaded in alternating colors. The corresponding amino-acid alignment (Bottom Panel) shows coding consequences (synonymous or non-synonymous) of individual nucleotide variations. The N40 sequence at this region displays a large phylogenetic inconsistency: it is more similar to the sequences of non-B. burgdorferi s.s. orthologs than to its con-specific strains. Since all outgroups of B. burgdorferi s.s. used here are distributed exclusively in Europe or Western US, it is likely that the donor is a DN127-like species in Eastern US, such as B. kurtenbachii (Margos et al., 2010). Further analysis identified the two ends of this single cross-species gene-conversion event and showed that it affects a 2.2-kilobase region from the 3’ end of the 0081 locus (encoding a permease) to the 5’ end of the 0084 locus (encoding an aminotransferase). While this gene-conversion event is not obviously adaptive, this analysis shows that (i) there is cross-species hybridization in Northeast USA, (ii) the hybridization may be recent (see Section 7), and (iii) evolutionary genomic analysis is powerful to reveal population history. In this case, signatures of recombination are discovered based on phylogenetically mismatched SNPs (“homoplasies”). The graph was prepared by using BorreliaBase.org, a phylogeny-centered browser designed for evolution-informed comparative analysis of Borrelia genomes (Di et al., 2014). Future studies will screen for genome-wide signatures of species hybridization and population admixture.

Similar articles

Cited by

References

    1. Angiuoli SV, Salzberg SL. Mugsy: fast multiple alignment of closely related whole genomes. Bioinforma. Oxf. Engl. 2011;27:334–342. - PMC - PubMed
    1. Ansari MA, Didelot X. Inference of the properties of the recombination process from whole bacterial genomes. Genetics. 2014;196:253–265. - PMC - PubMed
    1. Avise JC. Phylogeography: The History and Formation of Species. Harvard University Press; 2000.
    1. Barton NH. Linkage and the limits to natural selection. Genetics. 1995;140:821–841. - PMC - PubMed
    1. Barton NH. Why sex and recombination? Cold Spring Harb. Symp. Quant. Biol. 2009;74:187–195. - PubMed

Publication types

LinkOut - more resources