Comparative Study
eLife. 2015 Apr 28;4:e06416.
doi: 10.7554/eLife.06416.

Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity

Welkin H Pope et al. eLife. 2015.

Abstract

The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis were performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery.

Keywords: bacteriophage; evolution; evolutionary biology; genomics; infectious disease; microbiology; viruses.

Conflict of interest statement

The authors declare that no competing interests exist.

Figures

Figure 1. Geographical distribution of sequenced mycobacteriophages.
(A) Locations of sequenced mycobacteriophages across the globe. (B) Locations of sequenced mycobacteriophages across the United States. Colors and letter designations on the isolates refer to the cluster to which the genomes belong. Data from www.phagesdb.org. DOI: http://dx.doi.org/10.7554/eLife.06416.003
Figure 2. Nucleotide sequence comparison of 627 mycobacteriophages displayed as a dotplot.
Complete genome sequences of 627 mycobacteriophages were concatenated into a single file which was compared with itself using Gepard (Krumsiek et al., 2007) and displayed as a dotplot using default parameters (word length, 10). The order of the genomes is as listed in Supplementary file 1. Nucleotide similarity is a primary component in assembling phages into clusters, which typically requires evident DNA similarity spanning more than 50% of the genome lengths. DOI: http://dx.doi.org/10.7554/eLife.06416.004
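Gepard builds dotplots from identical word matches between sequences. As a rough illustration only (Gepard itself is optimized for genome-scale data and works differently), the sketch below indexes every 10-letter word of one sequence and reports the (x, y) start positions where the other sequence contains the same word. Comparing a concatenated file with itself, as in the caption, means passing the same string twice. The function name and toy sequences are hypothetical, and only the forward strand is compared.

```python
from collections import defaultdict


def word_match_dots(seq_x: str, seq_y: str, word_length: int = 10) -> list[tuple[int, int]]:
    """Return (x, y) start positions where seq_x and seq_y share an identical word.

    Naive exact-match dotplot: every length-`word_length` word of seq_y is
    indexed, then each word of seq_x is looked up in that index.
    Forward strand only; reverse complements are not considered.
    """
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(seq_y) - word_length + 1):
        index[seq_y[j:j + word_length]].append(j)
    dots = []
    for i in range(len(seq_x) - word_length + 1):
        for j in index.get(seq_x[i:i + word_length], ()):
            dots.append((i, j))
    return dots


if __name__ == "__main__":
    # Two hypothetical toy "genomes" concatenated into one sequence and compared with itself.
    genome_a = "ATGCGTACGTTAGCCGATAGGCTTACG"
    genome_b = "TTTTATGCGTACGTTAGCCGAAAAAAA"
    concatenated = genome_a + genome_b
    dots = word_match_dots(concatenated, concatenated)
    # The main diagonal is every word matching itself; off-diagonal dots
    # mark the 17-base stretch that genome_a and genome_b share.
    off_diagonal = [(x, y) for x, y in dots if x != y]
    print(len(dots), len(off_diagonal))
```

In a plotted dotplot, the off-diagonal runs of dots are what reveal shared DNA between different genomes in the concatenated file.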
Figure 2—figure supplement 1. Dotplot of phages in Clusters I, N, P and the singleton Sparky.
A dotplot was generated using a concatenated file of genome sequences using Gepard (Krumsiek et al., 2007). The complexity of the genome relationships is illustrated by the Cluster I phages which share varying degrees of similarity to phages in Clusters N and P, as well as the singleton Sparky. Because inclusion of a phage in a cluster typically requires sharing a span of similarity over half of the genome lengths, these phages are not assembled into a single larger cluster. DOI: http://dx.doi.org/10.7554/eLife.06416.006
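The cluster membership rule described here can be phrased as a coverage test. The sketch below assumes that regions of DNA similarity have already been reduced to half-open (start, end) intervals on each genome, and it assumes coverage must exceed 50% of both genome lengths. The captions describe the rule only as typical and do not specify how it is applied, so this interpretation and the function names are hypothetical.

```python
def merged_coverage(intervals: list[tuple[int, int]]) -> int:
    """Bases covered by half-open [start, end) intervals, counting overlaps once."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close out the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered


def passes_span_rule(hits_a, len_a, hits_b, len_b, threshold=0.5):
    """HYPOTHETICAL rule: similarity must span more than `threshold` of BOTH genomes."""
    return (merged_coverage(hits_a) / len_a > threshold
            and merged_coverage(hits_b) / len_b > threshold)


if __name__ == "__main__":
    # Similarity regions between two hypothetical 50 kb genomes.
    hits_a = [(0, 20_000), (15_000, 30_000)]      # 30 kb covered on genome A
    hits_b = [(5_000, 25_000), (30_000, 40_000)]  # 30 kb covered on genome B
    print(passes_span_rule(hits_a, 50_000, hits_b, 50_000))  # True: 60% of each
    print(passes_span_rule([(0, 20_000)], 50_000, [(0, 20_000)], 50_000))  # False: 40%
```

A pair like Dori and Kheth, with real but limited similarity, would fall on the "False" side of a strict threshold, which is why such cases are flagged as ambiguous rather than decided mechanically.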
Figure 2—figure supplement 2. Dotplot of Carcharodon, Che9c, Kheth, and Dori.
The dotplot of concatenated genome sequences illustrates the ambiguity of whether the singleton Dori warrants inclusion in Cluster B. Dori shares DNA sequence similarity with its closest relative Kheth (Subcluster B2), but it does not span 50% of the genome lengths. Dori also shares DNA sequence similarity with Che9c (Subcluster I2) and Carcharodon (Cluster N). DOI: http://dx.doi.org/10.7554/eLife.06416.007
Figure 2—figure supplement 3. Dotplot of Corndog, Brujita, SG4, Yoshi, and MooMoo.
The dotplot of concatenated genome sequences illustrates the complex relationships between the singleton MooMoo and other phages. MooMoo shares DNA sequence similarity with SG4 (Subcluster F1) and Yoshi (Subcluster F2), but also with Brujita (Subcluster I1). MooMoo has barely detectable DNA sequence similarity with Corndog (Cluster O), but has a similar prolate virion morphology. DOI: http://dx.doi.org/10.7554/eLife.06416.008
Figure 3. Network phylogeny of 627 mycobacteriophages based on gene content.
Genomes of 627 mycobacteriophages were compared according to shared gene content using the Phamerator (Cresawn et al., 2011) database Mykobacteriophage_627, and displayed using SplitsTree (Huson and Bryant, 2006). Colored circles indicate grouping of phages labeled according to their cluster designations generated by nucleotide sequence comparison (Figure 2); singleton genomes with no close relatives are labeled but not circled. Micrographs show morphotypes of the singleton MooMoo, the Cluster F phage Mozy, and the Cluster O phage Corndog. With the exception of DS6A, all of the phages infect Mycobacterium smegmatis mc2155. DOI: http://dx.doi.org/10.7554/eLife.06416.010
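This caption does not describe how gene content was encoded for SplitsTree. One plausible input is a pham presence/absence matrix written as a NEXUS characters block (1 means the genome contains at least one gene in that pham), sketched here as an assumption. The function name and toy pham sets are hypothetical, the FORMAT options should be checked against the SplitsTree version in use, and taxon names containing spaces or punctuation would need quoting under the NEXUS format.

```python
def to_nexus_binary(genomes: dict[str, set[str]]) -> str:
    """Write a pham presence (1) / absence (0) matrix as a NEXUS file.

    Columns are phams in sorted order; rows follow the insertion order of `genomes`.
    """
    names = list(genomes)
    phams = sorted(set().union(*genomes.values()))
    lines = [
        "#NEXUS",
        "BEGIN taxa;",
        f"  DIMENSIONS ntax={len(names)};",
        "  TAXLABELS " + " ".join(names) + ";",
        "END;",
        "BEGIN characters;",
        f"  DIMENSIONS nchar={len(phams)};",
        '  FORMAT datatype=standard symbols="01";',
        "  MATRIX",
    ]
    for name in names:
        # One character per pham: 1 if this genome has a member of the pham.
        row = "".join("1" if p in genomes[name] else "0" for p in phams)
        lines.append(f"  {name} {row}")
    lines += ["  ;", "END;"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical pham sets for three phages.
    toy = {"PhageA": {"p1", "p2"}, "PhageB": {"p2", "p3"}, "PhageC": {"p4"}}
    print(to_nexus_binary(toy))
```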
Figure 4. Proportions of orphams in mycobacteriophage genomes.
The proportions of genes that are orphams (i.e., single-gene phamilies with no homologues within the mycobacteriophage dataset) are shown for each phage. The order of the phages is as shown in Supplementary file 1. All of the singleton genomes have >30% orphams, and most of the other genomes with relatively high proportions of orphams are the single-genome subclusters (Table 2), including Hawkeye (D2), Myrna (C2), Squirty (F3), Barnyard (H2), Che9c (I2), Whirlwind (L3), Rey (M2), and Purky (P2). Three phages shown in red type are not singletons or single-genome subclusters but have relatively high proportions of orphams. Predator and Mendokysei are members of Clusters H and T, respectively, both of which are small (five or fewer genomes) and diverse; KayaCho is a member of Subcluster B4 but has a sufficiently high proportion of orphams to arguably warrant formation of a new subcluster, B6. DOI: http://dx.doi.org/10.7554/eLife.06416.012
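Because an orpham is a phamily containing a single gene across the whole dataset, the per-genome proportion can be computed from pham sizes alone. A minimal sketch, assuming each genome is given as the list of pham identifiers of its genes (a hypothetical input structure and function name):

```python
from collections import Counter


def orpham_proportions(gene_phams: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of each genome's genes that fall in single-member phams (orphams).

    gene_phams maps a genome name to the pham identifier of each of its genes.
    Pham sizes are counted over every gene in the whole dataset, so two genes
    of one genome in the same pham do not count as orphams.
    """
    pham_sizes = Counter(p for phams in gene_phams.values() for p in phams)
    return {
        genome: sum(1 for p in phams if pham_sizes[p] == 1) / len(phams)
        for genome, phams in gene_phams.items()
        if phams
    }


if __name__ == "__main__":
    toy = {"A": ["p1", "p2", "p3"], "B": ["p1", "p2", "p4", "p5"], "C": ["p1", "p6"]}
    props = orpham_proportions(toy)
    print({g: round(v, 2) for g, v in props.items()})  # {'A': 0.33, 'B': 0.5, 'C': 0.5}
```

The returned values can be compared directly against a cutoff such as the 30% mentioned for singleton genomes.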
Figure 4—figure supplement 1. Shared gene content between Dori, MooMoo, and other mycobacteriophages.
(A) Average percentages of phamilies shared between Dori and other mycobacteriophages. (B) Average percentages of phamilies shared between MooMoo and other mycobacteriophages. Genomes on the x axis are listed in the same order as in Supplementary file 1 and the cluster designations are indicated. DOI: http://dx.doi.org/10.7554/eLife.06416.014
Figure 4—figure supplement 2. Shared gene content between Gaia, Sparky, and other mycobacteriophages.
(A) Average percentages of phamilies shared between Gaia and other mycobacteriophages. (B) Average percentages of phamilies shared between Sparky and other mycobacteriophages. Genomes on the x axis are listed in the same order as in Supplementary file 1 and the cluster designations are indicated. DOI: http://dx.doi.org/10.7554/eLife.06416.015
Figure 5. Heat map representation of shared gene content among 627 mycobacteriophages.
The percentages of pairwise shared genes were determined using a Phamerator (Cresawn et al., 2011) database (Mykobacteriophage_627) populated with 627 completely sequenced phage genomes. The 69,574 genes were assembled into 5205 phamilies (phams) of related sequences using kClust, and the average proportions of shared phams were calculated. Genomes are ordered on both axes according to their cluster and subcluster designations (Supplementary file 1) determined by nucleotide sequence similarities (Figure 2). The values (proportions of pairwise shared phams averaged between each partner) are colored as indicated. DOI: http://dx.doi.org/10.7554/eLife.06416.016
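Each heat-map value is the proportion of shared phams averaged between the two partners: for genomes A and B sharing s phams, the value is (s/|A| + s/|B|)/2. A minimal sketch, assuming each genome is represented by its set of distinct phams (the caption does not say how duplicate genes within one pham are counted) and that the input dict is already ordered by cluster and subcluster; the function name is hypothetical.

```python
def shared_pham_matrix(genomes: dict[str, set[str]]) -> tuple[list[str], list[list[float]]]:
    """Pairwise shared-pham proportions, averaged between the two partners.

    For genomes A and B sharing s phams: (s / |A| + s / |B|) / 2.
    Rows and columns follow the insertion order of `genomes`.
    """
    names = list(genomes)
    matrix = []
    for a in names:
        row = []
        for b in names:
            shared = len(genomes[a] & genomes[b])
            row.append((shared / len(genomes[a]) + shared / len(genomes[b])) / 2)
        matrix.append(row)
    return names, matrix


if __name__ == "__main__":
    # Hypothetical pham sets: two closely related genomes and one distant genome.
    toy = {
        "A1": {"p1", "p2", "p3", "p4"},
        "A2": {"p1", "p2", "p3", "p5"},
        "B1": {"p1", "p6", "p7"},
    }
    names, matrix = shared_pham_matrix(toy)
    for name, row in zip(names, matrix):
        print(name, " ".join(f"{v:.2f}" for v in row))
```

Averaging the two one-sided proportions keeps the matrix symmetric even when the partners differ in gene count.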
Figure 6. Cluster diversity and isolation.
(A) The CLuster Averaged Shared Phamilies (CLASP; blue), Cluster Associated Phamilies (CAP; red) and Cluster Cohesion Index (CCI; green) values are plotted for each mycobacteriophage cluster. (B) The Cluster Isolation Index (CII) and CLASP values (both shown as percentages) are plotted for each phage cluster. Singletons (white circles) are not individually labeled but correspond to the values shown in Table 1. DOI: http://dx.doi.org/10.7554/eLife.06416.018
Figure 6—figure supplement 1. Resampling CLASP values for cluster diversity and size.
CLuster Averaged Shared Phamilies (CLASP) values were calculated for Clusters A, B, C, E, F, and K by resampling random subsets of the genomes. The size of the subsets is shown on the x axis and each point is the average of 20 iterations. The minimum and maximum variations among the iterations are shown. DOI: http://dx.doi.org/10.7554/eLife.06416.020
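The formula for CLASP is not given in this section. Purely as an assumption, the sketch below takes CLASP for a set of genomes to be the mean, over all genome pairs, of the shared-pham proportion averaged between partners, and then follows the resampling design described in the caption: for each subset size, 20 random subsets are drawn and the mean, minimum, and maximum values are reported. All names and the toy cluster are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def pair_score(a: set[str], b: set[str]) -> float:
    """Proportion of shared phams, averaged between the two partners."""
    shared = len(a & b)
    return (shared / len(a) + shared / len(b)) / 2


def clasp(genomes: list[set[str]]) -> float:
    """ASSUMED CLASP definition: mean pair_score over all distinct genome pairs."""
    return mean(pair_score(a, b) for a, b in combinations(genomes, 2))


def resample_clasp(cluster: list[set[str]], iterations: int = 20, seed: int = 0):
    """For each subset size k >= 2, return (k, mean, min, max) of CLASP.

    Each of the `iterations` subsets is drawn at random without replacement.
    """
    rng = random.Random(seed)
    results = []
    for k in range(2, len(cluster) + 1):
        values = [clasp(rng.sample(cluster, k)) for _ in range(iterations)]
        results.append((k, mean(values), min(values), max(values)))
    return results


if __name__ == "__main__":
    # Hypothetical cluster: a shared core of 20 phams plus 5 variable phams per genome.
    rng = random.Random(42)
    core = {f"core{i}" for i in range(20)}
    accessory = [f"acc{i}" for i in range(60)]
    cluster = [core | set(rng.sample(accessory, 5)) for _ in range(12)]
    for k, avg, lo, hi in resample_clasp(cluster):
        print(f"k={k:2d} CLASP mean={avg:.3f} min={lo:.3f} max={hi:.3f}")
```

Plotting the mean against k, with the min/max spread, shows whether a cluster's diversity estimate stabilizes as more genomes are included.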
Figure 6—figure supplement 2. Cluster diversity shown by Cluster-Associated Phamilies (CAP) and Cluster Phamily Variation (CPV) indices.
The CAP and CPV values are plotted for each cluster. DOI: http://dx.doi.org/10.7554/eLife.06416.021
Figure 7. Rarefaction analysis of mycobacteriophage genomes.
(A) The numbers of phamilies are reported for between 1 and 627 phage genomes sampled at random without replacement; the mean of 10,000 iterations is shown in red; gray lines indicate a confidence interval of two standard deviations. The black line shows a hyperbolic curve fit to the data from phage counts 1 to 314. The inset shows the number of new phams encountered upon the inclusion of each phage, with the mean number for the 10,000 iterations shown in blue and the predicted value from the hyperbolic curve shown in black. (B) Rarefaction analysis of 232 Cluster A phages. The total numbers of phamilies are reported for between 1 and 232 phages sampled at random without replacement from Cluster A; the mean of 10,000 iterations is shown in red; gray lines indicate a confidence interval of two standard deviations. The black line shows a hyperbolic curve fit to the data from phage counts 1 to 117. The inset shows the number of new phams encountered upon the inclusion of each phage, with the mean number for 10,000 iterations shown in blue and the predicted value from the hyperbolic curve shown in black. (C) Rarefaction analysis of 108 Cluster B phages; the hyperbolic curve was fit to the data from phage counts 1 to 54. (D) Fits of the hyperbolic (Equation 1) and hyperbolic with linear (Equation 2) models for phamily identification within genome samples. DOI: http://dx.doi.org/10.7554/eLife.06416.023

References

Brown KL, Sarkis GJ, Wadsworth C, Hatfull GF. Transcriptional silencing by the mycobacteriophage L5 repressor. The EMBO Journal. 1997;16:5914–5921. doi: 10.1093/emboj/16.19.5914.
Buckling A, Brockhurst M. Bacteria-virus coevolution. Advances in Experimental Medicine and Biology. 2012;751:347–370. doi: 10.1007/978-1-4614-3567-9_16.
Cresawn SG, Bogel M, Day N, Jacobs-Sera D, Hendrix RW, Hatfull GF. Phamerator: a bioinformatic tool for comparative bacteriophage genomics. BMC Bioinformatics. 2011;12:395. doi: 10.1186/1471-2105-12-395.
Cresawn SG, Pope WH, Jacobs-Sera D, Bowman CA, Russell DA, Dedrick RA, Adair T, Anders KR, Ball S, Bollivar D, Breitenberger C, Burnett SH, Butela K, Byrnes D, Carzo S, Cornely KA, Cross T, Daniels RL, Dunbar D, Findley AM, Gissendanner CR, Golebiewska UP, Hartzog GA, Hatherill JR, Hughes LE, Jalloh CS, De Los Santos C, Ekanam K, Khambule SL, King RA, King-Smith C, Klyczek K, Krukonis GP, Laing C, Lapin JS, Lopez AJ, Mkhwanazi SM, Molloy SD, Moran D, Munsamy V, Pacey E, Plymale R, Poxleitner M, Reyna N, Schildbach JF, Stukey J, Taylor SE, Ware VC, Wellmann AL, Westholm D, Wodarski D, Zajko M, Zikalala TS, Hendrix RW, Hatfull GF. Comparative genomics of cluster O mycobacteriophages. PLOS ONE. 2015;10:e0118725. doi: 10.1371/journal.pone.0118725.
Deng L, Ignacio-Espinoza JC, Gregory AC, Poulos BT, Weitz JS, Hugenholtz P, Sullivan MB. Viral tagging reveals discrete populations in Synechococcus viral genome sequence space. Nature. 2014;513:242–245. doi: 10.1038/nature13459.