J Virol. 2019 Nov 13;93(23):e01206-19.
doi: 10.1128/JVI.01206-19. Print 2019 Dec 1.

A Puzzling Anomaly in the 4-Mer Composition of the Giant Pandoravirus Genomes Reveals a Stringent New Evolutionary Selection Process

Olivier Poirot et al. J Virol.

Abstract

Pandoraviridae is a rapidly growing family of giant viruses, all of which have been isolated using laboratory strains of Acanthamoeba. The genomes of 10 distinct strains have been fully characterized, reaching up to 2.5 Mb in size. These double-stranded DNA genomes encode the largest of all known viral proteomes and are propagated in oblate virions that are among the largest ever described (1.2 μm long and 0.5 μm wide). The evolutionary origin of these atypical viruses is the object of numerous speculations. Applying the chaos game representation to the pandoravirus genome sequences, we discovered that the tetranucleotide (4-mer) "AGCT" is totally absent from the genomes of 2 strains (Pandoravirus dulcis and Pandoravirus quercus) and strongly underrepresented in others. Given the amazingly low probability of such an observation in the corresponding randomized sequences, we investigated its biological significance through a comprehensive study of the 4-mer compositions of all viral genomes. Our results indicate that AGCT was specifically eliminated during the evolution of the Pandoraviridae and that none of the previously proposed host-virus antagonistic relationships could explain this phenomenon. Unlike the three other families of giant viruses (Mimiviridae, Pithoviridae, and Molliviridae) infecting the same Acanthamoeba host, the pandoraviruses exhibit a puzzling genomic anomaly suggesting a highly specific DNA editing in response to a new kind of strong evolutionary pressure.

IMPORTANCE Recent years have seen the discovery of several families of giant DNA viruses infecting the ubiquitous amoebozoa of the genus Acanthamoeba. With double-stranded DNA (dsDNA) genomes reaching 2.5 Mb in length packaged in oblate particles the size of a bacterium, the pandoraviruses are currently the most complex and largest viruses known. In addition to their spectacular dimensions, the pandoraviruses encode the largest proportion of proteins without homologs in other organisms, which is thought to result from a de novo gene creation process. While using comparative genomics to investigate the evolutionary forces responsible for the emergence of such an unusual giant virus family, we discovered a unique bias in the tetranucleotide composition of the pandoravirus genomes that can result only from an undescribed evolutionary process not encountered in any other microorganism.

Keywords: 4-mer statistics; DNA editing; Pandoravirus; chaos game representation; genome composition; giant viruses; host-virus relationship.
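
The 4-mer statistics described in the abstract can be illustrated with a minimal Python sketch. It counts overlapping 4-mers on one strand, lists the 4-mers that never occur, and checks that AGCT equals its own reverse complement. The function names and the toy sequence are illustrative assumptions, not taken from the article, and the authors' actual pipeline is not described here.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-mers on one strand, skipping words with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def missing_kmers(seq: str, k: int = 4) -> list:
    """List every possible k-mer that never occurs in seq."""
    counts = count_kmers(seq, k)
    all_words = ("".join(p) for p in product("ACGT", repeat=k))
    return [w for w in all_words if w not in counts]


# Hypothetical toy sequence, for illustration only.
toy = "AAGCTTTAGCTC"
agct_forward = count_kmers(toy)["AGCT"]
agct_reverse = count_kmers(reverse_complement(toy))["AGCT"]
```

Because AGCT is its own reverse complement, `agct_forward` and `agct_reverse` are always equal, so a count made on one strand also holds for the other.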

Figures

FIG 1
Phylogenetic structure of the Pandoraviridae. (Adapted from reference .) The number of occurrences of the AGCT 4-mer is indicated for the genome of each strain. The counts are given for one DNA strand and are identical for both strands (AGCT is palindromic).
FIG 2
Chaos game representation of the P. dulcis genome. The largest square left blank (circled in red) corresponds to “AGCT,” indicating the absence of this 4-mer in the genome.
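
The link between a blank square and a missing 4-mer can be sketched as follows. In a chaos game representation, each nucleotide moves the current point halfway toward that nucleotide's corner of the unit square, so all points whose last k bases spell the same word fall in the same cell of a 2^k by 2^k grid. The corner layout below is one common convention and is an assumption; the layout used for the figure is not stated in the caption, and the function names are hypothetical.

```python
# Assumed corner layout (x, y): A bottom-left, C top-left, G top-right, T bottom-right.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_point(seq, start=(0.5, 0.5)):
    """Iterate the chaos game: move halfway toward the corner of each base."""
    x, y = start
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y


def kmer_cell(word):
    """Grid cell (col, row) of the 2^k x 2^k CGR grid that holds a k-mer.

    The most recent base decides the coarsest subdivision, so it supplies
    the most significant bit of the cell index.
    """
    col = row = 0
    for base in reversed(word):
        cx, cy = CORNERS[base]
        col = (col << 1) | cx
        row = (row << 1) | cy
    return col, row


def cgr_grid(seq, k=4):
    """Count k-mers per CGR cell; a zero cell means the k-mer is absent."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(b in CORNERS for b in word):
            col, row = kmer_cell(word)
            grid[row][col] += 1
    return grid


x, y = cgr_point("AGCT")
agct_cell = kmer_cell("AGCT")            # same cell as (int(x * 16), int(y * 16))
empty = cgr_grid("GCGCGCGC")[agct_cell[1]][agct_cell[0]]
```

Computing the cell directly from the word avoids the floating-point drift that very long runs of one base can cause when the point is iterated along a whole genome.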
FIG 3
Influence of random sequence length on the number of missing 4-mers. Ten thousand random sequences up to 10,000 bp in size were analyzed. Except for extremely rare fluctuations, no sequence longer than 4,000 bp exhibits a missing 4-mer. Four-mer overlaps as well as nucleotide compositions are taken into account in this analysis.
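
A simulation of this kind can be sketched in a few lines of Python. The version below draws random sequences with a chosen nucleotide composition and reports the mean number of absent 4-mers for a given length. The uniform default composition, the trial count, the seed, and the function names are all assumptions made for illustration; the settings used for the figure are not given in the caption.

```python
import random
from itertools import product

ALL_4MERS = {"".join(p) for p in product("ACGT", repeat=4)}


def random_sequence(length, weights, rng):
    """Draw an i.i.d. random sequence with the given A, C, G, T weights."""
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def n_missing_4mers(seq):
    """Number of the 256 possible 4-mers that never occur (overlaps included)."""
    seen = {seq[i:i + 4] for i in range(len(seq) - 3)}
    return len(ALL_4MERS - seen)


def mean_missing(length, weights=(0.25, 0.25, 0.25, 0.25), n_trials=100, seed=0):
    """Average number of missing 4-mers over n_trials random sequences."""
    rng = random.Random(seed)
    total = sum(
        n_missing_4mers(random_sequence(length, weights, rng))
        for _ in range(n_trials)
    )
    return total / n_trials


# Example scan over a few lengths (values depend on the seed and weights).
scan = {n: mean_missing(n, n_trials=20) for n in (100, 1000, 4000)}
```

Passing skewed weights, for example a GC-rich composition, makes AT-rich words rarer and pushes the length at which all 4-mers appear upward, which is why composition has to be taken into account.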
FIG 4
Distribution of 4-mer frequencies in natural and randomized genome sequences. Top, histogram of the number of distinct 4-mers occurring at various numbers of occurrences in the P. dulcis genome. Bottom, same analysis after randomization.
FIG 5
Missing 4-mers in the largest viral genomes. Except for P. dulcis and P. quercus, the largest viral genomes missing a 4-mer are those of 5 distinct bacteriophages (accession numbers NC_019401, NC_025447, NC_027364, NC_027399, and NC_019526).
FIG 6
Cumulative distribution of AGCT occurrences along the different pandoravirus genomes. The AGCT word is uniformly spread throughout the B-clade pandoravirus genomes, except for a clear rarefaction at the end of the P. neocaledonia genome sequence.
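
A cumulative distribution of this kind can be computed as sketched below: locate every occurrence of the word, bin the positions along the sequence, and accumulate the counts. A uniform spread gives a steadily rising curve, while a local rarefaction shows up as a plateau. The bin count, function names, and toy sequence are hypothetical choices for illustration.

```python
from itertools import accumulate


def kmer_positions(seq, word="AGCT"):
    """Start positions (0-based) of every occurrence of word, overlaps allowed."""
    seq = seq.upper()
    positions = []
    i = seq.find(word)
    while i != -1:
        positions.append(i)
        i = seq.find(word, i + 1)
    return positions


def cumulative_counts(seq, word="AGCT", n_bins=10):
    """Cumulative number of occurrences of word in n_bins equal slices of seq."""
    per_bin = [0] * n_bins
    bin_size = len(seq) / n_bins
    for p in kmer_positions(seq, word):
        per_bin[min(int(p / bin_size), n_bins - 1)] += 1
    return list(accumulate(per_bin))


# Toy sequence: AGCT-rich start followed by an AGCT-free tail.
toy = "AGCT" * 5 + "G" * 20
curve = cumulative_counts(toy, n_bins=4)
```

In the toy example the curve stops rising after the second bin, which is the same signature as a rarefaction at the end of a genome sequence.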
FIG 7
DNA sequence dot plot comparison of P. neocaledonia (horizontal) and P. salinus (vertical). The two genomes exhibit only remnants of collinearity except for the terminal region of P. neocaledonia (red circle), coinciding with a low AGCT density typical of A-clade strains (Fig. 6). The dot plot was generated using GEPARD (35) with the following parameters: word size = 15, window size = 0.
FIG 8
Comparison of the proportions of all 4-mers in P. dulcis (A clade) versus P. neocaledonia (B clade). The 4 most frequent 4-mers are GCGC, CGCG, CGCC, and GGCG.
FIG 9
Digestion of P. neocaledonia DNA at AGCT sites. Lane 1, undigested P. neocaledonia DNA (2.2 Mb) migrating as expected. The bottom band (below 48.5 kb) corresponds to an episome that is not always present. Lane 2, P. neocaledonia DNA digested by the PvuII restriction enzyme (cutting site, cAGCTg). Lane 3, P. neocaledonia DNA digested by the AluI restriction enzyme (cutting site, AGCT). These results demonstrate that the AGCT sites are not protected by modified nucleotides.
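
The expected outcome of such a digestion can be previewed in silico. The sketch below cuts a linear sequence at every occurrence of a recognition site and returns the fragment lengths. AluI cuts between G and C in AGCT, and PvuII cuts between G and C in CAGCTG, both leaving blunt ends. The function name and the toy sequence are hypothetical, and the sketch ignores methylation and partial digestion.

```python
def digest(seq, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the site left on the upstream
    fragment (2 for AluI on AGCT, 3 for PvuII on CAGCTG).
    """
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy sequence with one PvuII site (which contains an AluI site) and one extra AluI site.
toy = "GGGG" + "CAGCTG" + "TTTT" + "AGCT" + "AA"
alui_fragments = digest(toy, "AGCT", 2)
pvuii_fragments = digest(toy, "CAGCTG", 3)
```

Every PvuII site contains an AGCT, so AluI always cuts at least as often as PvuII. A genome with no AGCT at all would come back from both calls as a single full-length fragment.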

References

    1. Philippe N, Legendre M, Doutre G, Couté Y, Poirot O, Lescot M, Arslan D, Seltzer V, Bertaux L, Bruley C, Garin J, Claverie JM, Abergel C. 2013. Pandoraviruses: amoeba viruses with genomes up to 2.5 Mb reaching that of parasitic eukaryotes. Science 341:281–286. doi:10.1126/science.1239181.
    2. Legendre M, Fabre E, Poirot O, Jeudy S, Lartigue A, Alempic JM, Beucher L, Philippe N, Bertaux L, Christo-Foroux E, Labadie K, Couté Y, Abergel C, Claverie JM. 2018. Diversity and evolution of the emerging Pandoraviridae family. Nat Commun 9:2285. doi:10.1038/s41467-018-04698-4.
    3. Legendre M, Alempic JM, Philippe N, Lartigue A, Jeudy S, Poirot O, Ta NT, Nin S, Couté Y, Abergel C, Claverie JM. 2019. Pandoravirus celtis illustrates the microevolution processes at work in the giant Pandoraviridae genomes. Front Microbiol 10:430. doi:10.3389/fmicb.2019.00430.
    4. Abergel C, Legendre M, Claverie JM. 2015. The rapidly expanding universe of giant viruses: Mimivirus, Pandoravirus, Pithovirus and Mollivirus. FEMS Microbiol Rev 39:779–796. doi:10.1093/femsre/fuv037.
    5. Aherfi S, Andreani J, Baptiste E, Oumessoum A, Dornas FP, Andrade A, Chabriere E, Abrahao J, Levasseur A, Raoult D, La Scola B, Colson P. 2018. A large open pangenome and a small core genome for giant pandoraviruses. Front Microbiol 9:1486. doi:10.3389/fmicb.2018.01486.
