Significance
The saga of giant viruses (i.e. visible by light microscopy) started in 2003 with the discovery of Mimivirus. Two additional types of giant viruses infecting Acanthamoeba have been discovered since: the Pandoraviruses (2013) and Pithovirus sibericum (2014), the latter one revived from 30,000-y-old Siberian permafrost. We now describe Mollivirus sibericum, a fourth type of giant virus isolated from the same permafrost sample. These four types of giant virus exhibit different virion structures, sizes (0.6–1.5 µm), genome length (0.6–2.8 Mb), and replication cycles. Their origin and mode of evolution are the subject of conflicting hypotheses. The fact that two different viruses could be easily revived from prehistoric permafrost should be of concern in a context of global warming.
Keywords: giant virus, permafrost, Pleistocene
Abstract
Acanthamoeba species are infected by the largest known DNA viruses. These include icosahedral Mimiviruses, amphora-shaped Pandoraviruses, and Pithovirus sibericum, the last of which was isolated from 30,000-y-old permafrost. Mollivirus sibericum, a fourth type of giant virus, was isolated from the same permafrost sample. Its approximately spherical virion (0.6-µm diameter) encloses a 651-kb GC-rich genome encoding 523 proteins, of which 64% are ORFans; 16% have their closest homolog in Pandoraviruses and 10% in Acanthamoeba castellanii, probably through horizontal gene transfer. The Mollivirus nucleocytoplasmic replication cycle was analyzed using a combination of “omic” approaches that revealed how the virus hijacks its host machinery to actively replicate. Surprisingly, the host’s ribosomal proteins are packaged in the virion. Metagenomic analysis of the permafrost sample uncovered the presence of both viruses, albeit in very low amounts. The fact that two different viruses retain their infectivity in prehistoric permafrost layers should be of concern in a context of global warming. Giant viruses’ diversity remains to be fully explored.
Following the serendipitous discovery of Mimivirus, the first giant virus with particles large enough to be easily visible under a light microscope (1, 2), systematic surveys were launched to assess the diversity of these spectacular Acanthamoeba-infecting viruses in a planet-wide variety of environments. This led to the discovery and characterization of additional Mimivirus-like viruses now gathered into their own distinct family of DNA viruses, the Mimiviridae. They share a unique external fiber layer enclosing a pseudoicosahedral protein capsid of about 0.5 μm in diameter, itself containing lipid membranes surrounding an electron-dense nucleoid. Their genomes are made of an adenine-thymine A+T-rich linear dsDNA molecule up to 1.26 Mb in length predicted to encode up to 1,120 proteins (3), including a transcription apparatus allowing them to replicate in the host’s cytoplasm (4, 5). The Mimiviridae family is still expanding (6) and diversifying with more distant and smaller representatives (both in terms of particle and genome size) that infect nonamoebal unicellular protists (7–9).
The search for additional Acanthamoeba-infecting viruses led to the discovery of the Marseilleviridae, now a rapidly growing family of large dsDNA viruses with icosahedral particles 0.2 µm in diameter and genome sizes in the 346- to 380-kb range (10–13). The next discovery was that of the spectacular Pandoraviruses isolated from two remote locations, central Chile (Pandoravirus salinus), and Melbourne, Australia (Pandoravirus dulcis) (14). Their amphora-shaped virions are 1.0–1.2 µm in length and 0.5 µm in diameter and exhibit a membrane-bound empty-looking compartment encased in a ∼70-nm-thick tegument-like envelope. Their particles carry a linear G+C-rich dsDNA genome of 2.77 Mb for P. salinus, and 1.93 Mb for P. dulcis. The 2.24-Mb sequence of a third Pandoravirus genome was recently made available [Pandoravirus inopinatum (15)]. These genomes encode a number of predicted proteins comparable to that of the most reduced parasitic unicellular eukaryotes, such as Encephalitozoon species (14). In contrast with Mimiviridae, Pandoraviruses’ replication cycle involves (and disrupts) the host nucleus.
Searching for Acanthamoeba-infecting viruses in increasingly exotic environments allowed the discovery of Pithovirus sibericum infectious particles, which were recovered from a sample of Late Pleistocene Siberian permafrost (16). Although Pithovirus’s virions looked similar to those of Pandoraviruses both in terms of size and overall shape, further analyses indicated that the two types of viruses were unrelated (16). The Pithovirus genome is a much smaller 600-kb circular A+T-rich dsDNA molecule predicted to encode only 467 proteins. In contrast with Pandoravirus, Pithovirus replicates in the host cytoplasm.
From the same permafrost sample, we isolated Mollivirus sibericum, the first representative (to our knowledge) of a fourth type of giant virus infecting Acanthamoeba. Both transcriptomic and a detailed proteomic time course were used to analyze the infectious cycle of Mollivirus, which appeared markedly different from the previously described viruses infecting the same host. A metagenomic survey was performed to validate the presence and to quantify Pithovirus and Mollivirus in the original permafrost sample. Our results suggest that giant viruses are much more diverse than initially assumed and demonstrate that infectious viral particles with different replication schemes are present in old Siberian permafrost layers.
Results
Particle Morphology.
Mollivirus was initially spotted using light microscopy as rounded particles multiplying in a culture of Acanthamoeba castellanii inoculated with a sample of Siberian permafrost from the Kolyma lowland region (SI Methods). After amplification, the particles were analyzed by transmission electron microscopy (TEM) and scanning electron microscopy. Mollivirus’s roughly spherical particles are 500–600 nm in diameter and appear surrounded by a hairy tegument (Fig. 1A). By thin-section TEM, the particles appear to be surrounded by two to four 25-nm-spaced rings corresponding to fibers of different lengths (Fig. 2D). The tegument is made of at least two layers of different densities and structures. The external layer (10 nm thick) appears to form 30- to 40-nm-interspaced strips tangent to the surface of the particle (Figs. 1B and 2B). The internal layer is 12–14 nm thick and is made of a mesh of fibrils resembling those constituting the central layer of Pandoravirus’s tegument (Fig. 1C) (14). On the surface of the Mollivirus particle, the genome-delivery portal coincides with a circular depression 160–200 nm in diameter (Fig. 1), which could be the consequence of the lack of fiber at the virion apex. At least one internal lipid membrane is delimiting the spacious inner compartment of the virion that is devoid of discernible substructures (Fig. 1 B and C).
Replication Cycle.
Mollivirus replication strategy was documented by following its propagation in axenic A. castellanii cultures over an entire multiplication cycle, starting from purified particles at a very high multiplicity of infection (MOI of 50) to warrant the synchronization of the infection. As for all previously described giant viruses infecting Acanthamoeba, the replication cycle begins with the phagocytosis of Mollivirus particles with up to 10 virions per cell, either distributed in individual vacuoles or gathered in the same vacuole. The opening of the particle was never clearly visualized due to the thickness of the ultrathin sections, larger than the dimension of the genome-delivery funnel. However, the fusion between the virion internal lipid membrane and the vacuole membrane was clearly observed (Fig. 1B). The release of the 5-ethynyl-2′-deoxyuridine (EdU)-labeled Mollivirus viral DNA into the cell cytoplasm and its migration to the nucleus was visualized using fluorescence microscopy (Fig. 3). The Acanthamoeba cells maintained their trophozoite shape and remained adherent throughout the whole cycle. The number of visible vacuoles started to decrease 4–5 h postinfection (PI), and neosynthesized virions appeared in the extracellular medium 6 h PI without exhibiting the cell lysis characterizing previously described giant viruses (2, 14, 16). As for Pandoraviruses, the cell nucleus becomes disorganized with numerous invaginations of the nuclear membrane, but although the nucleus vanishes to be replaced by the Pandoravirus virion factory (VF), the synthesis of new Mollivirus virions occurs at the periphery of the persisting, albeit deformed, nucleus (Fig. 2A). Organelles are excluded from this area, which becomes filled with a mesh of fibrillary structures (Fig. 2 B and C), presumably corresponding to viral proteins composing the particles (Fig. 2D). The nucleus appears also filled with these fibrillary structures, making it an integral part of the Mollivirus VF (Fig. 2A).
The process by which virions are formed is reminiscent of Pandoraviruses, with the envelope and the interior of the Mollivirus particles being synthesized simultaneously (14) (Fig. 2B), but the genome-delivery portal apex of Mollivirus particles appears to be formed last instead of first. After 6–8 h, particles at various stages of maturation may coexist in the same VF while mature virions are seen in vacuoles, suggesting that most, if not all, of them are released via exocytosis. Each cell seems to release a few hundred (200–300) new viral particles.
Mollivirus Genomic Features.
The Mollivirus genome is a linear, G+C-rich (60%) dsDNA molecule of 651,523 bp, including a ∼10-kb-long inverted repeat at each extremity. In contrast to the 610-kb A+T-rich genome of Pithovirus, it is remarkably devoid of internal repeats (Fig. S1), which made sequence assembly comparatively easier (16).
Protein-coding regions were predicted using Genemark (17), and the limits of the corresponding genes were precisely mapped using transcriptome sequencing. Poly-A+–enriched RNAs were extracted from Mollivirus-infected Acanthamoeba cells 30 min to 9 h PI, and then used to build three different sequencing libraries by pooling three consecutive time points roughly corresponding to the “early” (30 min and 1 and 2 h), “intermediate” (3, 4, and 5 h), and “late” transcripts (6, 7, and 9 h). Most reads (96–98%) could be mapped onto the Mollivirus or Acanthamoeba castellanii (18) genome sequences.
The above analyses identified 523 protein-coding genes (noted ml_#ORF number; ORF, open reading frame) and three tRNAs (LeuTTG, MetATG, TyrTAC). There was no clear signal for the presence of non–protein-coding poly-A+ transcripts, and more than 90% of the predicted genes were associated with RNA-seq coverage values higher than that of the intergenic regions (Fig. S2). The total protein-coding moiety corresponds to 82.2% of the genome, 8.2% to the short 5′- and 3′-untranslated regions, and 9.6% to the intergenic regions (120 nt long on average). The analysis of these intergenic regions did not reveal enriched sequence motifs that might indicate conserved promoter signals. This negative finding is consistent with the lack of stringent transcriptional regulation exhibited by most Mollivirus genes, the transcripts of which were detected, albeit at various levels, in the early, intermediate, and late mRNA pools.
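As a quick consistency check on the genome partition reported above, the percentages can be converted into base pairs. The sketch below is illustrative only: it uses the figures given in the text (651,523 bp; 82.2%, 8.2%, and 9.6%; 120-nt average intergenic length), and the function and variable names are ours, not part of the original analysis.

```python
# Figures taken from the text; all names are illustrative only.
GENOME_BP = 651_523
FRACTIONS = {
    "protein_coding": 0.822,
    "untranslated": 0.082,
    "intergenic": 0.096,
}
MEAN_INTERGENIC_NT = 120


def partition_genome(genome_bp, fractions):
    """Convert fractional genome shares into rounded base-pair counts."""
    return {name: round(genome_bp * frac) for name, frac in fractions.items()}


partition = partition_genome(GENOME_BP, FRACTIONS)
# Rough number of intergenic regions implied by the average length.
implied_regions = partition["intergenic"] / MEAN_INTERGENIC_NT

for name, bp in partition.items():
    print(f"{name:>15}: {bp:,} bp")
print(f"implied intergenic regions: {implied_regions:.0f}")
```

The implied number of intergenic regions (about 520) is close to the 523 predicted genes, which is what one would expect if most genes are separated by a single short spacer.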
The mapping of the RNA-seq reads on the Mollivirus genome sequence revealed short (87.6 ± 8.6 nt; min, 59; max, 159; median, 84) spliceosomal introns delimited by the canonical 5′-GT–3′-AG rule in 21 (4%) of the 523 protein-coding genes, evenly distributed along the genome. With the exception of ml_476 (with four introns of 87, 59, 70, and 83 nt) and ml_89 (with three introns of 67, 75, and 67 nt), there is a single intron per gene. All correspond to transcripts remaining detectable in the intermediate and late mRNA pools (Table S1), and two intron-containing genes (ml_476, ml_320) correspond to proteins found in the virion, suggesting that the host’s spliceosome remains functional throughout the entire replication cycle, despite the morphological changes exhibited by the nucleus. Alternatively, the viral mRNAs may be stable enough to remain present all along the replicative cycle.
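The canonical GT–AG rule mentioned above can be illustrated with a minimal sketch. The sequence below is a hypothetical toy example (not a Mollivirus gene), and the helper names are ours; the only value borrowed from the text is the 59-nt minimum intron length.

```python
def is_canonical_intron(seq, start, end):
    """True if seq[start:end] begins with GT and ends with AG.

    Coordinates are 0-based, with `end` exclusive.
    """
    intron = seq[start:end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def splice(seq, introns):
    """Remove the given (start, end) intervals and return the joined exons."""
    exons, cursor = [], 0
    for start, end in sorted(introns):
        exons.append(seq[cursor:start])
        cursor = end
    exons.append(seq[cursor:])
    return "".join(exons)


# Toy pre-mRNA (hypothetical): two short exons around a 59-nt intron,
# the shortest intron length reported in the text.
exon1, exon2 = "ATGGCC", "TTTTAA"
intron = "GT" + "A" * 55 + "AG"
pre_mrna = exon1 + intron + exon2
coords = (len(exon1), len(exon1) + len(intron))

print(is_canonical_intron(pre_mrna, *coords))
print(splice(pre_mrna, [coords]))
```

In practice, intron boundaries would come from the spliced alignment of RNA-seq reads rather than from hand-written coordinates.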
The sequences of the Mollivirus predicted proteins were analyzed using BLAST against the nonredundant protein sequence database (National Center for Biotechnology Information) (19) and a combination of motif search and protein-fold recognition methods [as previously described (16)]. As it is customary upon discovery of the first member of a previously unknown virus group, the proportion of Mollivirus protein without a recognizable homolog was high (337/523 = 64.4%). Among the Mollivirus proteins with homolog in the databases, 93 (17.8%) were most similar to a virus protein, 50 (9.6%) to an A. castellanii protein, 22 (4.2%) to proteins of other eukaryotes, and 18 (3.4%) to prokaryotic proteins (Fig. 4).
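The percentages quoted above follow directly from the counts. The short sketch below recomputes them from the numbers in the text; the dictionary layout and function name are ours.

```python
# Counts taken from the text; names are illustrative only.
TOTAL_PROTEINS = 523
BEST_MATCH_COUNTS = {
    "no recognizable homolog": 337,
    "virus": 93,
    "A. castellanii": 50,
    "other eukaryotes": 22,
    "prokaryotes": 18,
}


def percentages(counts, total):
    """Express each count as a percentage of `total`, rounded to 0.1%."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = percentages(BEST_MATCH_COUNTS, TOTAL_PROTEINS)
for name, pct in shares.items():
    print(f"{name:>24}: {pct}%")
print("itemized proteins:", sum(BEST_MATCH_COUNTS.values()))
```

Note that the categories as listed sum to 520 of the 523 predicted proteins, so a few proteins are not itemized in this breakdown.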
In contrast to Pithovirus (also the unique representative of its kind), a highly dominant proportion (83/93 = 89.2%) of the viral best matches to Mollivirus proteins correspond to a single known virus group, the Pandoraviruses (14). However, the Pandoravirus homologs exhibit low sequence similarities (identity, 40.4 ± 2.8%; median, 37.7%). Fifty-one (61.4%) of them display a recognizable functional attribute, including 11 proteins containing generic conserved domains such as ankyrin repeats, BTB/POZ domains, or zinc fingers. Among the proteins with more specific functional predictions, we noted seven enzymes related to DNA metabolism: a B-type DNA polymerase (only 27.2% identical to that of Pandoravirus dulcis), a divergent primase (24% identical), two helicases, one exonuclease, one methylase, and one recombinase. Other predicted functions included four transcription components (RNA polymerase subunits RPB1, RPB2, and RPB10, and transcription elongation factor S-II), two serine/threonine protein kinases, one protein phosphatase, and a homolog to the cap-binding translation initiation factor 4E. The predicted virion packaging ATPase of Mollivirus is also remotely similar to that of Pandoraviruses (31.5% identical residues). The 50 Mollivirus proteins most similar to Acanthamoeba castellanii homologs are also quite divergent (47.7 ± 4.8% identity; median, 45.7%) and exhibit a large proportion of anonymous proteins (38/50 = 76%). Among the 12 proteins with a predicted function we noted an mRNA capping enzyme (35.3% identical), a dCMP deaminase (74.8% identical), a haem peroxidase (ml_395; 59.3% identical), a putative autophagy protein (49% identical), as well as two endonucleases. Curiously, the sole Mollivirus protein (ml_347) remotely similar to a major capsid protein was closest (62.6% identical) to a homolog encoded by A. castellanii (18). Table S2 lists the 110 Mollivirus predicted proteins (110/523 = 21%) that could be associated to a putative function. 
The dominant categories include proteins containing nonspecific protein–protein interaction motifs (e.g., Ankyrin repeats) followed by DNA-processing enzymes (e.g., DNA polymerase, primase, nuclease, helicase, etc.) as well as nucleotide biosynthesis such as a deoxycytidylate deaminase, a deoxyuridine 5′-triphosphate nucleotidohydrolase, a guanylate kinase, and two nucleotide diphosphate kinase homologs. However, several key DNA biosynthesis enzymes such as thymidylate synthase and thymidylate kinase usually found in large DNA viruses are not encoded by Mollivirus. Even more remarkably, Mollivirus appears to be the sole large DNA virus without its own ribonucleotide reductase, a key enzyme required for the synthesis of all deoxyribonucleotides.
Proteome of Purified Mollivirus Particles.
The particles of the previously described giant viruses (14, 16, 20, 21) are all associated with a large number of proteins. The Mollivirus sibericum virion is no exception, with up to 230 proteins, each of them reliably detected by the identification of at least two different peptides using tandem mass spectrometry (SI Methods). Most of them (187) were detected in at least two out of three independent virion preparations (biological replicates) (Table S3A). Out of the 230 virion proteins, 136 (59%) are from Mollivirus and 94 (41%) from Acanthamoeba, a proportion twice as large as that found in Pithovirus sibericum virions (37/196= 18.9%) (16) or Pandoravirus salinus (56/266 = 21%) (14) using proteomic approaches of similar sensitivity. Among the Mollivirus-encoded proteins detected in the virions, 74 (54.4%) are ORFans and only 35 (25.7%) could be associated to functional or domain-only predictions (Table S3A), a proportion similar to that in the whole viral gene content. The seven components of virus-encoded transcription apparatus are conspicuously absent in the virion proteome (Table S2) confirming that the early stage of Mollivirus replication requires nuclear functions, as already suggested by the rapid migration of the Mollivirus genome into the nucleus (Fig. 3) and the morphological changes undergone by the host nucleus during the infectious cycle (Fig. 2A). The three most abundant virion proteins correspond to ORFans, followed by the ml_347 gene product, homolog to the major capsid protein found in all large icosahedral DNA viruses, although Mollivirus particles do not exhibit such symmetry. This protein was not among the seven proteins found to lie at the virion surface as probed by limited trypsin proteolysis of intact purified particles (highlighted in blue, Table S3A). 
Only 3 of the 22 Ankyrin-repeat–containing proteins identified in the Mollivirus genome are part of the particle proteome, indicating that most of them are not structural proteins but might participate in intracellular interactions. The same is true for the three BTB/POZ domain-containing proteins. In contrast, six of the eight predicted oxidoreductases are detected in the particle (Table S2), most likely to counteract the oxidative stress encountered in the Acanthamoeba phagosome (21). A YjgF-like domain, putative translation inhibitor homolog (ml_79, ranking 11th) and a lipocalin (ml_287, ranking 12th) are among the proteins of functional interest in the particle, together with two enzymes that might participate in the glycosylation of the Mollivirus virion proteins (GlcNAc transferase ml_336, glycosyltransferase ml_353) labeled by standard in-gel glycoprotein detection kits.
In contrast to the Mollivirus-encoded proteins, a putative function could be predicted for 84 (89.4%) of the 94 Acanthamoeba gene products detected in the Mollivirus virions (Table S3A). The first host-derived protein ranks 55th in the particle proteome abundance list, a difference further reflected by the comparison of the whole abundance index distributions of the 136 Mollivirus proteins vs. the 94 Acanthamoeba proteins (65 of which were detected in 2/3 replicates) (Table S3A). The two distributions are significantly different (Kolmogorov–Smirnov, P < 0.001) with respective average values of (6.45 ± 2.36) × 10⁻³ vs. (1.07 ± 0.16) × 10⁻³. Assessing to which level of abundance low-ranking Acanthamoeba gene products retain a functional significance will require further experimental studies. Pending these validations, three categories of functions appear to dominate the Acanthamoeba-derived moiety of the Mollivirus particle. The first, and most unexpected, comprises 11 ribosomal proteins from both the small (S4, S7, S8, S9, S15, S23) and large subunits (L5, L6, L18, L30, L35) detected in two or all of our biological replicates (Table S3B). A total of 23 different ribosomal proteins (together with a ribosomal RNA assembly protein and the ribosome anti-association factor IF6) are detected to various extents of reproducibility, most likely due to a combination of their small size and low abundance (Table S3B). The second largest category consists of other mRNA binding/processing enzymes such as five helicases including a homolog of the cap-binding translation initiation factor 4A, and three other proteins involved in nuclear RNA processing and transport. Adding three histone homologs, HMG-like chromatin-associated proteins, and a homolog of the nuclear Es2 proteins, the Mollivirus particles incorporate a total of 13 different proteins normally confined to the host nucleus. 
The third most important category consists of Acanthamoeba gene products with similarity to actin, actin-binding, or actin cross-linking proteins (profilins, actophorin, fascin, talin, etc.). This is a total of 11 proteins that might participate in the transport of the virion content to the nucleus through the reorganization of the host cytoskeleton (22).
We further investigated the location of the host-contributed proteins in the virion by comparing the peptides identified after limited proteolysis of intact particles to the fully digested particles. Our results (Table S3A) suggest that the detected Acanthamoeba proteins are most likely not simply associated to the particle surface and are thus presumably incorporated within the virions. These host-derived proteins are thus in a position to be involved in the early stage of the next infectious process.
Host–Virus Proteome Dynamics Throughout a Full Replication Cycle.
For convenience, mRNA abundances measured by deep sequencing are widely used as proxies for protein abundances, even though weak correlations have often been demonstrated between the two measures (23). Indeed, what happens within a cell at a given time is a direct consequence of protein abundances, not of the levels of their cognate transcripts. In this study, we directly examined the variation of expressed functions and proteins occurring in A. castellanii infected by Mollivirus by performing a series of proteome analyses at regular intervals throughout the whole virus replication cycle. First, the quantified abundances of proteins were analyzed globally, mixing Mollivirus-encoded, mitochondrion-encoded, and host’s nucleus-encoded proteins (Fig. S3).
As expected, a small peak of viral proteins was detected at 30 min PI, corresponding to the most abundant proteins of the Mollivirus virions detectable in the host after internalization. The relative abundance of neosynthesized viral proteins then increased steadily over time, to represent about 16% of the total protein content of the virus–host system 6 h PI when the first virions are synthesized, and 23% 9 h PI. Symmetrically, the relative abundance of the host-encoded proteins decreases linearly over time, a pattern consistent with the release of neoformed particles through exocytosis (i.e., preserving the host cell integrity) rather than through cell lysis. Interestingly, the relative abundance of mitochondrion-encoded proteins followed a parallel pattern, suggesting that these ATP-producing organelles are neither activated nor specifically degraded during the Mollivirus infectious cycle.
Given the above finding that no component of the Mollivirus-encoded transcription apparatus was detected in the virion proteome, we expected that its replication cycle should exhibit two markedly different phases during which viral genes are initially transcribed in the nucleus by the host apparatus before being taken over by the virus-encoded apparatus. Such a shift is well illustrated in Fig. 5, showing the respective abundances of the DNA-dependent RNA polymerase main subunits (RPB1, RPB2) of the host and virus, during the first 6 h PI. Before 4 h PI, the Mollivirus-encoded RPB1 and RPB2 proteins remain undetected, and then appear and maintain the same abundance throughout the rest of the replication cycle. This indicates that all viral proteins newly produced before 3–4 h PI are the products of genes transcribed by the host transcription machinery, presumably within the intact amoeba nucleus. Once established, the abundances of the Mollivirus RPB1 and RPB2 proteins remain mostly unchanged during the rest of the replication cycle, whereas those of their cellular counterparts, initially present at the same level, begin to decrease after 4 h simultaneously to the morphological change exhibited by the nucleus (Fig. 2A). Noticeably, the viral DNA polymerase is readily detected 3 h PI, whereas the level of its cellular homolog remains very low (Fig. 5).
This coarse analysis of Mollivirus infectious cycle was refined by analyzing the abundance of all of the virus-encoded proteins at various times PI (from 30 min to 6 h). A clustering algorithm (SI Methods) was used to help visualize discriminating patterns in the resulting heat map (Fig. S4) built with all of the proteins reliably detected at two or more time points.
The largest set of proteins corresponds to those exhibiting a peak of production between 3 and 4 h PI. They correspond to “intermediate” genes presumably transcribed by the neosynthesized virus-encoded machinery (although the cellular one may still be functional). Proteins belonging to this group include the virus mRNA capping enzyme (ml_416) and its cap-binding translation initiation factor 4E (ml_363) amid a large number of proteins of unknown functions. Before this phase, a group of “early” proteins involved in the replication of DNA (e.g., DNA polymerase: ml_318; various helicases: ml_266, ml_359, ml_385) becomes detectable 1 h earlier (i.e., 3 h PI). Consistently, the few Mollivirus-encoded enzymes predicted to participate in the synthesis (or salvage) of nucleotides become detectable at the same time, or 1 h before (e.g., nucleotide diphosphate kinase: ml_233; dihydrofolate reductase: ml_37; deoxyuridine 5′-triphosphate nucleotidohydrolase: ml_29; guanylate kinase: ml_103). A fourth well-delineated pattern (Fig. S4, Bottom) characterizes a group of 48 proteins brought in by the infecting virus particles. Their abundance decreases steadily until 4 h PI, as they are presumably degraded in the host cell phagosome. Their subsequent increase signals the synthesis of new viral particles and the end of the replication cycle. The 11 proteins detected 1 h PI, but not found in the particle proteome, must be transcribed before that time. This can only be achieved if the Mollivirus genome can quickly reach the nucleus in the 30 min following its translocation from the phagosome to the cytoplasm (Fig. 3). Only three of these early proteins have a predicted function (ml_25 is a nuclease, ml_29 is a dUTPase, and ml_114 is a serine/threonine protein kinase).
We then investigated the influence of Mollivirus infection on the expression of Acanthamoeba proteins. We first noticed that, among 2,474 different Acanthamoeba proteins reliably detected in uninfected cells and at least two time points PI, 2,406 exhibited less than a twofold change in their relative abundance at any point in time during the first 6 h PI (Fig. 6). Thirty of the remaining proteins appear “up-regulated” and 38 “down-regulated.” This first result indicates that Mollivirus infection preserves the global integrity of the host cell even after the apparent disruption of its nucleus. This is consistent with our observation (Fig. 2) that the infected cells can support the production of Mollivirus virions for several hours, shedding virions in the surrounding medium without drastic impairment of their viability. Focusing on proteins exhibiting more than a twofold change in abundance (Fig. 6 and Table S4), the most dramatic (about 16-fold) increase corresponds to the largest subunit of the diphosphate ribonucleotide reductase, an enzyme absolutely required for DNA synthesis, conspicuously absent from the viral genome. The abundance profile of this cellular enzyme steadily increases from 2 to 5 h PI (Fig. 6), in phase with that of the viral DNA polymerase (Fig. 5). We also noticed that three of the host proteins exhibiting more than a twofold increase in abundance ended up associated with the Mollivirus particle: an autophagy protein that might be involved in intracellular membrane reorganization along with an H2A core histone paralog and a high mobility group box domain-containing protein that might be involved in DNA packaging. On the other hand, the list of proteins exhibiting a significant decrease in abundance contains a variety of enzymes without clear functional relationship. The most important decreases concern a haem peroxidase and a monoamine oxidase corresponding to two adjacent genes (Table S4). 
Interestingly, the virus encoded haem peroxidase (ml_395) starts accumulating when the host enzyme reaches a minimal level (3 h PI). Intriguingly, the Acanthamoeba proteins associated to the Mollivirus particles exhibit a variety of abundance profiles, many of which are not progressively increasing over time.
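The twofold-change criterion used above can be sketched as a simple classifier. This is a minimal illustration under our own assumptions: the abundance values are hypothetical (not taken from Table S4), the function name is ours, and the actual normalization used in the study is described in SI Methods rather than reproduced here.

```python
def classify(baseline, time_course, threshold=2.0):
    """Label a protein by its fold change relative to the uninfected baseline.

    Returns "up" if any time point exceeds threshold x baseline,
    "down" if any time point falls below baseline / threshold,
    and "stable" otherwise. If both occur, "up" takes precedence here.
    """
    ratios = [value / baseline for value in time_course]
    if max(ratios) > threshold:
        return "up"
    if min(ratios) < 1 / threshold:
        return "down"
    return "stable"


# Hypothetical relative abundances (arbitrary units), for illustration only.
examples = {
    "strongly induced protein": (1.0, [1.1, 2.5, 8.0, 16.0]),
    "depleted protein": (4.0, [3.0, 1.5, 0.8]),
    "unchanged protein": (2.0, [2.2, 1.8, 2.1]),
}
labels = {name: classify(base, series) for name, (base, series) in examples.items()}
print(labels)

# Counts reported in the text: stable + up + down should equal the total.
assert 2406 + 30 + 38 == 2474
```

The final assertion confirms that the three reported classes account for all 2,474 quantified host proteins.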
Direct Detection of Pithovirus and Mollivirus DNA in the Original Permafrost Sample.
Mollivirus sibericum and the previously described Pithovirus sibericum (16) have been isolated from the same 30,000-y-old permafrost sample using a similar Acanthamoeba cocultivation protocol. To demonstrate their presence and measure their relative abundance, we sequenced DNA directly extracted from the original permafrost sample, in search of cognate sequences. Out of 368,474,026 100-nt paired-end reads generated on an Illumina platform, 336 and 125 could be mapped (>92.5% identity) on the Mollivirus and Pithovirus genome sequence, respectively (Table 1). As a control for possible cross-contamination, we looked for the presence of reads matching the genome of the modern giant viruses routinely cultivated in the laboratory. A total of only seven (most likely spurious) reads were found to match: Mimivirus (four reads, <92% identical), Pandoravirus salinus (one read, 96% identical), Pandoravirus dulcis (one read, 85% identical), and Megavirus chilensis (one read, 66% identical). Although the mapped reads only covered 4.8% and 2% of the Mollivirus and Pithovirus genomes, respectively, their distributions are quite uniform (Fig. 7). Interestingly, a coverage of 3.6% was found for the 46.7 million-bp haploid genome of A. castellanii (27,894 mapped reads), indicating that the viruses’ host is present in the permafrost sample at a very low, but similar, abundance level. More precisely, these respective coverage values correspond to a ratio of 1.1 Pithovirus and 2.7 Mollivirus virions per A. castellanii cell in the original sample.
Table 1.
Reference | Genome length, bp | No. mapped reads | Total mapped read length, bp | Genome coverage, % |
A. castellanii | 46,714,639 | 27,894 | 1,693,714 | 3.6 |
Mollivirus sibericum | 651,523 | 336 | 31,081 | 4.8 |
Pithovirus sibericum | 610,033 | 125 | 12,123 | 2 |
Mimivirus | 1,181,549 | 4 | 369 | 0 |
Megavirus chilensis | 1,259,197 | 1 | 66 | 0 |
Pandoravirus salinus | 1,908,524 | 1 | 96 | 0 |
Pandoravirus dulcis | 2,473,870 | 1 | 85 | 0 |
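The coverage column of Table 1 can be recomputed as total mapped read length divided by genome length. The sketch below does this for the three organisms with nonzero coverage and then estimates virions per host cell. The ploidy factor of 2 is our assumption: the text does not state how the per-cell ratios were derived, and this factor is used only because it approximately reproduces the reported values. All names are illustrative.

```python
# (genome length in bp, total mapped read length in bp), from Table 1.
TABLE1 = {
    "A. castellanii": (46_714_639, 1_693_714),
    "Mollivirus sibericum": (651_523, 31_081),
    "Pithovirus sibericum": (610_033, 12_123),
}

# Assumption (not stated in the text): two genome copies per host cell.
HOST_PLOIDY = 2


def coverage_percent(genome_bp, mapped_bp):
    """Genome coverage as a percentage of the genome length."""
    return 100 * mapped_bp / genome_bp


def virions_per_cell(virus, host="A. castellanii", ploidy=HOST_PLOIDY):
    """Ratio of viral to host coverage, scaled by the assumed host ploidy."""
    viral = coverage_percent(*TABLE1[virus])
    cellular = coverage_percent(*TABLE1[host])
    return ploidy * viral / cellular


for name, (genome_bp, mapped_bp) in TABLE1.items():
    print(f"{name:>22}: {coverage_percent(genome_bp, mapped_bp):.1f}%")
for virus in ("Pithovirus sibericum", "Mollivirus sibericum"):
    print(f"{virus:>22}: {virions_per_cell(virus):.1f} virions per cell")
```

Under this assumption the estimates come out at roughly 1.1 (Pithovirus) and 2.6 (Mollivirus) virions per cell, close to the reported 1.1 and 2.7; the small difference for Mollivirus is consistent with the reported ratio having been computed from the rounded coverage values.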
SI Methods
Sample Recovery and Radiocarbon Dating.
Sterility controls of the samples were performed during their collection as previously described (16, 32, 33). The samples of buried soils were taken from the frozen outcrop walls in Chukotka, on the Stanchikovsky yar (GPS coordinates: 68.370155, 161.415553; 68°22′13″N and 161°24′56″E), at a height of 23–24 m above the Anui River. The melting material was first cleaned out from the wall surface, and the unthawed layer was exposed; the frozen rock was excavated to make a hollow 30–40 cm deep, and a sample was taken from this hollow. After treatment with 95% ethanol, the sample was placed in a sterile plastic bag and stored frozen. In the laboratory, the samples were stored in freezers at −20 °C.
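The two coordinate notations given above are equivalent, as the following minimal conversion sketch shows. The function name is ours; seconds are rounded to the nearest integer.

```python
def decimal_to_dms(decimal_degrees):
    """Convert decimal degrees to (degrees, minutes, seconds).

    Seconds are rounded to the nearest integer; working in whole seconds
    avoids ever producing a value such as 60 seconds.
    """
    total_seconds = round(abs(decimal_degrees) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return degrees, minutes, seconds


latitude = decimal_to_dms(68.370155)
longitude = decimal_to_dms(161.415553)
print("{}°{}′{}″N".format(*latitude))
print("{}°{}′{}″E".format(*longitude))
```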
Virus Isolation and Production.
Mollivirus was isolated from a piece of the buried soil sample P1084-T as previously reported (16). Briefly, 400 mg of P1084-T were resuspended in 6 mL of Prescott and James medium (27). The infection trials were performed twice and produced identical results. Each 3-mL aliquot was supplemented with 300 µL of Amphotericin B (Fungizone; 250 μg/mL), and 1.65 mL of this 10% Fungizone solution was left overnight at 4 °C under stirring. After decantation, the supernatant was recovered and centrifuged at 800 × g for 5 min. Acanthamoeba castellanii (Douglas) Neff (ATCC 30010™) cells adapted to resist Fungizone (2.5 μg/mL) were inoculated with 100 µL of the supernatant and with the pellet resuspended in 50 µL of Tris (20 mM), CaCl2 (1 mM), pH 7.4. The cells were cultured at 32 °C in microplates with 1 mL of proteose peptone–yeast extract–glucose (PPYG) medium supplemented with antibiotics [ampicillin, 100 μg/mL, and penicillin–streptomycin, 100 μg/mL (Gibco); Fungizone, 2.5 µg/mL (Life Technologies)] and monitored for cell death.
Virus Purification.
The wells presenting an infection phenotype were recovered, centrifuged for 5 min at 500 × g to remove the cellular debris, and used to infect four T-75 tissue-culture flasks plated with fresh Acanthamoeba cells. After completion of the infectious cycle, the cultures were recovered, centrifuged for 5 min at 500 × g to remove the cellular debris, and the virus was pelleted by a 30-min centrifugation at 3,000 × g prior to purification. The viral pellet was then resuspended and washed twice in PBS, layered on a discontinuous sucrose gradient [30%/40%/50%/60% (wt/vol)], and centrifuged at 5,000 × g for 15 min. The virus produced a white disk, which was recovered, washed twice in PBS, and stored at 4 °C or −80 °C with 7.5% DMSO.
Virus Cloning.
A. castellanii cells (70,000/cm²) were seeded on a 12-well culture plate with 1 mL of PPYG. After adhesion, viruses were added to the well at a multiplicity of infection (MOI) of 50. After 1 h, the well was washed several times with 1 mL of PPYG to remove the excess viruses. The cells were then recovered by gently scraping the well, and a serial dilution was performed in the next three wells by mixing 200 µL of the previous well with 500 µL of PPYG. Drops of 0.5 µL of the last dilution were recovered and observed by light microscopy to verify that they contained fewer than two cells. The 0.5-µL droplets were then distributed in each well of a 24-well culture plate. One thousand uninfected A. castellanii cells in 500 µL of PPYG were added to the wells seeded with a single cell and monitored for cell death. The corresponding viral clones were recovered and amplified prior to purification, DNA extraction, proteome analysis, and cell cycle characterization by electron microscopy.
Infected Cells and Virion Imaging.
Infectious cycle observations by TEM.
A. castellanii-infected cell cultures were fixed by adding an equal volume of PBS with 2% glutaraldehyde and incubated for 20 min at room temperature. Cells were recovered and pelleted for 20 min at 5,000 × g. The pellet was resuspended in 1 mL of PBS with 1% glutaraldehyde, incubated at least 1 h at 4 °C, and washed twice in PBS prior to coating in agarose and embedding in Epon resin. Each pellet was mixed with 2% low-melting agarose and centrifuged to obtain small flanges of approximately 1 mm³ containing the sample coated with agarose. These samples were then embedded in Epon resin using a standard method: 1-h fixation in 1% osmium tetroxide, dehydration in increasing ethanol concentrations (50%, 70% including uranyl acetate 2%, 90%, and 100% ethanol), and embedding in Epon-812. Ultrathin sections of 70 nm were poststained with 4% uranyl acetate and lead citrate and observed using a Zeiss EM 912 operating at 100 kV.
Scanning electron microscopy observations of Mollivirus purified particles.
A suspension of purified Mollivirus particles in PHEM buffer (240 mM Pipes, 100 mM Hepes, 8 mM MgCl2, 40 mM EGTA, pH 6.9) was adsorbed on a poly-l-lysine–coated silica slide for 10 min at room temperature, and then fixed with 2.5% glutaraldehyde in PHEM buffer for 20 min. After three washes of 5 min in PHEM buffer and two of 2 min in ddH2O, the silica slide was put in 50% acetone for 5 min. Serial increasing acetone baths (75%, 85%, 95%, 100%) of 5 min were done, followed by two more baths in 100% acetone for 5 min. Samples were then placed in the chamber of a critical point dryer filled with 100% acetone. After cooling to 10 °C, the acetone was replaced by carbon dioxide before heating at the critical point under pressure. Samples were sputter coated with 80 Å of gold and observed on a Jeol JSM-6320F at 15 kV.
Visualization of 5-ethynyl-2′-deoxyuridine-labeled Mollivirus particles.
A. castellanii cells were infected by Mollivirus at a MOI of 0.25 and grown in the presence of 100 µM 5-ethynyl-2′-deoxyuridine (EdU) until the infectious cycle was complete. The virions were recovered and used to infect A. castellanii cells at a MOI of 20. Cells were recovered after 30, 60, and 90 min of infection, fixed with formaldehyde (3.7%), permeabilized with Triton X-100 (0.5%), and labeled with Alexa Fluor 488 picolyl azide in a copper buffer according to the manufacturer’s protocol (Click-it Plus EdU 488 Imaging Kit; Molecular Probes). Images were recorded on a Zeiss Axio Observer Z1 inverted microscope using a 63× objective lens associated with a 1.6× Optovar.
Mollivirus Virion DNA Extraction.
The genomic DNA was recovered from 1.8 × 10¹⁰ purified particles using the PureLink Genomic DNA Extraction Mini Kit (Life Technologies) according to the manufacturer’s protocol.
Mollivirus Genome Sequencing.
Five hundred nanograms of genomic DNA were sheared to a 150- to 700-bp range using the Covaris E210 instrument (Covaris). Sheared DNA was used for Illumina library preparation by a semiautomatized protocol. Briefly, end repair, A-tailing, and ligation of Illumina compatible adaptors (Bioo Scientific) were performed using the SPRIWorks Library Preparation System and SPRI TE instrument (Beckman Coulter), according to the manufacturer’s protocol. A 300- to 600-bp size selection was applied to recover most of the fragments. DNA fragments were amplified by 12 cycles of PCR using Platinum Pfx Taq Polymerase Kit (Life Technologies) and Illumina adapter-specific primers. Libraries were purified with 0.8× AMPure XP beads (Beckman Coulter). After library profile analysis by Agilent 2100 Bioanalyzer (Agilent Technologies) and qPCR quantification, the libraries were sequenced using 151 base-length read chemistry in paired-end flow cell on the Illumina MiSeq (Illumina). About 2 × 1.5 million useful reads were obtained.
Transcriptome Preparation.
Mollivirus-infected A. castellanii cells.
Adherent cells were infected by Mollivirus with a MOI of 50 and distributed in 30 flasks (1.4 × 10⁷ cells per 175-cm² flask) containing 20 mL of PPYG and left at 32 °C for 30 min, after which viruses in excess were removed. For each time point, 12 mL were recovered to make three pools (1: 30 min, 1, 2 h; 2: 3, 4, 5 h; 3: 6, 7, 9 h) for transcriptomic analysis, 3 mL for the quantitative temporal proteomic study and 3 mL for TEM observations.
RNA extraction.
RNA was extracted using the RNeasy Midi kit (catalog no. 75144; Qiagen) using the manufacturer’s protocol. Briefly, the cells were resuspended in 4 mL of RLT buffer supplemented with 0.1% β-mercaptoethanol and disrupted by subsequent −80 °C freezing and thawing at 37 °C for 10 min. Total RNA was eluted with two successive additions of ∼200 µL of RNase-free water.
RNA quantification and quality control.
RNA was quantified by measuring the absorbance at 260 nm using a NanoDrop Spectrophotometer. The integrity of the RNA sample was assessed using the Experion Automated Electrophoresis System with RNA StdSens chips and reagents (Bio-Rad).
Mollivirus Transcriptome Sequencing.
Paired-end libraries were prepared with early, intermediate, and late mRNA following Illumina’s protocol (TruSeq Stranded mRNA Sample Prep Kit). Briefly, 1 µg of mRNA was poly-A selected [Life Technologies; Dynabeads Oligo (dT)25], chemically fragmented, and converted into single-stranded cDNA using random hexamer priming. The second strand was then generated to create double-stranded cDNA. The cDNAs were 3′-adenylated, and Illumina adapters were added. DNA fragments (with adapters) were PCR-amplified using Illumina adapter-specific primers. Libraries were purified and then quantified using a Qubit Fluorometer (Life Technologies). Library profiles were evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies). Each library was sequenced using 101 base-length read chemistry in a paired-end flow cell on the Illumina HiSeq2000 (Illumina). More than 61 million useful reads were obtained for each library.
Metagenome Sequencing and Data Analysis.
DNA extraction.
DNA was extracted from 0.52 and 0.242 g of the 1084-T permafrost sample using the PowerSoil DNA isolation kit (Mo Bio) following the manufacturer’s protocol except that we added 83 mM DTT to the second sample to permit a more effective lysis of the viral particles. We recovered respectively 744 ng and 1.12 µg of pure DNA (Qubit).
Metagenomic library preparation.
One hundred nanograms of genomic DNA were sonicated to a 100- to 1,000-bp size range using the E210 Covaris instrument (Covaris). Fragments were end-repaired, and then 3′-adenylated, and NextFlex PCR free DNA barcodes (Bioo Scientific Corporation) were added by using NEBNext Sample Reagent Set (New England Biolabs). Ligation products were purified with AMPure XP beads (Beckman Coulter), and DNA fragments (>200 bp) were amplified by 12 cycles of PCR using Platinum Pfx Taq Polymerase Kit (Life Technologies) and Illumina adapter-specific primers. Libraries were purified with 0.6× AMPure XP beads (Beckman Coulter). After library profile analysis by Agilent 2100 Bioanalyzer (Agilent Technologies) and qPCR quantification (MxPro; Agilent Technologies), the libraries were sequenced using 101 base-length read chemistry in paired-end flow cell on the Illumina HiSeq2000 (Illumina).
Metagenomic data analysis.
Reads containing sequences of low complexity were filtered out to avoid spurious matches using the built-in “dust” software tool from the SGA assembler (34) with the following parameters: minimal length = 50 and dust threshold = 2. This resulted in a total of 368,474,026 usable reads. These reads were mapped to the Mollivirus genome as well as to the genomes of the following other giant viruses: Pandoravirus salinus (Refseq ID: NC_022098.1), Pandoravirus dulcis (Refseq ID: NC_021858.1), Pithovirus sibericum (Refseq ID: NC_023423.1), Mimivirus (Refseq ID: NC_014649.1), Megavirus chilensis (Refseq ID: NC_016072.1), and the cellular host A. castellanii (GenBank Assembly ID: GCA_000313135.1) using bowtie2 with the “very sensitive” parameter (35). The genomes from Mollivirus and Pithovirus exhibited small but significant numbers of mapped permafrost metagenomic reads (Table 1). Cumulative distributions of these mapped reads against their positions in their respective target genomes were plotted (Fig. 7).
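The cumulative distribution of mapped reads along a target genome can be sketched as follows. This is a minimal Python illustration, not the authors' script: the function name `cumulative_read_fraction` and the toy coordinates are hypothetical, and it assumes that the leftmost mapped coordinate of each read has already been extracted from the bowtie2 alignments. Reads drawn evenly from a whole genome should give a curve close to the diagonal, whereas reads piling up on a few loci give a staircase.

```python
from bisect import bisect_right


def cumulative_read_fraction(positions, genome_length, n_points=5):
    """Return (coordinate, fraction of reads starting at or before it) pairs.

    positions: 1-based leftmost mapping coordinates (hypothetical input).
    """
    if not positions:
        raise ValueError("no mapped reads")
    ordered = sorted(positions)
    total = len(ordered)
    curve = []
    for i in range(1, n_points + 1):
        # Evenly spaced checkpoints along the genome.
        coord = genome_length * i // n_points
        curve.append((coord, bisect_right(ordered, coord) / total))
    return curve


# Toy example: 8 reads spread evenly over a 1,000-bp genome.
toy_positions = [40, 180, 310, 420, 560, 690, 820, 950]
curve = cumulative_read_fraction(toy_positions, genome_length=1000, n_points=4)
```

With evenly spread toy reads, each quarter of the genome accounts for a quarter of the reads, which is the diagonal pattern expected for genuine genome-wide coverage.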
Mollivirus Genome Assembly and Annotation.
Mollivirus genome assembly.
Mollivirus genome was assembled using SOAPdenovo (36) with a stringent k-mer parameter value (k = 121) using 2 × 1,544,523 paired-end reads and 2,221 single-end reads of 150 nt. This resulted in a single contig that was corrected by mapping the reads using bowtie2 (35) and taking the consensus sequence using Gap5 (37). Read coverage (about 600×) was uniform along the contig except for one region of 10 kb located at one extremity and exhibiting twice as much coverage (about 1,200×) characteristic of repeated sequences. This prompted us to verify the connection between the 10-kb region and the contig by PCR. The results confirmed a 651,523-bp linear genome flanked by a 10-kb-long inverted repeat at each extremity.
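The coverage argument can be illustrated with a small sketch. A repeat that is assembled once but present twice in the genome attracts reads from both copies, so its depth is roughly double the baseline. The Python snippet below is only an illustration under assumed inputs: the function `flag_high_coverage_windows`, the 1.5-fold cutoff, and the per-window depths are hypothetical and are not taken from the assembly itself.

```python
from statistics import median


def flag_high_coverage_windows(window_coverage, fold=1.5):
    """Return indices of windows whose depth exceeds fold x the median depth."""
    baseline = median(window_coverage)
    return [i for i, depth in enumerate(window_coverage) if depth > fold * baseline]


# Hypothetical mean depth per 10-kb window along a short contig:
# most windows sit near 600x, the terminal window near 1,200x.
toy_depth = [598, 605, 611, 590, 602, 1195]
flagged = flag_high_coverage_windows(toy_depth)
```

Only the terminal window is flagged in this toy series, mirroring the situation in which a single 10-kb region stands out against otherwise uniform coverage.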
Gene annotation.
Homology searches were performed using BlastP against the nonredundant (NR) GenBank database (19) with an E-value threshold < 10⁻⁵. The functional annotation of Mollivirus predicted proteins was complemented by CD search (38), the FUGUE program (39), and RPS-BLAST (40) against COGs (41) (E-value threshold < 10⁻³).
Protein Extraction.
Virion proteome.
A total of 10⁸ purified particles was resuspended in 100 µL of lysis buffer (Tris⋅HCl, 40 mM; SDS, 2%; and DTT, 60 mM, pH 7.5) before extraction in gel loading buffer (100 mM Tris⋅HCl, pH 6.8; SDS, 2%; glycerol, 4%; β-mercaptoethanol, 5%; and traces of bromophenol blue) and 10 min of heating at 95 °C.
Infected cells protein extraction.
For each time point of the infection experiment performed for the transcriptomic study, 3 mL of the culture were centrifuged to recover the cells, and the pellet was frozen and stored at −80 °C prior to analysis. Proteins extracted from infected cells at each time point were solubilized in Laemmli gel loading buffer prior to digestion.
Protein Electrophoresis.
Ten and 15 µg of extracted proteins from Mollivirus solubilized in Laemmli buffer were separated on a 4–12% gradient polyacrylamide gel (NuPAGE; Invitrogen) before staining using colloidal Coomassie blue (GelCode Blue Stain Reagent; Pierce, Thermo Scientific) and periodic acid-Schiff method (Glycoprotein detection kit; Sigma-Aldrich), respectively. Five micrograms of horseradish peroxidase were used as positive control for glycoprotein detection.
Proteomic Analysis.
Protein digestion.
Virion and infected cells proteomes.
Proteins were stacked in the top of a 4–12% NuPAGE gel (Invitrogen) before R-250 Coomassie blue staining. The gel band was manually excised and cut in pieces before being washed by six successive incubations of 15 min in 25 mM NH4HCO3 and in 25 mM NH4HCO3 containing 50% (vol/vol) acetonitrile. Gel pieces were then dehydrated with 100% acetonitrile and incubated for 45 min at 53 °C with 10 mM DTT in 25 mM NH4HCO3 and for 35 min in the dark with 55 mM iodoacetamide in 25 mM NH4HCO3. Alkylation was stopped by adding 10 mM DTT in 25 mM NH4HCO3 and mixing for 10 min. Gel pieces were then washed again by incubation in 25 mM NH4HCO3 before dehydration with 100% acetonitrile. Modified trypsin (Promega; sequencing grade) in 25 mM NH4HCO3 was added to the dehydrated gel pieces for an overnight incubation at 37 °C. Peptides were then extracted from gel pieces in three 15-min sequential extraction steps in 30 µL of 50% acetonitrile, 30 µL of 5% formic acid, and finally 30µL of 100% acetonitrile. The pooled supernatants were then dried under vacuum.
Surfome (surface proteome) analysis.
A total of 8 × 10⁸ purified particles were incubated for 30 min at 37 °C in digestion buffer (50 mM Tris⋅HCl, pH 7.5, 150 mM NaCl, and 5 mM CaCl2) with or without 1.8 µg of modified trypsin (Promega; sequencing grade). Samples were then centrifuged for 3 min at 14,500 × g, and the supernatant was centrifuged again the same way. The supernatant was then submitted to an overnight trypsin digestion (1 µg of modified trypsin) at 37 °C. Peptides were then desalted using C18 spin columns (Harvard Apparatus).
Nano-LC-MS/MS analyses.
The dried extracted peptides were resuspended in 5% acetonitrile and 0.1% trifluoroacetic acid and analyzed by online nano–LC-MS/MS (Ultimate 3000, Dionex and LTQ-Orbitrap Velos pro, or Q-Exactive Plus; Thermo Fisher Scientific). Peptides were sampled on a 300 µm × 5-mm PepMap C18 precolumn and separated on a 75 µm × 250-mm C18 column (PepMap, Dionex). The nano-LC method consisted of a 120-min (for particle and surfome analyses) or a 240-min (for infected cells analyses) gradient at a flow rate of 300 nL/min, ranging from 5% to 37% acetonitrile in 0.1% formic acid during 114 min before reaching 72% acetonitrile in 0.1% formic acid for the last 6 min. MS and MS/MS data were acquired using Xcalibur (Thermo Fisher Scientific). Spray voltage and heated capillary were set at 1.4 kV and 200 °C, respectively. Survey full-scan MS spectra (m/z = 400–1,600) were acquired in the Orbitrap with a resolution of 60,000 after accumulation of 10⁶ ions (maximum filling time, 500 ms). The 20 most intense ions from the preview survey scan delivered by the Orbitrap were fragmented by collision-induced dissociation (collision energy, 35%) in the LTQ after accumulation of 10⁴ ions (maximum filling time, 100 ms).
Mass spectrometry bioinformatics data analyses.
For particle and surfome analyses, data were processed automatically using Mascot Daemon software (version 2.5.1; Matrix Science). Concomitant searches against Mollivirus protein sequence databank (523 entries), in-house–built A. castellanii protein sequence databank (16,206 entries), classical contaminants database (67,126 sequences, homemade), and the corresponding reversed databases were performed using Mascot (version 2.5). ESI-TRAP was chosen as the instrument and trypsin/P as the enzyme, and two missed cleavages were allowed. Precursor and fragment mass error tolerances were set at 10 ppm and 0.6 Da, respectively, for data acquired on the LTQ-Orbitrap Velos, and 10 ppm and 25 mmu (milli-mass units, or mDa) for data acquired on the Q-Exactive Plus. Peptide modifications allowed during the search were as follows: carbamidomethyl (C, fixed), acetyl (N-ter, variable), oxidation (M, variable), and deamidation (NQ, variable). The IRMa software (28) (version 1.31.1) was used to filter the results: conservation of rank 1 peptides, peptide identification false discovery rate < 1% (as calculated on peptide scores by using the reverse database strategy), and minimum of one specific peptide per identified protein group.
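The reverse database strategy can be sketched in a few lines. Hits against the reversed sequences are assumed to be wrong by construction, so their count above a score cutoff estimates the number of false hits among the target matches. The Python sketch below uses a common decoy/target ratio; the exact formula applied by IRMa is not stated here, and the function names and toy scores are hypothetical.

```python
def decoy_fdr(psms, score_threshold):
    """Estimate FDR as (#decoy hits) / (#target hits) at or above a score cutoff.

    psms: list of (score, is_decoy) tuples (hypothetical peptide matches).
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= score_threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= score_threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def lowest_threshold_below(psms, max_fdr=0.01):
    """Return the smallest observed score whose FDR estimate is below max_fdr."""
    for threshold in sorted({score for score, _ in psms}):
        if decoy_fdr(psms, threshold) < max_fdr:
            return threshold
    return None


# Toy data: 200 confident target hits, 100 weak target hits, 3 weak decoy hits.
toy_psms = [(50, False)] * 200 + [(20, False)] * 100 + [(20, True)] * 3
fdr_at_20 = decoy_fdr(toy_psms, 20)  # 3 decoys over 300 targets
cutoff = lowest_threshold_below(toy_psms, max_fdr=0.01)
```

In this toy set the weak matches sit exactly at 1%, so the strict "< 1%" criterion forces the cutoff up to the next score level.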
For infected cells analyses, RAW files were processed using MaxQuant software (29) (version 1.5.1.2). Spectra were searched against the Mollivirus protein sequence databank (523 entries), A. castellanii protein sequence databank (16,206 entries), and the frequently observed contaminants database embedded in MaxQuant. Trypsin was chosen as the enzyme and two missed cleavages were allowed. Precursor mass error tolerances were set at 20 ppm and 4.5 ppm for first and main searches, respectively. Fragment mass error tolerance was set to 0.5 Da. Peptide modifications allowed during the search were as follows: carbamidomethylation (C, fixed), acetyl (protein Nter, variable), and oxidation (M, variable). Minimum peptide length was set to 6 aa. Minimum number of peptides, razor plus unique peptides, and unique peptides were all set to 1. Maximum false discovery rates were set to 0.01 at peptide and protein levels. Label-free quantification (LFQ) and intensity-based absolute quantification (iBAQ) values were calculated from MS intensities of unique peptides.
Each A. castellanii protein was quantified as the ratio of its abundance relative to the total quantity of amoeba proteins at each time point. For the Mollivirus proteome analysis, we used all identified proteins (host and virus) at each time point to normalize the data. Mollivirus protein clustering analysis was performed using hierarchical clustering with Euclidean distances (Fig. S4).
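The normalization and the distance underlying the clustering can be illustrated as follows. This is a minimal Python sketch with hypothetical protein names and intensities; it shows only the per-time-point normalization and the Euclidean distance between two temporal profiles, and leaves out the hierarchical clustering step itself.

```python
import math


def normalize_to_total(abundances):
    """Express each protein as a fraction of the summed abundance at one time point."""
    total = sum(abundances.values())
    if total == 0:
        raise ValueError("total abundance is zero")
    return {protein: value / total for protein, value in abundances.items()}


def euclidean(profile_a, profile_b):
    """Euclidean distance between two equal-length temporal profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same time points")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Hypothetical intensities at a single time point.
time_point = {"protA": 30.0, "protB": 10.0, "protC": 60.0}
fractions = normalize_to_total(time_point)

# Hypothetical normalized profiles of two proteins over three time points.
distance = euclidean([0.0, 0.3, 0.6], [0.0, 0.0, 0.2])
```

Proteins whose normalized profiles rise and fall together end up at small distances and are therefore grouped early by a hierarchical clustering procedure.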
Gene Content-Based Clustering.
A cladistics tree (Fig. 8) was constructed based on the clustering of gene contents of the following completely sequenced viral genomes: Mollivirus, Acanthamoeba polyphaga mimivirus (NC_014649), Acanthamoeba polyphaga moumouvirus (NC_020104), Aedes taeniorhynchus iridescent virus (NC_008187), African swine fever virus (NC_001659), Amsacta moorei entomopoxvirus (NC_002520), Autographa californica nucleopolyhedrovirus (NC_001623), Bathycoccus RCC1105 virus (NC_014765), Cafeteria roenbergensis virus (NC_014637), Culex nigripalpus NPV (NC_003084), Ectocarpus siliculosus virus1 (NC_002687), Emiliania huxleyi virus86 (NC_007346), Feldmannia species virus (NC_011183), Human herpesvirus 1 (NC_001806), Human herpesvirus 6A (NC_001664), Infectious spleen and kidney necrosis virus (NC_003494), Lausannevirus (NC_015326), Lymphocystis disease virus china (NC_005902), Mamestra configurata NPV-A (NC_003529), Marseillevirus (NC_013756), Megavirus chilensis (NC_016072), Megavirus lba (NC_020232), Melanoplus sanguinipes entomopoxvirus (NC_001993), Micromonas RCC1109 MpV1 (NC_014767), Myxoma virus (NC_001132), Neodiprion abietis NPV (NC_008252), Orf virus (NC_005336), Ostreococcus lucimarinus virus OlV1 (NC_014766), Ostreococcus tauri virus1 (NC_013288), Ostreococcus virus OsV5 (NC_010191), Pandoravirus dulcis (NC_021858), Pandoravirus inopinatum (NC_026440), Pandoravirus salinus (NC_022098), Paramecium bursaria Chlorella virus1 (NC_000852), Phaeocystis globosa virus (NC_021312), Pithovirus sibericum (NC_023423), Rodent herpesvirus Peru (NC_015049), Spodoptera litura granulovirus (NC_009503), Wiseana iridescent virus (NC_015780), and Aureococcus anophagefferens virus (NC_024697). We first performed gene clustering using OrthoMCL (42) with standard parameters (Blast E-value cutoff = 10⁻⁵ and mcl inflation factor = 1.5) on the protein coding genes of length ≥ 100 aa. This resulted in the definition of 3,001 distinct clusters.
We computed a presence/absence matrix based on the gene clusters and calculated a distance matrix using the distance defined in ref. 43. Finally, the phylogenetic tree was constructed using neighbor joining. Support values were estimated using bootstrap resampling (n = 10,000).
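The gene content workflow can be sketched on a toy dataset. The Python snippet below builds a presence/absence matrix, computes a pairwise distance, and draws one bootstrap replicate by resampling cluster columns with replacement. Two assumptions should be kept in mind: the Jaccard distance is used purely as a stand-in, since the distance of ref. 43 is not reproduced here, and the neighbor-joining step is omitted. The genome and cluster names are hypothetical.

```python
import random


def presence_absence(genome_clusters, all_clusters):
    """One row of 0/1 values per genome, one column per gene cluster."""
    return {genome: [1 if c in members else 0 for c in all_clusters]
            for genome, members in genome_clusters.items()}


def jaccard_distance(row_a, row_b):
    """Stand-in distance: 1 - shared clusters / clusters present in either genome."""
    shared = sum(1 for a, b in zip(row_a, row_b) if a and b)
    union = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return 0.0 if union == 0 else 1.0 - shared / union


def bootstrap_columns(matrix, rng):
    """Resample cluster columns with replacement (one bootstrap replicate)."""
    n_columns = len(next(iter(matrix.values())))
    picks = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {genome: [row[i] for i in picks] for genome, row in matrix.items()}


# Hypothetical gene cluster memberships for three genomes.
toy = {
    "virusA": {"c1", "c2", "c3"},
    "virusB": {"c2", "c3", "c4"},
    "virusC": {"c5"},
}
clusters = ["c1", "c2", "c3", "c4", "c5"]
matrix = presence_absence(toy, clusters)
d_ab = jaccard_distance(matrix["virusA"], matrix["virusB"])
d_ac = jaccard_distance(matrix["virusA"], matrix["virusC"])
replicate = bootstrap_columns(matrix, random.Random(0))
```

Repeating the resampling many times and rebuilding the tree from each replicate gives the fraction of replicates that recover a given branch, which is what a bootstrap support value reports.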
A. castellanii Genome Annotation.
Gene structure prediction.
Although not annotated, the genome sequence assembly of A. castellanii from the Baylor College of Medicine (GenBank Assembly Name: “Acas_2”; ID: GCA_000193105.1) with a total length of 46,714,639 bp, is substantially larger than the one from Clarke et al. (18) (GenBank Assembly Name: “Acastellanii.strNEFF v1”; ID: GCA_000313135.1) with a length of 42,019,824 bp. This prompted us to perform gene prediction on the Acas_2 assembly. We used the Maker annotation software (44) with Augustus (45), Genemark-ES (46), and Snap (47) ab initio gene prediction algorithms, supported by UniprotKB (Swiss-prot) protein sequences as well as A. castellanii EST expression data (ftp://ftp.hgsc.bcm.edu/AcastellaniNeff/ESTs/BRENDANSSFFS/). After manual curation of the predicted gene structures, we were able to predict 16,206 protein-coding genes, which contain 8.25 exons of 214 nt separated by 105-nt-long introns, on average.
We next questioned whether this new dataset actually represented a significant improvement over the Clarke et al. annotation (18) predicting 14,974 protein-coding genes. We thus mapped all available RNA-seq transcriptomic data from Clarke et al. (SRA IDs: SRX203182, SRX203266–SRX203275, SRX208998) (18) to the predicted transcripts from both annotations using bowtie2 (35). The proportion of RNA-seq reads mapped to our annotation of A. castellanii transcripts (76.75%) is much higher than the 54.33% of mapped reads to Acastellanii.strNEFF v1 annotated transcripts, and thus an increase of 41%. Symmetrically to this true-positives measurement, we quantified the proportion of RNA-seq reads aligned in predicted untranscribed regions, i.e., intergenic and intronic regions. These percentages were 11.51% vs. 8% for the Acastellanii.strNEFF v1 and the Acas_2 assemblies, respectively, and thus a decrease of 30% in the number of false negatives in our annotation. The gain regarding predicted transcribed regions for protein-coding genes could correspond to a more accurate definition of either coding regions or UTRs. To test whether our annotation actually improved the prediction of A. castellanii protein sequences, we mapped proteomic data from MS/MS experiment of A. castellanii cells using Mascot (48). Again, we found an improvement of 21% in protein sequence prediction using our annotation (19,620 spectral counts vs. 16,226).
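The three percentages quoted above are relative changes between the two annotations, and they can be rechecked directly from the reported figures. The short Python sketch below does only this arithmetic; the helper name `relative_change` is introduced for illustration.

```python
def relative_change(new, old):
    """Relative change of `new` with respect to `old` (0.41 means +41%)."""
    return (new - old) / old


# Share of RNA-seq reads mapped to annotated transcripts (%), new vs. previous.
mapped_gain = relative_change(76.75, 54.33)

# Share of reads falling in predicted untranscribed regions (%), new vs. previous.
untranscribed_drop = relative_change(8.0, 11.51)

# Spectral counts matched to predicted proteins, new vs. previous.
spectral_gain = relative_change(19620, 16226)
```

Rounded to the nearest percent, these evaluate to +41%, −30%, and +21%, in agreement with the values given in the text.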
Prediction of protein functions.
Functional annotation of predicted genes was performed using Blast2GO (49) with Blastp against the NR database and Interproscan motifs detection. As expected, the vast majority of predicted proteins are identical or highly similar to the A. castellanii proteins predicted by Clarke et al. (18), with 87% of the predicted genes having a best BLAST hit corresponding to this annotation. Genes with a significant BLAST hit (E value < 10⁻⁵) were assigned the best-match functional annotation. For genes with no significant BLAST hit or no annotation (“hypothetical protein”), we assigned Gene Ontology annotations predicted from Blast2GO whenever available. Eventually, we were able to annotate 77% of the predicted Acanthamoeba genes.
Discussion
Using Acanthamoeba castellanii as bait, we isolated a new type of giant DNA virus from the same sample of 30,000-y-old permafrost from which we recently characterized Pithovirus sibericum (16). Although this virus, named Mollivirus sibericum, again exhibits a nonicosahedral ovoid particle, its nucleus-dependent mode of replication, genome organization, and gene content definitely indicate that it does not belong to the Pithovirus (proposed) family nor to the Iridovirus and Marseillevirus families with which Pithovirus sibericum exhibits a weak phylogenetic affinity (Fig. 8). Instead, and quite unexpectedly given their differences in morphologies and virion and genome sizes, Mollivirus sibericum phylogenetically clusters as a distant relative of the giant Pandoraviruses (14, 15). However, it is not yet clear if this phylogenetic position is due to a truly ancestral relationship of Mollivirus sibericum with the Pandoraviruses, or to the insertion of 83 Pandoravirus-derived genes into an otherwise unrelated Mollivirus genomic framework, unusually prone to horizontal gene transfers as also suggested by the presence of 50 putatively Acanthamoeba-derived genes (Fig. 4). The latter hypothesis is consistent with the lack of detectable colinearity between the Mollivirus and Pandoravirus genomes (Fig. S5). We also noticed that, out of the 136 viral proteins in Mollivirus virions, only 28 have a homolog in Pandoravirus particles. Moreover, these pairs of homologous proteins exhibit highly discrepant abundances in their respective virion proteomes, and the sets of the 20 most abundant virion proteins in Mollivirus and P. salinus (representing more than 60% of the total protein abundance) do not overlap. This suggests that the basic structures of the two particles are very different despite their common ovoid shape. The virion proteins shared by Mollivirus and Pandoravirus might thus be involved in host-specific interactions rather than bona fide structural features.
The isolation and characterization of additional independent Mollivirus sibericum relatives will make it possible to assess the diversity of this new candidate family and the robustness of its phylogenetic relationship with the Pandoraviruses. It is worth noting that, to the best of our knowledge, no previous sighting of a Mollivirus-like endoparasite has been reported in the literature, in contrast with Pithovirus (16, 24). Until Mollivirus sibericum relatives are isolated in contemporary environments, we cannot rule out that the permafrost was the only reservoir left for this viral family.
Our characterization of Mollivirus sibericum extensively relied on detailed proteomic analyses of both the particle and its replication cycle. The proteome of the particle revealed two main features: the absence of an embarked transcription apparatus and the unusual presence of many ribosomal (and ribosome-related) proteins. The first feature is consistent with the early migration of the viral genome to the nucleus (Fig. 3), the host nucleus morphological changes, the perinuclear location of the virion factory (Fig. 2), and the presence of introns in some viral genes (Table S1), all suggesting that the early replication stages of Mollivirus require nuclear functions.
The large number of ribosomal proteins detected within Mollivirus particles is most unusual, and to our knowledge unique among giant (14, 16, 20, 21) or more conventional DNA viruses for which proteomic studies are available (25). The sole other documented association of virus particles and ribosomes acquired from their host concerns the Arenaviruses, a single-stranded RNA virus family, that do not have any phylogenetic relationship with Mollivirus. In contrast with Arenavirus, TEM images of Mollivirus virion do not exhibit the grainy appearance of ribosomal particles. In addition, our attempts to reveal the presence of ribosomal RNA in purified Mollivirus particles have remained unsuccessful. This suggests that the ribosomal (and ribosome-related) proteins we detected are not from intact ribosomes, but may originate from the nucleolus in the vicinity of which part of the virion assembly takes place. The fact that ribosomal proteins were never identified in other viruses such as Pandoraviruses or Chlorella viruses installing their virion factory in the host nucleus is puzzling and raises the question whether these ribosomal proteins are mere bystanders or play a role in the Mollivirus infectious process.
We then explored the course of an entire infectious cycle using quantitative proteomics to examine the nature and dynamics of virus–host interactions. At the most global level, the relative proportions of Mollivirus-, mitochondrion-, and Acanthamoeba-encoded proteins were found to vary rather smoothly, consistent with an infectious pattern preserving the cellular host integrity as long as possible and with the release of neoformed particles through exocytosis. The relative abundance of mitochondrion-encoded proteins followed a pattern parallel to that of other proteins, suggesting that these ATP-producing organelles are neither activated nor specifically degraded during the first 6 h PI. Only 30 and 38 host proteins (1.21% and 1.53%, respectively, of the 2,474 total proteins seen in at least two time points) were found to be significantly up-regulated or down-regulated. None of them are homologous to innate immunity response related proteins seen to be up-regulated in mammalian cells undergoing a cytomegalovirus infection (26). An unexpected finding of the proteomic time course was that the synthesis of the virus-encoded DNA polymerase appeared to precede that of the viral transcription machinery, also shown to coexist with that of the host (Fig. 5). This suggests that the transcription of Mollivirus intermediate and late genes (i.e., 3 h PI) might proceed from both the original and neosynthesized DNA molecules and be simultaneously performed by the viral and host-derived transcription apparatus.
Mollivirus sibericum is now the third type of nonicosahedral giant virus discovered in less than 3 y using A. castellanii as a model host (14–16). This suggests that this morphotype is probably not rare and predicts that many more are to be found, including some that might have been misidentified as uncultivable bacteria in the context of human and animal diseases. These three types of nonicosahedral viruses have several key characteristics in common, despite exhibiting no or little phylogenetic affinities:
(i) Their large, empty-looking virions enclose large dsDNA genomes not compacted in electron-dense nucleoids. With its 650-kb genome occupying a very large virion core of about 86 × 10⁶ nm³, Mollivirus exhibits a low DNA packing density (0.0075 bp/nm³), a characteristic feature of the previously described Pandoraviruses and Pithovirus. The DNA-packaging mechanisms used by these giant viruses remain to be elucidated, as well as the molecular 3D structure of the viral genome inside the particle.
(ii) All three types of virus particles are lined by an internal lipid membrane and use the same mechanism to infect their Acanthamoeba host: phagocytosis followed by the opening of a delivery portal, fusion of the internal virion membrane with the phagosome membrane, and delivery of the particle content in the cytoplasm.
(iii) All of these giant viruses exhibit large proportions (>2/3) of encoded proteins without homologs, even between each other, raising the question of the origin of the corresponding genes, or of the mechanisms by which such diverse gene repertoires could be generated. With the exception of Pithovirus, the genomes of Mollivirus and Pandoraviruses do not exhibit mobile elements or repeated structures known to promote genomic instability.
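The packing density quoted in point (i) follows directly from the genome length and the core volume, taking the latter as 86 × 10⁶ nm³. The short Python check below uses only these two figures; the variable names are illustrative.

```python
# Figures taken from point (i): genome length and virion core volume.
genome_bp = 650_000          # about 650 kb
core_volume_nm3 = 86e6       # about 86 x 10^6 nm^3

# DNA packing density in base pairs per cubic nanometer.
packing_density = genome_bp / core_volume_nm3
```

The result is about 0.0076 bp/nm³, matching the quoted 0.0075 bp/nm³ to within rounding.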
On the other hand, these three types of giant nonicosahedral viruses exhibit marked differences: (i) G+C-rich (Mollivirus, Pandoraviruses) or A+T-rich (Pithovirus) genomes; (ii) linear (Mollivirus, Pandoraviruses) or circular (Pithovirus) genomes; (iii) high variability in genome sizes (600 kb for Mollivirus and Pithovirus, up to 2.8 Mb for Pandoraviruses); and (iv) their replication mode is either nucleocytoplasmic (Mollivirus, Pandoraviruses) or entirely cytoplasmic (Pithovirus).
Such different features in giant viruses infecting the same Acanthamoeba host support our previous suggestion that Pandoravirus-like particles might be associated to a diversity of viruses as large as that associated with icosahedral capsids in terms of evolutionary origins, genome size, or molecular nature (DNA or RNA) (14, 16).
Finally, our finding that two different viruses infecting the same host could be revived from a single permafrost sample definitely suggests that prehistoric “live” viruses are not a rare occurrence. Furthermore, the roughly equal representation of the two viruses in the metagenomics data suggests that there is no difference in the survival capacity of particles of either cytoplasmic (Pithovirus) or nucleus-dependent viruses (Mollivirus). Such modes of replication also correspond to the Poxvirus and Herpesvirus families, respectively. Although no read sequences were close enough to detect known Poxvirus and Herpesvirus isolates in the metagenome of our permafrost sample, we cannot rule out that distant viruses of ancient Siberian human (or animal) populations could reemerge as arctic permafrost layers melt and/or are disrupted by industrial activities.
Methods
Mollivirus sibericum Isolation and Production.
Mollivirus was isolated from a piece of the buried soil sample P1084-T as previously reported (16). Four hundred milligrams of P1084-T were resuspended in 6 mL of Prescott and James medium (27), and then used for infection trials of A. castellanii (Douglas) Neff (ATCC 30010™) cells adapted to resist Fungizone. Cultures presenting an infected phenotype were recovered, centrifuged for 5 min at 500 × g to remove the cellular debris, and used to infect T-75 tissue culture flasks plated with fresh Acanthamoeba cells. After a succession of passages, viral particles produced in sufficient quantity were recovered and purified. See SI Methods for details.
Genome Sequencing, Assembly, and Annotation.
Five hundred nanograms of purified genomic DNA were sheared to a 150- to 700-bp range using the Covaris E210 instrument (Covaris) prior to Illumina library preparation using a semiautomated protocol. Briefly, end repair, A-tailing, and ligation of Illumina-compatible adaptors (Bioo Scientific) were performed using the SPRIWorks Library Preparation System and SPRI TE instrument (Beckman Coulter), according to the manufacturer’s protocol. A 300- to 600-bp size selection was applied to recover most of the fragments. DNA fragments were amplified by 12 cycles of PCR using Platinum Pfx Taq Polymerase Kit (Life Technologies) and Illumina adapter-specific primers. Libraries were purified with 0.8× AMPure XP beads (Beckman Coulter). After library profile analysis by Agilent 2100 Bioanalyzer (Agilent Technologies) and quantitative PCR quantification, the libraries were sequenced using 151 base-length read chemistry in a paired-end flow cell on the Illumina MiSeq (Illumina). About 3 M useful paired-end reads were obtained. SI Methods contains further details on the bioinformatic genome assembly and annotation procedures.
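A rough sense of the sequencing depth can be obtained from the read count and read length given above, together with the 651-kb genome size reported in the abstract. The Python sketch below computes a nominal coverage only: it assumes that "about 3 M" means 3,000,000 read pairs, that every read is full length, and that every read maps to the viral genome, so it should be read as an upper bound rather than as the depth actually achieved. The function and variable names are hypothetical.

```python
# Figures taken from the text; READ_PAIRS is an approximation of "about 3 M".
READ_PAIRS = 3_000_000      # useful paired-end reads (pairs)
READ_LENGTH_NT = 151        # bases per read
GENOME_LENGTH_BP = 651_000  # Mollivirus genome size from the abstract


def nominal_coverage(read_pairs: int, read_length: int, genome_length: int) -> float:
    """Return total sequenced bases divided by genome length (fold coverage)."""
    total_bases = read_pairs * 2 * read_length  # two reads per pair
    return total_bases / genome_length


coverage = nominal_coverage(READ_PAIRS, READ_LENGTH_NT, GENOME_LENGTH_BP)
print(f"Nominal coverage: ~{coverage:.0f}x")
```

Under these assumptions the nominal depth is on the order of 1,400-fold; quality trimming and reads of host origin would lower the effective value.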
Metagenomic Study.
DNA was extracted from 0.52 and 0.242 g of the 1084T permafrost sample using the PowerSoil DNA isolation kit (Mo Bio) following the manufacturer’s protocol, except that we added 83 mM DTT to the second sample to permit a more effective lysis of the viral particles. We recovered 744 ng and 1.12 µg of pure DNA (Qubit), respectively. SI Methods contains further details on the library preparation and bioinformatic analysis of the data.
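Because the two extractions started from different masses of soil, the recovered amounts are easier to compare once normalized per gram. The Python sketch below does this normalization, assuming that the first yield (744 ng) corresponds to the 0.52-g sample without DTT and the second (1.12 µg) to the 0.242-g sample with DTT. The names are illustrative, and with a single extraction per condition the ratio is only indicative, not a controlled measurement of the DTT effect.

```python
# Masses and yields quoted in the text; the pairing of values is an assumption.
MASS_WITHOUT_DTT_G = 0.52
MASS_WITH_DTT_G = 0.242
DNA_WITHOUT_DTT_NG = 744.0
DNA_WITH_DTT_NG = 1120.0  # 1.12 micrograms expressed in nanograms


def yield_per_gram(dna_ng: float, sample_mass_g: float) -> float:
    """Return the DNA yield in nanograms per gram of starting material."""
    return dna_ng / sample_mass_g


yield_without_dtt = yield_per_gram(DNA_WITHOUT_DTT_NG, MASS_WITHOUT_DTT_G)
yield_with_dtt = yield_per_gram(DNA_WITH_DTT_NG, MASS_WITH_DTT_G)
fold_change = yield_with_dtt / yield_without_dtt

print(f"Without DTT: {yield_without_dtt:.0f} ng/g")
print(f"With DTT:    {yield_with_dtt:.0f} ng/g")
print(f"Ratio:       {fold_change:.1f}-fold")
```

On this basis the DTT-treated sample gave roughly three times more DNA per gram of permafrost.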
Transcriptomic and Proteomic Samples Preparation.
Adherent cells were infected by Mollivirus at a MOI of 50 and distributed in 30 flasks (1.4 × 10⁷ cells per 175-cm² flask) containing 20 mL of proteose peptone–yeast extract–glucose medium and left at 32 °C for 30 min, before removing excess viruses. For each time point, 12 mL were recovered to make three pools (1: 30 min, 1 h, and 2 h; 2: 3, 4, and 5 h; 3: 6, 7, and 9 h) for transcriptomic analysis, 3 mL for the quantitative temporal proteomic study, and 3 mL for inclusions and TEM observations. Each mRNA pool was sequenced on the Illumina MiSeq platform, leading to 61 million, 71 million, and 71 million paired-end 100-nt reads, respectively, of which 96–97.8% could be mapped on the Mollivirus or A. castellanii (18) genome sequences. For the proteomic study of the infection, all time points were analyzed independently. SI Methods contains further details on sample preparations.
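The scale of the infection can be estimated from the figures above, taking the multiplicity of infection (MOI) as the number of viral particles added per cell. The Python sketch below is a back-of-the-envelope calculation under that assumption; it also checks that the three aliquots taken at each time point fit within the 20 mL of medium per flask. All names are hypothetical and the calculation is not part of the original protocol.

```python
# Experimental parameters quoted in the text.
MOI = 50                      # particles per cell (assumed definition)
CELLS_PER_FLASK = 14_000_000  # 1.4 x 10^7 cells per 175-cm^2 flask
N_FLASKS = 30
MEDIUM_PER_FLASK_ML = 20
ALIQUOTS_ML = {"transcriptomics": 12, "proteomics": 3, "TEM": 3}


def particles_needed(moi: int, cells: int) -> int:
    """Return the number of viral particles required to reach the given MOI."""
    return moi * cells


particles_per_flask = particles_needed(MOI, CELLS_PER_FLASK)
total_particles = particles_per_flask * N_FLASKS
sampled_ml = sum(ALIQUOTS_ML.values())

print(f"Particles per flask: {particles_per_flask:.1e}")
print(f"Particles for {N_FLASKS} flasks: {total_particles:.1e}")
print(f"Volume sampled per time point: {sampled_ml} of {MEDIUM_PER_FLASK_ML} mL")
```

Under these assumptions the experiment calls for about 7 × 10⁸ particles per flask, or about 2 × 10¹⁰ particles in total, and the sampled volume (18 mL) stays within the 20 mL available.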
Proteome Analyses.
For particle proteome and infectious cycle analyses, proteins were extracted in gel loading buffer and heated for 10 min at 95 °C. Proteins were stacked at the top of a 4–12% (wt/vol) polyacrylamide gel and in-gel digested before nano–liquid chromatography (LC)-MS/MS analyses of the resulting peptides. For surfome analyses, purified virions were incubated for 30 min in digestion buffer (50 mM Tris⋅HCl, pH 7.5, 150 mM NaCl, and 5 mM CaCl₂) with or without trypsin (control). After centrifugation, supernatants were digested overnight with trypsin and the resulting peptides were analyzed by nano–LC-MS/MS. Peptides and proteins were identified using Mascot (Matrix Science) and IRMa (28) (version 1.31.1) for particle and surface proteomes, and identified and quantified using MaxQuant (29) for infectious cycle analysis. Detailed procedures are presented in SI Methods.
Acknowledgments
We thank Dr. J.-P. Chauvin, Dr. A. Kosta, F. Richard, and A. Aouane and Serge Nitsche for their expert assistance on the imaging platforms, Dr. Dorothée Murat for providing some transmission electron microscopy (TEM) images, and Miguel Ortiz Lombardia for his thorough reading of the manuscript. This work was partially supported by France Génomique Grant ANR-10-INSB-01-01, French National Research Agency Grant ANR-14-CE14-0023-01, the Provence-Alpes-Côte-d’Azur Région (2010 12125), ProFi Grant ANR-10-INBS-08-01, and Russian Scientific Fund 14-14-01115.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The Mollivirus genome sequence reported in this paper has been deposited in the GenBank database (accession no. KR921745). The transcriptomic data have been deposited in the Sequence Read Archive, www.ncbi.nlm.nih.gov/Traces/sra/ [accession no. SRX1078581 (SRR2084123 for the early class of expression, SRR2103267 for the intermediate one, and SRR2103268 for the late class of expression)]. The mass spectrometry proteomics data have been deposited in the ProteomeXchange Consortium, proteomecentral.proteomexchange.org, via the Proteomics Identifications partner repository [dataset identifiers PXD002375 (Particle and Surfome) and PXD002374 (Time Course)]. All data can be visualized on an interactive genome browser at the following link: www.igs.cnrs-mrs.fr/cgi-bin/gb2/gbrowse/Mollivirus/.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1510795112/-/DCSupplemental.
References
- 1. La Scola B, et al. A giant virus in amoebae. Science. 2003;299(5615):2033. doi: 10.1126/science.1081867.
- 2. Raoult D, et al. The 1.2-megabase genome sequence of Mimivirus. Science. 2004;306(5700):1344–1350. doi: 10.1126/science.1101485.
- 3. Arslan D, Legendre M, Seltzer V, Abergel C, Claverie J-M. Distant Mimivirus relative with a larger genome highlights the fundamental features of Megaviridae. Proc Natl Acad Sci USA. 2011;108(42):17486–17491. doi: 10.1073/pnas.1110889108.
- 4. Claverie J-M, Abergel C. Mimivirus and its virophage. Annu Rev Genet. 2009;43:49–66. doi: 10.1146/annurev-genet-102108-134255.
- 5. Mutsafi Y, Zauberman N, Sabanay I, Minsky A. Vaccinia-like cytoplasmic replication of the giant Mimivirus. Proc Natl Acad Sci USA. 2010;107(13):5978–5982. doi: 10.1073/pnas.0912737107.
- 6. Yoosuf N, et al. Related giant viruses in distant locations and different habitats: Acanthamoeba polyphaga moumouvirus represents a third lineage of the Mimiviridae that is close to the Megavirus lineage. Genome Biol Evol. 2012;4(12):1324–1330. doi: 10.1093/gbe/evs109.
- 7. Fischer MG, Allen MJ, Wilson WH, Suttle CA. Giant virus with a remarkable complement of genes infects marine zooplankton. Proc Natl Acad Sci USA. 2010;107(45):19508–19513. doi: 10.1073/pnas.1007615107.
- 8. Santini S, et al. Genome of Phaeocystis globosa virus PgV-16T highlights the common ancestry of the largest known DNA viruses infecting eukaryotes. Proc Natl Acad Sci USA. 2013;110(26):10800–10805. doi: 10.1073/pnas.1303251110.
- 9. Moniruzzaman M, et al. Genome of brown tide virus (AaV), the little giant of the Megaviridae, elucidates NCLDV genome expansion and host-virus coevolution. Virology. 2014;466-467:60–70. doi: 10.1016/j.virol.2014.06.031.
- 10. Boyer M, et al. Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms. Proc Natl Acad Sci USA. 2009;106(51):21848–21853. doi: 10.1073/pnas.0911354106.
- 11. Thomas V, et al. Lausannevirus, a giant amoebal virus encoding histone doublets. Environ Microbiol. 2011;13(6):1454–1466. doi: 10.1111/j.1462-2920.2011.02446.x.
- 12. Colson P, et al. “Marseilleviridae,” a new family of giant viruses infecting amoebae. Arch Virol. 2013;158(4):915–920. doi: 10.1007/s00705-012-1537-y.
- 13. Doutre G, Philippe N, Abergel C, Claverie J-M. Genome analysis of the first Marseilleviridae representative from Australia indicates that most of its genes contribute to virus fitness. J Virol. 2014;88(24):14340–14349. doi: 10.1128/JVI.02414-14.
- 14. Philippe N, et al. Pandoraviruses: Amoeba viruses with genomes up to 2.5 Mb reaching that of parasitic eukaryotes. Science. 2013;341(6143):281–286. doi: 10.1126/science.1239181.
- 15. Antwerpen MH, et al. Whole-genome sequencing of a pandoravirus isolated from keratitis-inducing acanthamoeba. Genome Announc. 2015;3(2):e00136-15. doi: 10.1128/genomeA.00136-15.
- 16. Legendre M, et al. Thirty-thousand-year-old distant relative of giant icosahedral DNA viruses with a pandoravirus morphology. Proc Natl Acad Sci USA. 2014;111(11):4274–4279. doi: 10.1073/pnas.1320670111.
- 17. Besemer J, Lomsadze A, Borodovsky M. GeneMarkS: A self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 2001;29(12):2607–2618. doi: 10.1093/nar/29.12.2607.
- 18. Clarke M, et al. Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol. 2013;14(2):R11. doi: 10.1186/gb-2013-14-2-r11.
- 19. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2015;43(Database issue):D6–D17.
- 20. Renesto P, et al. Mimivirus giant particles incorporate a large fraction of anonymous and unique gene products. J Virol. 2006;80(23):11678–11685. doi: 10.1128/JVI.00940-06.
- 21. Claverie JM, Abergel C, Ogata H. Mimivirus. Curr Top Microbiol Immunol. 2009;328:89–121. doi: 10.1007/978-3-540-68618-7_3.
- 22. Greber UF, Fassati A. Nuclear import of viral DNA genomes. Traffic. 2003;4(3):136–143. doi: 10.1034/j.1600-0854.2003.00114.x.
- 23. Wu G, Nie L, Zhang W. Integrative analyses of posttranscriptional regulation in the yeast Saccharomyces cerevisiae using transcriptomic and proteomic data. Curr Microbiol. 2008;57(1):18–22. doi: 10.1007/s00284-008-9145-5.
- 24. Michel R, Schmid EN, Hoffmann R, Müller KD. Endoparasite KC5/2 encloses large areas of sol-like cytoplasm within Acanthamoebae. Normal behavior or aberration? Parasitol Res. 2003;91(4):265–266. doi: 10.1007/s00436-003-0944-0.
- 25. Maxwell KL, Frappier L. Viral proteomics. Microbiol Mol Biol Rev. 2007;71(2):398–411. doi: 10.1128/MMBR.00042-06.
- 26. Weekes MP, et al. Quantitative temporal viromics: An approach to investigate host-pathogen interaction. Cell. 2014;157(6):1460–1472. doi: 10.1016/j.cell.2014.04.028.
- 27. Page FC. A New Key to Freshwater and Soil Gymnamoebae. Freshwater Biological Association; Ambleside, UK: 1988.
- 28. Dupierris V, Masselon C, Court M, Kieffer-Jaquinod S, Bruley C. A toolbox for validation of mass spectrometry peptides identification and generation of database: IRMa. Bioinformatics. 2009;25(15):1980–1981. doi: 10.1093/bioinformatics/btp301.
- 29. Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol. 2008;26(12):1367–1372. doi: 10.1038/nbt.1511.
- 30. Cox J, et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteomics. 2014;13(9):2513–2526. doi: 10.1074/mcp.M113.031591.
- 31. Yang YH, et al. Normalization for cDNA microarray data: A robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002;30(4):e15. doi: 10.1093/nar/30.4.e15.
- 32. Shatilovich A, Shmakova L, Mylnikov A, Gilichinsky D. Ancient protozoa isolated from permafrost. In: Margesin R, ed. Permafrost Soils. Soil Biology. Springer; Berlin: 2009. pp 97–115.
- 33. Shatilovich AV, Shmakova LA, Gubin SV, Gudkov AV, Gilichinskiĭ DA. Viable protozoa in late Pleistocene and Holocene permafrost sediments. Dokl Biol Sci. 2005;401:136–138. doi: 10.1007/s10630-005-0066-1.
- 34. Simpson JT, Durbin R. Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 2012;22(3):549–556. doi: 10.1101/gr.126953.111.
- 35. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923.
- 36. Luo R, et al. SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1(1):18. doi: 10.1186/2047-217X-1-18.
- 37. Bonfield JK, Whitwham A. Gap5—editing the billion fragment sequence assembly. Bioinformatics. 2010;26(14):1699–1703. doi: 10.1093/bioinformatics/btq268.
- 38. Marchler-Bauer A, Bryant SH. CD-Search: Protein domain annotations on the fly. Nucleic Acids Res. 2004;32(Web Server issue):W327–W331. doi: 10.1093/nar/gkh454.
- 39. Shi J, Blundell TL, Mizuguchi K. FUGUE: Sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol. 2001;310(1):243–257. doi: 10.1006/jmbi.2001.4762.
- 40. Marchler-Bauer A, et al. CDD: A database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 2002;30(1):281–283. doi: 10.1093/nar/30.1.281.
- 41. Tatusov RL, et al. The COG database: An updated version includes eukaryotes. BMC Bioinformatics. 2003;4:41. doi: 10.1186/1471-2105-4-41.
- 42. Li L, Stoeckert CJ, Jr, Roos DS. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178–2189. doi: 10.1101/gr.1224503.
- 43. Snel B, Bork P, Huynen MA. Genome phylogeny based on gene content. Nat Genet. 1999;21(1):108–110. doi: 10.1038/5052.
- 44. Cantarel BL, et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18(1):188–196. doi: 10.1101/gr.6743907.
- 45. Stanke M, Tzvetkova A, Morgenstern B. AUGUSTUS at EGASP: Using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol. 2006;7(Suppl 1):S11.1–8. doi: 10.1186/gb-2006-7-s1-s11.
- 46. Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 2005;33(20):6494–6506. doi: 10.1093/nar/gki937.
- 47. Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5(1):59. doi: 10.1186/1471-2105-5-59.
- 48. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999;20(18):3551–3567. doi: 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2.
- 49. Conesa A, et al. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–3676. doi: 10.1093/bioinformatics/bti610.
- 50. Krumsiek J, Arnold R, Rattei T. Gepard: A rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics. 2007;23(8):1026–1028. doi: 10.1093/bioinformatics/btm039.
- 51. Kim D, et al. TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36. doi: 10.1186/gb-2013-14-4-r36.
- 52. Arike L, et al. Comparison and applications of label-free absolute proteome quantification methods on Escherichia coli. J Proteomics. 2012;75(17):5437–5448. doi: 10.1016/j.jprot.2012.06.020.