Front Microbiol. 2014 Dec 18;5:714.
doi: 10.3389/fmicb.2014.00714. eCollection 2014.

Assembly of viral genomes from metagenomes


Saskia L Smits et al. Front Microbiol.

Abstract

Viral infections remain a serious global health issue. Metagenomic approaches are increasingly used not only to detect novel viral pathogens but also to generate complete genomes of uncultivated viruses. In silico identification of complete viral genomes from sequence data would allow rapid phylogenetic characterization of these new viruses. Often, however, complete viral genomes are not recovered; instead, assembly yields several distinct contigs derived from a single entity, some of which have no sequence homology to any known protein. De novo assembly of single viruses from a metagenome is challenging not only because of the lack of a reference genome, but also because of intrapopulation variation and uneven or insufficient coverage. Here we explored different assembly algorithms, remote homology searches, genome-specific sequence motifs, k-mer frequency ranking, and coverage profile binning to detect and obtain viral target genomes from metagenomes. All methods were tested on 454-generated sequencing datasets containing three recently described RNA viruses with relatively large genomes that were divergent from previously known viruses of the viral families Rhabdoviridae and Coronaviridae. Depending on the specific characteristics of the target virus and the metagenomic community, different assembly and in silico gap-closure strategies succeeded in obtaining near-complete viral genomes.

Keywords: assembly; metagenome; pathogen; viral metagenomics; virome; virus; virus discovery.


Figures

Figure 1
Viral target genomes. Panels (A–C) show read coverage and the contigs matching the viral genomes of DRV (A), RFFRV (B), and PNV (C) produced by different assembly algorithms. Only contigs larger than 1 kb are shown. Green: contigs assembled with Genovo as described in the Methods. Black outline: contigs assembled through iterative assembly. Solid black: seed contig. Red: contigs assembled with the CLC Genomics Workbench assembler. Blue: contigs assembled with the Newbler assembler. Small black boxes at the bottom of the read coverage line mark stretches of low sequence complexity. “ORF” indicates the genome organization as described below. “Motif” shows the location of sequence motifs, which are shown in detail in Figure S1. “BLAST” shows regions with sequence homology as determined by BLASTX; colored boxes show the sequence identity to the best BLAST hit, as indicated on top. “HMM” indicates regions with remote homology identified by Pfam profiles, if any. The ruler at the bottom indicates sequence length in kilobases. (A) DRV, Dolphin rhabdovirus; N, nucleoprotein; P, phosphoprotein; M, matrix protein; G, glycoprotein; L, large protein. (B) RFFRV, Red fox fecal rhabdovirus; N, nucleoprotein; P, phosphoprotein; M, matrix protein; G, glycoprotein; L, large protein; without abbreviation: alpha 1, 2, and 3 proteins. (C) PNV, Python nidovirus; PP1a, polyprotein 1a; PP1b, polyprotein 1b; S, spike glycoprotein; without abbreviation: minor membrane protein, membrane protein, nucleocapsid protein, minor membrane protein 2, putative hemagglutinin-neuraminidase protein. The striped line at the 5′ end indicates a putative unresolved 5′ end.
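The “Motif” track marks genome-specific sequence motifs. As a rough illustration of how matches to a motif could be located once its consensus is known, the Python sketch below scans a contig on both strands for an IUPAC-coded motif. The function names and example motif are hypothetical, and the study's own motif procedure is not described here; note that this sketch only finds matches to a given motif and does not discover motifs.

```python
import re

# Hypothetical helper: IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC consensus motif into a regex pattern."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq: str, motif: str) -> list[tuple[str, int]]:
    """Return (strand, start) for each match of `motif` on either strand.

    Start positions are 0-based coordinates on the forward strand.
    """
    # A zero-width lookahead lets overlapping matches be reported
    pattern = re.compile(f"(?={iupac_to_regex(motif)})")
    seq = seq.upper()
    n, k = len(seq), len(motif)
    hits = [("+", m.start()) for m in pattern.finditer(seq)]
    # Search the reverse complement, then map hits back to forward coordinates
    revcomp = seq.translate(COMPLEMENT)[::-1]
    hits += [("-", n - m.start() - k) for m in pattern.finditer(revcomp)]
    return sorted(hits, key=lambda hit: hit[1])


print(find_motif("TTAACAGGTT", "AACAG"))  # [('+', 2)]
print(find_motif("CTGTTA", "AACAG"))      # [('-', 0)]
```

Scanning both strands matters because contigs from a metagenomic assembly can be reported in either orientation relative to the viral genome.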
Figure 2
Coverage profile binning. Histograms of coverage (reads per base) of each contig from (A) cell culture supernatant containing Dolphin rhabdovirus, (B) red fox feces containing red fox fecal rhabdovirus, and (C) python lung tissue containing python nidovirus. Gray: contigs mapping to the finished viral genome. Black: seed contig. The first bar in the last panel is truncated for visibility (47%). Only contigs larger than 1 kb are shown.
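Coverage profile binning rests on the idea that contigs derived from one viral genome should be sequenced to roughly the same depth as the seed contig. A minimal Python sketch, assuming mean coverage is computed as aligned read bases divided by contig length and that a contig is retained when its coverage lies within a fixed fold of the seed's coverage. The `Contig` class, the `coverage_bin` function, and the 2-fold default window are hypothetical choices for illustration, not parameters reported for this study.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int        # contig length in bases
    mapped_bases: int  # total read bases aligned to this contig

    @property
    def coverage(self) -> float:
        """Mean coverage in reads per base."""
        return self.mapped_bases / self.length


def coverage_bin(contigs, seed_name, fold=2.0, min_length=1000):
    """Hypothetical helper: return contigs (other than the seed) whose mean
    coverage lies within `fold` of the seed's coverage, keeping only contigs
    longer than `min_length` bases. The 2-fold default is an assumption."""
    seed = next(c for c in contigs if c.name == seed_name)
    low, high = seed.coverage / fold, seed.coverage * fold
    return [
        c for c in contigs
        if c.name != seed_name
        and c.length > min_length
        and low <= c.coverage <= high
    ]


contigs = [
    Contig("seed", 5000, 250_000),  # 50 reads per base
    Contig("a", 2000, 90_000),      # 45: inside the 2-fold window
    Contig("b", 3000, 15_000),      # 5: outside the 2-fold window
    Contig("c", 800, 40_000),       # 50, but shorter than 1 kb
]
print([c.name for c in coverage_bin(contigs, "seed")])  # ['a']
```

A fixed fold window is the simplest choice; uneven coverage along viral genomes, which the abstract names as a challenge, means a real binning step may need a wider window or a comparison across several samples.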
Figure 3
K-mer profiling. Dot plots showing the ranked k-mer distance of each contig from the k-mer profile of the seed contig of (A) Dolphin rhabdovirus (DRV), (B) red fox fecal rhabdovirus (RFFRV), and (C) python nidovirus (PNV), plotted against contig length. Open boxes indicate contigs that were retrospectively identified as originating from the target genomes. Only contigs larger than 1 kb are shown.
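K-mer profiling ranks contigs by how closely their nucleotide composition resembles that of the seed contig. The Python sketch below assumes tetranucleotide (k = 4) relative frequencies and Euclidean distance; the caption does not specify k or the distance measure, so both choices and all function names here are hypothetical.

```python
import math
from collections import Counter
from itertools import product

VALID = frozenset("ACGT")


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Relative frequency of every ACGT k-mer in `seq`, in lexicographic order.

    k-mers containing any other character (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if VALID.issuperset(seq[i:i + k])
    )
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def euclidean(p: list[float], q: list[float]) -> float:
    """Assumed distance measure between two k-mer frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_by_kmer_distance(contigs: dict[str, str], seed_seq: str,
                          k: int = 4, min_length: int = 1000):
    """Hypothetical helper: rank contigs longer than `min_length` by k-mer
    distance to the seed contig, closest first."""
    seed_profile = kmer_profile(seed_seq, k)
    scored = [
        (name, euclidean(kmer_profile(seq, k), seed_profile))
        for name, seq in contigs.items()
        if len(seq) > min_length
    ]
    return sorted(scored, key=lambda item: item[1])


seed = "ACGT" * 300
contigs = {"same": "ACGT" * 400, "different": "AAAAT" * 300, "short": "ACGT" * 100}
for name, dist in rank_by_kmer_distance(contigs, seed):
    print(f"{name}\t{dist:.4f}")
```

Ranking rather than thresholding mirrors the caption: contigs are ordered by distance, and membership in the target genome is judged afterward. Short contigs are excluded because their k-mer profiles are noisy.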
