Genome Res. 2017 May;27(5):824-834.
doi: 10.1101/gr.213959.116. Epub 2017 Mar 15.

metaSPAdes: a new versatile metagenomic assembler

Sergey Nurk et al. Genome Res. 2017 May.

Abstract

While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed of dozens of related strains, further amplifying the challenge of metagenomic assembly. metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes. We benchmark metaSPAdes against other state-of-the-art metagenome assemblers and demonstrate that it results in high-quality assemblies across diverse data sets.

Figures

Figure 1.
Cumulative scaffold length plots. On the x-axis, scaffolds are ordered from the longest to the shortest. The y-axis shows the total length of the x longest scaffolds in the assembly.
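The curve described in this caption can be computed from a list of scaffold lengths alone. The following is a minimal sketch, not the authors' plotting code; the function name and the example lengths are hypothetical.

```python
from itertools import accumulate


def cumulative_scaffold_lengths(lengths):
    """Return the y-values of the plot: entry x-1 is the total length
    of the x longest scaffolds."""
    # Order scaffolds from the longest to the shortest (the x-axis order).
    ordered = sorted(lengths, reverse=True)
    # Running sum over that order gives the cumulative length (the y-axis).
    return list(accumulate(ordered))


# Hypothetical scaffold lengths in base pairs.
example_lengths = [500, 1200, 300, 800]
print(cumulative_scaffold_lengths(example_lengths))  # [1200, 2000, 2500, 2800]
```

A curve that rises steeply and flattens early indicates that most of the assembly is contained in a few long scaffolds.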
Figure 2.
metaQUAST statistics for the 20 most abundant species in the SYNTH data set. The panels show the NGA50 statistics (top left), the fraction of the reconstructed genome compared with the total genome length (top right), the number of intragenomic misassemblies (bottom left), and the number of intergenomic misassemblies (bottom right). References are denoted by their RefSeq IDs (see Supplemental Table S2) and arranged in decreasing order of coverage depth.
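As a reminder of what the NGA50 panel measures, here is a minimal sketch under the usual definition: the largest length L such that aligned blocks of length at least L together cover at least half of the reference genome. It assumes the contigs have already been aligned to the reference and broken at misassemblies, which is the step metaQUAST performs; the function name and numbers are hypothetical.

```python
def nga50(aligned_block_lengths, reference_length):
    """Return NGA50 for one reference genome, or None if the aligned
    blocks cover less than half of the reference."""
    target = reference_length / 2
    total = 0
    # Walk the blocks from longest to shortest until half the
    # reference length has been accumulated.
    for length in sorted(aligned_block_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return None


# Hypothetical aligned block lengths for a 1000 bp reference.
print(nga50([400, 300, 200, 100], 1000))  # 300
# The same blocks cover less than half of a 3000 bp reference.
print(nga50([400, 300, 200, 100], 3000))  # None
```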
Figure 3.
The de Bruijn graphs of three strains and their strain mixture. The figure shows only a small subgraph of the de Bruijn graph. The abundant strain (strain1) is shown by thick lines, and the rare strains (strain2 and strain3) are shown by thin lines. The genomic repeat R is shown in red. (Top left) The de Bruijn graph of the abundant strain1. (Top right) The rare strain2 differs from the abundant strain1 by an insertion of an additional copy of repeat R. The two breakpoint edges resulting from this insertion are shown in green. These filigree edges are not removed by the graph simplification procedures in the standard assembly tools aimed at isolates. (Bottom left) The rare strain3 differs from the abundant strain1 by an insertion of a horizontally transferred gene (or a highly diverged genomic region). (Bottom right) The de Bruijn graph of the mixture of three strains.
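The thick and thin lines in this figure correspond to edge coverage in the mixture graph. The sketch below illustrates the idea on toy data: it counts k-mer edges over strain sequences weighted by abundance, so edges unique to a rare strain end up with low coverage. It is a simplification and not the SPAdes graph construction: it works on genome strings rather than reads, ignores reverse complements, and does not condense unbranched paths. All sequences, abundances, and the threshold are hypothetical.

```python
from collections import Counter


def de_bruijn_edge_coverage(strains, k):
    """Count k-mer edges over (sequence, abundance) pairs.

    Each k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer
    suffix; its coverage is the summed abundance of the strains
    containing it (counted once per occurrence).
    """
    coverage = Counter()
    for sequence, abundance in strains:
        for i in range(len(sequence) - k + 1):
            coverage[sequence[i:i + k]] += abundance
    return coverage


def low_coverage_edges(coverage, threshold):
    """Return the edges whose coverage falls below the threshold."""
    return {kmer for kmer, cov in coverage.items() if cov < threshold}


# Toy mixture: an abundant strain and a rare strain carrying a
# 2-nucleotide insertion ("GG") in the middle.
strain1 = ("ACGTTGCA", 10)    # abundant
strain3 = ("ACGTGGTGCA", 1)   # rare, with an insertion
coverage = de_bruijn_edge_coverage([strain1, strain3], k=4)

print(coverage["ACGT"])  # 11, shared by both strains
print(sorted(low_coverage_edges(coverage, threshold=5)))
# ['CGTG', 'GGTG', 'GTGC', 'GTGG', 'TGGT'], edges unique to the rare strain
```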
Figure 4.
Applying the metagenomics-specific decision rule for repeat resolution. The figure shows only a small subgraph of the assembly graph. (A) The path that is currently being extended (formed by green edges) along with its blue extension edges e and e′. (B) The short-edge traversal from the end of the extension edge e. The dotted curve shows the boundary frontier(e) of the traversal. The edges in the set next(e) are shown in red with low-coverage edges represented as dashed arrows (other edges in next(e) are represented as solid arrows). Since all edges in next(e) have low coverage, the edge e is ruled out as an unlikely extension candidate. (C) The short-edge traversal from the end of the extension edge e′. (D) Since e′ is a single extension edge that was not ruled out (there is a solid edge in next(e′)), it is added to the growing path and the extension process continues.
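The decision rule in this caption can be sketched as follows. This is an illustration of the rule as the caption states it, not the metaSPAdes implementation: the graph representation, the short-edge length cutoff, and the low-coverage threshold are all assumptions made for the example, and treating an empty next(e) as ruled out is a choice of this sketch.

```python
from collections import namedtuple

# Hypothetical edge record for a toy assembly graph.
Edge = namedtuple("Edge", "name src dst length coverage")


def next_edges(graph, start_edge, max_short_len):
    """Traverse short edges from the end of start_edge and return the
    long edges met at the frontier of the traversal, i.e. next(e)."""
    found = []
    seen = set()
    stack = [start_edge.dst]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for edge in graph.get(node, []):
            if edge.length <= max_short_len:
                stack.append(edge.dst)   # keep walking through short edges
            else:
                found.append(edge)       # long edge: part of next(e)
    return found


def choose_extension(graph, candidates, max_short_len, low_cov):
    """Rule out candidates whose next(e) holds only low-coverage edges;
    return the single survivor, or None if zero or several remain."""
    surviving = [
        e for e in candidates
        if any(n.coverage >= low_cov for n in next_edges(graph, e, max_short_len))
    ]
    return surviving[0] if len(surviving) == 1 else None


# Toy graph keyed by start node; all values are made up.
graph = {
    "B": [Edge("s1", "B", "D", 10, 4)],
    "D": [Edge("x", "D", "X", 500, 2), Edge("y", "D", "Y", 700, 3)],
    "C": [Edge("z", "C", "Z", 900, 30), Edge("w", "C", "W", 600, 2)],
}
e = Edge("e", "A", "B", 1000, 20)
e_prime = Edge("e_prime", "A", "C", 1200, 25)

chosen = choose_extension(graph, [e, e_prime], max_short_len=100, low_cov=5)
print(chosen.name)  # e_prime
```

Here every edge in next(e) has coverage below the threshold, so e is ruled out, while next(e′) contains one well-covered edge, so e′ is the only remaining candidate and extends the path.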
Figure 5.
Repeat resolution in metagenomic assembly. (A) One of two identical copies of a long (longer than the insert size) repeat R (red) in the abundant strain has mutated into a unique genomic “green” region R′ in the rare strain. (B) The assembly graph resulting from a mixture of reads from the abundant and rare strains. Two alternative paths between the start and the end of the green edge (one formed by a single green edge and another formed by two black and one red edge) form a bulge. (C) The strain-contig spanning R′ (shown by green dashed line) constructed by exSPAnder at the “generating strain-contigs” step. (D) Masking of the strain variation at the “transforming assembly graph into consensus assembly graph” step leads to a projection of a bulge (formed by red and green edges) and results in the consensus assembly graph shown in E. The blue arrows emphasize that SPAdes projects rather than deletes bulges, facilitating the subsequent reconstruction of strain-paths in the consensus assembly graph. (E) Reconstruction of the strain-path (green dotted line), corresponding to a strain-contig (green dashed line) at the “generating strain-paths in the consensus assembly graph” step. (F) At the “repeat resolution using strain-paths” step, metaSPAdes utilizes both strain-paths and paired reads to resolve repeats in the consensus graph. The green dotted strain-path from E is used as additional information to reconstruct the consensus contig cRd spanning the long repeat.
