PLoS One. 2008 Oct 8;3(10):e3373.
doi: 10.1371/journal.pone.0003373.

MetaSim: a sequencing simulator for genomics and metagenomics

Daniel C Richter et al. PLoS One. 2008.

Abstract

Background: The new research field of metagenomics is providing exciting insights into various, previously unclassified ecological systems. Next-generation sequencing technologies are producing a rapid increase of environmental data in public databases. There is great need for specialized software solutions and statistical methods for dealing with complex metagenome data sets.

Methodology/principal findings: To facilitate the development and improvement of metagenomic tools and the planning of metagenomic projects, we introduce a sequencing simulator called MetaSim. Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets. Based on a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to collect reads from the metagenome using a simulation of a number of different sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree.

Conclusions/significance: MetaSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.
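
The design-then-sample workflow described above can be illustrated with a minimal Python sketch. It is not MetaSim's implementation: the function name `sample_reads`, the toy genomes, and the abundance values are hypothetical, the sketch weights taxa by abundance only (ignoring genome length), and it applies no sequencing error model. Fragment lengths are drawn from a normal distribution with mean 1000 bp and standard deviation 100 bp, the values used as an example in the Figure 3 caption.

```python
import random


def sample_reads(genomes, abundances, n_reads, mean_len=1000, sd_len=100, seed=0):
    """Draw n_reads error-free fragments from the given genomes.

    Assumption of this sketch: each fragment's source genome is chosen
    with probability proportional to its abundance value alone.
    """
    rng = random.Random(seed)
    names = list(genomes)
    weights = [abundances[name] for name in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = genomes[name]
        # Fragment length ~ Normal(mean_len, sd_len), clamped to [1, len(seq)].
        length = int(round(rng.gauss(mean_len, sd_len)))
        length = max(1, min(length, len(seq)))
        # Uniform start position such that the fragment fits in the genome.
        start = rng.randint(0, len(seq) - length)
        reads.append((name, start, seq[start:start + length]))
    return reads


# Hypothetical toy "genomes": random 5000 bp sequences, not real data.
toy_rng = random.Random(1)
toy_genomes = {
    "taxon_A": "".join(toy_rng.choice("ACGT") for _ in range(5000)),
    "taxon_B": "".join(toy_rng.choice("ACGT") for _ in range(5000)),
}
toy_abundances = {"taxon_A": 3, "taxon_B": 1}

reads = sample_reads(toy_genomes, toy_abundances, n_reads=2000)
counts = {name: sum(1 for r in reads if r[0] == name) for name in toy_genomes}
print(counts)
```

Because every read keeps its source taxon and start position, a set generated this way can act as ground truth when checking how a classification tool assigns the reads.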

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Taxonomy Editor.
A clipping of the taxonomy editor view is shown. Three taxa are assigned an abundance value (number in parentheses). These settings can be specified either in a text-based abundance profile file or directly in the taxonomy editor by right-clicking on a node.
Figure 2. Fragment Recruitment Plot.
Black dots represent 10,000 sequencing reads (Sanger technology, ≈800 bp) drawn from 100 evolved offspring (α = 0.004) of the source genome Escherichia coli K-12 substr. MG1655. Their sequence identity is lower than that of the mapped reads sampled directly from the source genome (red dots).
Figure 3. Frequency distribution of clone lengths.
As an example, 250,000 clones with mean length 1000 bp and standard deviation of 100 bp were modelled with a normal distribution.
Figure 4. The graphical user interface of MetaSim is divided into three panels: a project tree on the left containing all simulation settings and taxon profiles, an overview and edit panel on the right and a message panel at the bottom.
Additionally, a configuration window is shown.
Figure 5. Assignment curves of reads taxonomically classified by MEGAN.
The percentage values refer to the number of sampled reads generated for each organism. (A) The simLC dataset consists of only two organisms. The number of reads assigned to M. marisnigri JR1 almost equals the number of its sampled reads, whereas E. coli str. K-12 substr. MG1655 has only a few assignments. (B) In the simMC dataset, the number of assigned reads increases significantly with longer read lengths (except for Shigella dysenteriae Sd197, Francisella tularensis subsp. tularensis Schu 4 and E. coli str. K12 substr. MG1655). (C) In the simHC dataset, the fraction of reads assigned to Campylobacter jejuni subsp. jejuni 81-176, Lactococcus lactis subsp. cremoris SK11 and Pseudomonas aeruginosa PA7 is rather low compared to the other organisms.
Figure 6. MEGAN visualization of the simLC data set (Sanger technology, read length ≈800 bp).
Two arrows point out the two source genomes of the simulation run. The number of reads assigned to E. coli K12 is rather small compared to the number of reads sampled from the genome of E. coli str. K12 substr. MG1655 (192 assigned vs. 3214 sampled reads). Many reads have BLAST hits in multiple strains and clades, so MEGAN assigns them to a high-order level in the tree, e.g. the node Bacteria (3157 reads). M. marisnigri JR1 has only a few related strains. In this case, the assignment of reads is more specific (15,366 assigned vs. 15,509 sampled reads).
Figure 7. MEGAN visualization of the simHC data set (Sanger technology, read length ≈800 bp).
Arrows point out three of the 11 source genomes of the simulation run that show only a few assigned reads at species level compared to the number of originally sampled reads. Because C. jejuni subsp. jejuni 81-176, L. lactis subsp. cremoris SK11 and P. aeruginosa PA7 share genes with many closely related strains, most of the sampled reads were assigned by MEGAN to a high-order level in the tree (e.g. genus).
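
The read counts quoted in the Figure 6 caption can be turned into assignment fractions with a few lines of Python. This is only a worked calculation on the numbers given there; the helper name `assigned_fraction` is hypothetical and is not part of MetaSim or MEGAN.

```python
def assigned_fraction(assigned, sampled):
    """Return the share of sampled reads that were assigned to the source node."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return assigned / sampled


# (assigned, sampled) read counts as stated in the Figure 6 caption.
figure6_counts = {
    "E. coli str. K12 substr. MG1655": (192, 3214),
    "M. marisnigri JR1": (15366, 15509),
}

fractions = {
    name: assigned_fraction(assigned, sampled)
    for name, (assigned, sampled) in figure6_counts.items()
}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of sampled reads assigned")
```

On these counts, roughly 6% of the E. coli reads are assigned to the source strain, against about 99% for M. marisnigri JR1, which matches the contrast the caption describes.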
