Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
- PMID: 17468765
- DOI: 10.1038/nmeth1043
Abstract
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
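The benchmark idea in the abstract is that pooling reads drawn from known isolate genomes gives every read a known true origin. Each processing step can then be scored against that ground truth. The abstract does not state the exact sampling scheme. The minimal Python sketch below therefore assumes abundance-weighted sampling without replacement from per-genome read pools. The names `simulate_metagenome` and `binning_accuracy`, the abundance weights and the per-read accuracy metric are hypothetical illustrations, not the authors' procedure.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    read_id: str
    genome: str    # ground truth: the isolate genome this read came from
    sequence: str


def simulate_metagenome(read_pools, abundances, n_reads, seed=0):
    """Draw up to n_reads reads into one pooled (hypothetical) metagenome.

    read_pools: dict genome name -> list of read sequences from that isolate
    abundances: dict genome name -> relative abundance weight (assumed input)
    Each draw picks a genome with remaining reads in proportion to its weight,
    then removes one read at random from that genome's pool (no replacement).
    """
    rng = random.Random(seed)
    # Work on copies so the caller's read pools are left untouched.
    remaining = {g: list(read_pools[g]) for g in abundances}
    sample = []
    while len(sample) < n_reads:
        live = [g for g in remaining if remaining[g] and abundances[g] > 0]
        if not live:
            break  # every positively weighted pool is exhausted
        genome = rng.choices(live, weights=[abundances[g] for g in live], k=1)[0]
        pool = remaining[genome]
        seq = pool.pop(rng.randrange(len(pool)))
        sample.append(Read(f"{genome}_{len(sample)}", genome, seq))
    return sample


def binning_accuracy(reads, predicted_bins):
    """Fraction of reads whose predicted bin equals their true source genome."""
    if not reads:
        return 0.0
    correct = sum(1 for r in reads if predicted_bins.get(r.read_id) == r.genome)
    return correct / len(reads)


if __name__ == "__main__":
    pools = {"genome_A": ["ACGTACGT"] * 50, "genome_B": ["GGCCGGCC"] * 50}
    weights = {"genome_A": 0.8, "genome_B": 0.2}  # illustrative skewed community
    reads = simulate_metagenome(pools, weights, n_reads=40, seed=1)
    print(Counter(r.genome for r in reads))
    # Toy "binner" that assigns by counting G bases, for illustration only.
    guesses = {r.read_id: ("genome_B" if r.sequence.count("G") > 2 else "genome_A")
               for r in reads}
    print(binning_accuracy(reads, guesses))  # prints 1.0 for these toy reads
```

Because each `Read` carries its source genome, the same labels could in principle score any downstream step, such as contig chimerism or binning purity. Those metrics would take the place of the simple per-read accuracy shown here.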
Comment in
- Interpreting the unculturable majority. Nat Methods. 2007 Jun;4(6):479-80. doi: 10.1038/nmeth0607-479. PMID: 17538628
Similar articles
- SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences. Bioinformatics. 2009 Jul 15;25(14):1722-30. doi: 10.1093/bioinformatics/btp317. Epub 2009 May 13. PMID: 19439565
- Metagenomics: read length matters. Appl Environ Microbiol. 2008 Mar;74(5):1453-63. doi: 10.1128/AEM.02181-07. Epub 2008 Jan 11. PMID: 18192407. Free PMC article.
- nWayComp: a genome-wide sequence comparison tool for multiple strains/species of phylogenetically related microorganisms. In Silico Biol. 2007;7(2):195-200. PMID: 17688445
- Annotation, comparison and databases for hundreds of bacterial genomes. Res Microbiol. 2007 Dec;158(10):724-36. doi: 10.1016/j.resmic.2007.09.009. Epub 2007 Oct 6. PMID: 18031997. Review.
- Get the most out of your metagenome: computational analysis of environmental sequence data. Curr Opin Microbiol. 2007 Oct;10(5):490-8. doi: 10.1016/j.mib.2007.09.001. Epub 2007 Oct 23. PMID: 17936679. Review.
Cited by
- A biological treasure metagenome: pave a way for big science. Indian J Microbiol. 2008 Jun;48(2):163-72. doi: 10.1007/s12088-008-0030-5. Epub 2008 Jul 27. PMID: 23100711. Free PMC article.
- INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences. BMC Genomics. 2011 Nov 30;12 Suppl 3(Suppl 3):S4. doi: 10.1186/1471-2164-12-S3-S4. Epub 2011 Nov 30. PMID: 22369237. Free PMC article.
- Assembly of viral genomes from metagenomes. Front Microbiol. 2014 Dec 18;5:714. doi: 10.3389/fmicb.2014.00714. eCollection 2014. PMID: 25566226. Free PMC article.
- IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res. 2008 Jan;36(Database issue):D534-8. doi: 10.1093/nar/gkm869. Epub 2007 Oct 11. PMID: 17932063. Free PMC article.
- Metagenomic sequencing of an in vitro-simulated microbial community. PLoS One. 2010 Apr 16;5(4):e10209. doi: 10.1371/journal.pone.0010209. PMID: 20419134. Free PMC article.