Comparative Study
Microbiome. 2018 Mar 20;6(1):50.
doi: 10.1186/s40168-018-0437-0.

Species classifier choice is a key consideration when analysing low-complexity food microbiome data

Aaron M Walsh et al. Microbiome. 2018.

Abstract

Background: The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads.
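
The depth experiments described above rest on drawing random subsets of reads. The following is a minimal sketch of that idea, assuming the reads are already held in memory as a list; the function name `subsample_reads`, the seed, and the toy read identifiers are hypothetical and are not taken from the study, which does not state which tool was used for subsampling.

```python
import random


def subsample_reads(reads, depth, seed=0):
    """Return `depth` reads drawn uniformly at random without replacement.

    `reads` is any list of read records (here, plain strings).
    A fixed seed makes the draw reproducible.
    """
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of available reads")
    rng = random.Random(seed)
    return rng.sample(reads, depth)


# Hypothetical toy data: 1,000 read identifiers standing in for a real FASTQ file.
reads = [f"read_{i}" for i in range(1000)]

# Draw nested-depth subsets, analogous to testing several sequencing depths.
subsets = {depth: subsample_reads(reads, depth, seed=42) for depth in (10, 100, 500)}
```

Sampling without replacement keeps each read at most once per subset, so a subsample behaves like a shallower sequencing run of the same library.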

Results: Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the highest. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Likewise, the outputs from functional profiling analysis using SUPER-FOCUS were generally concordant between the platforms at different sequencing depths. Finally, and as expected, metagenome assembly completeness was significantly lower on the MiSeq than on either the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth.
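
To illustrate how reference genome size can bias read-based relative abundances, the sketch below compares abundances computed from raw read counts with abundances computed after dividing each count by genome length. It assumes that a species with a larger genome contributes proportionally more reads per cell. The species names, read counts, and genome sizes are hypothetical, and this is not the method used by any of the classifiers in the study.

```python
def relative_abundance(values):
    """Scale a dict of non-negative values so that they sum to 1."""
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


def length_normalised_abundance(read_counts, genome_sizes):
    """Divide each read count by genome length before scaling to proportions."""
    per_base = {name: read_counts[name] / genome_sizes[name] for name in read_counts}
    return relative_abundance(per_base)


# Hypothetical example: two species with equal cell numbers but different genome sizes.
read_counts = {"species_A": 6000, "species_B": 2000}
genome_sizes = {"species_A": 6_000_000, "species_B": 2_000_000}  # base pairs

raw = relative_abundance(read_counts)
corrected = length_normalised_abundance(read_counts, genome_sizes)

print(raw)        # species_A appears three times as abundant as species_B
print(corrected)  # both species come out at equal abundance
```

In this toy case the raw proportions track genome size rather than cell numbers, which is the kind of pattern a correlation between relative abundance and genome size would reveal.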

Conclusions: Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.

Keywords: Low-complexity microbiome; Sequencing platform comparison; Shotgun metagenomics.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Compositional analysis of the mock community using the total number of reads from each sequencer. a Species-level profile of the mock community, as determined by each species classifier. b Correlations between the relative abundances of species with their respective genome sizes
Fig. 2
Compositional analysis of kefir samples using the total number of reads from each sequencer. a Species-level profile of the kefir samples, as determined by each species classifier. b Dissimilarity plot showing differences between sequencers. c Dissimilarity plot showing differences between species classifiers
Fig. 3
Strain-level analysis, with PanPhlAn, using the total number of reads from each sequencer. a The highest match for each of 11 mock community species for which ≥ 2 reference strain genomes are available at RefSeq, based on the presence/absence of pangenome gene families. b A comparison of the relatedness of the Lactobacillus kefiranofaciens and Leuconostoc mesenteroides strains detected in kefir samples with each of the reference strain genomes present in the respective c Statistical differences in the proportion of PanPhlAn pangenome gene families detected using each sequencer
Fig. 4
Functional analysis, with SUPER-FOCUS, using the total number of sequences from each sequencer. a The relative abundances of SUPER-FOCUS level 1 subsystems detected in the mock community. b Dissimilarity plot based on the relative abundances of the SUPER-FOCUS level 3 subsystems detected in the kefir samples. c SUPER-FOCUS level 2 subsystems which were significantly altered between sequencers
Fig. 5
The effect of sequencing depth on compositional and functional analysis of the mock community. a The species-level profile of the mock community sample at different sequencing depths on each sequencer. b The relative abundances of the top 5 most prevalent SUPER-FOCUS level 1 subsystems detected in the mock community at different sequencing depths on each sequencer
Fig. 6
The effect of sequencing depth on compositional and functional analysis of kefir. a The average species-level profile of kefir samples at different sequencing depths on each sequencer. b Species whose abundances were most highly impacted by sequencing depth (0.05 < p < 0.1). c Dissimilarity plot based on the relative abundances of the SUPER-FOCUS level 3 subsystems detected in the kefir samples at different sequencing depths on each sequencer
Fig. 7
The effect of sequencing depth on metagenome assembly using IDBA-UD. a The N50 values at each sequencing depth. b Statistical differences in the N50 value at 100,000, 1,000,000, and 7,500,000 reads per sample
Fig. 8
The effect of sequencing depth on PanPhlAn analysis of the two most abundant kefir species, Lactobacillus kefiranofaciens and Leuconostoc mesenteroides. a The predicted percentage similarity of kefir strains relative to their most closely related reference strain, at each sequencing depth. Grey cells indicate that the species was not classified to the strain level at the specified depth. b Statistical differences in the percentage similarity at 100,000, 1,000,000, and 7,500,000 reads per sample
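
The N50 statistic reported in Fig. 7 can be computed from a list of contig lengths. The sketch below uses the standard definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly); the function name `n50` and the example lengths are hypothetical and do not come from the study's assemblies.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (total 54, so half is 27; 10 + 9 + 8 reaches 27).
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
print(example_n50)  # 8
```

A higher N50 indicates that more of the assembly sits in long contigs, which is why it is often used as a proxy for assembly contiguity.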
