Waste not, want not: revisiting the analysis that called into question the practice of rarefaction
- PMID: 38054712
- PMCID: PMC10826360
- DOI: 10.1128/msphere.00355-23
Abstract
In 2014, McMurdie and Holmes published the provocatively titled "Waste not, want not: why rarefying microbiome data is inadmissible." The claims of their study have significantly altered how microbiome researchers control for the unavoidable uneven sequencing depths that are inherent in modern 16S rRNA gene sequencing. Confusion over the distinction between the definitions of rarefying and rarefaction continues to cloud the interpretation of their results. More importantly, the authors made a variety of problematic choices when designing and analyzing their simulations. I identified 11 factors that could have compromised the results of the original study. I reproduced the original simulation results and assessed the impact of those factors on the underlying conclusion that rarefying data is inadmissible. Throughout, the design choices of the original study caused rarefying and rarefaction to appear to perform worse than they truly did. Most important were the approaches used to assess ecological distances, the removal of samples with low sequencing depth, and the failure to account for conditions where sequencing effort is confounded with treatment group. Although the original study criticized rarefying for the arbitrary removal of valid data, rarefying data many times (i.e., rarefaction) incorporates all the data. In contrast, it is the removal of rare taxa that would appear to remove valid data. Overall, I show that rarefaction is the most robust approach to control for uneven sequencing effort when considered across a variety of alpha and beta diversity metrics.
Importance
Over the past 10 years, the best method for normalizing the sequencing depth of samples characterized by 16S rRNA gene sequencing has been contentious. An often-cited article by McMurdie and Holmes forcefully argued that rarefying the number of sequence counts was "inadmissible" and should not be employed. However, I identified a number of problems with the design of their simulations and analysis that compromised their results. In fact, when I reproduced and expanded upon their analysis, it was clear that rarefaction was actually the most robust approach for controlling for uneven sequencing effort across samples. Rarefaction limits the rate of falsely detecting and rejecting differences between treatment groups. Far from being "inadmissible," rarefaction is a valuable tool for analyzing microbiome sequence data.
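The distinction between rarefying (a single random subsample to a common depth) and rarefaction (rarefying many times) can be illustrated with a minimal Python sketch. This is not the code used in the study. The function names, the choice of observed richness as the metric, the default of 100 iterations, and the step of averaging the metric across draws are all assumptions made for illustration.

```python
import random
from statistics import mean


def rarefy(counts, depth, rng):
    """One random subsample of `depth` reads, drawn without replacement."""
    if sum(counts) < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    # Expand the count vector into one taxon label per read, then subsample.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    picked = rng.sample(reads, depth)
    out = [0] * len(counts)
    for taxon in picked:
        out[taxon] += 1
    return out


def richness(counts):
    """Observed richness: the number of taxa with at least one read."""
    return sum(1 for n in counts if n > 0)


def rarefaction(counts, depth, metric=richness, iterations=100, seed=0):
    """Repeat rarefying `iterations` times and average the metric (assumed here)."""
    rng = random.Random(seed)
    return mean(metric(rarefy(counts, depth, rng)) for _ in range(iterations))


if __name__ == "__main__":
    # Hypothetical samples with very different sequencing depths.
    shallow = [40, 30, 20, 10, 0, 0]
    deep = [400, 300, 200, 90, 7, 3]
    depth = min(sum(shallow), sum(deep))
    print(rarefaction(shallow, depth), rarefaction(deep, depth))
```

Because every read has a chance of being drawn in some iteration, repeated subsampling uses all of the data, whereas a single rarefied draw discards whatever it did not select.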
Keywords: 16S rRNA gene sequencing; amplicon sequencing; bioinformatics; microbial ecology; microbiome.
Conflict of interest statement
The authors declare no conflict of interest.