BMC Bioinformatics. 2020 Jun 3;21(1):225.
doi: 10.1186/s12859-020-03552-z.

An empirical Bayes approach to normalization and differential abundance testing for microbiome data

Tiantian Liu et al. BMC Bioinformatics. 2020.

Abstract

Background: Advances in DNA sequencing have given researchers an unprecedented opportunity to study the variety of species living in and on the human body. However, the analysis of microbiome data is complicated by several challenges. First, the sequencing depth may vary by orders of magnitude across samples. Second, many species are rare, so the data often contain many zeros. Third, the specimen is only a fraction of the microbial ecosystem, so the data are compositional and carry only relative information. Other characteristics of microbiome data include pronounced over-dispersion in taxon abundances and the existence of a phylogenetic tree that relates all bacterial species. To address some of these challenges, microbiome analysis workflows often normalize the read counts prior to downstream analysis. However, the current literature on the normalization of microbiome data has limitations.

Results: Under the multinomial distribution for the read counts and a prior for the unknown proportions, we propose an empirical Bayes approach to microbiome data normalization. Using a tree-based extension of the Dirichlet prior, we further extend our method by incorporating the phylogenetic tree into the normalization process. We study the impact of normalization on differential abundance analysis. In the presence of tree structure, we propose a phylogeny-aware detection procedure.
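To make the idea of normalization under a multinomial likelihood and a Dirichlet prior concrete, here is a minimal Python sketch. It is not the authors' eBay implementation (their R scripts are at the GitHub link in the Conclusions). The function name `posterior_mean_proportions`, the `precision` parameter, and the choice to centre the prior on the pooled proportions are all assumptions made for illustration; the abstract does not say how the prior is estimated. The sketch only shows the conjugate posterior mean, which pulls each sample's observed proportions toward a prior estimated from all samples.

```python
def posterior_mean_proportions(counts, precision=10.0):
    """Shrink per-sample proportions toward pooled proportions.

    counts: list of samples, each a list of non-negative read counts per taxon.
    precision: hypothetical prior strength (sum of the Dirichlet parameters);
               chosen by hand here, not estimated as in the paper.
    """
    n_taxa = len(counts[0])
    total = sum(sum(row) for row in counts)

    # "Empirical" step (an assumption of this sketch): centre the Dirichlet
    # prior on the proportions pooled over all samples.
    pooled = [sum(row[j] for row in counts) / total for j in range(n_taxa)]
    alpha = [precision * p for p in pooled]

    normalized = []
    for row in counts:
        depth = sum(row)  # sequencing depth of this sample
        # Posterior mean of a Dirichlet(alpha) prior after multinomial counts:
        # (x_j + alpha_j) / (depth + sum(alpha)), and sum(alpha) == precision.
        normalized.append([(x + a) / (depth + precision)
                           for x, a in zip(row, alpha)])
    return normalized


# Three toy samples with very different depths and some zero counts.
counts = [[90, 10, 0], [5, 0, 5], [400, 500, 100]]
normalized = posterior_mean_proportions(counts)
for row in normalized:
    print([round(p, 4) for p in row])
```

Each normalized row sums to one regardless of sequencing depth, and a zero count becomes a small positive proportion whenever the taxon is observed in at least one sample. Shallow samples are shrunk more strongly toward the pooled proportions than deep ones, because the fixed `precision` carries more weight relative to a small depth.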

Conclusions: Extensive simulations and gut microbiome data applications are conducted to demonstrate the superior performance of our empirical Bayes method over other normalization methods, and over commonly-used methods for differential abundance testing. Original R scripts are available at GitHub (https://github.com/liudoubletian/eBay).

Keywords: Bayesian shrinkage; Differentially abundant OTUs; MetagenomeSeq; Phylogeny-aware analysis; Rarefying.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Comparison of recall and precision with data from DM across different β. To detect differentially abundant taxa, we simulated 100 data sets from the DM model with θ=0.15 and β∈{0.01,0.15,0.2,0.25,0.3,0.35}. (a, b) Recall of t-test and Wilcoxon rank sum test with various normalization methods. (c, d) Recall and precision of DESeq2, ANCOM, metagenomeSeq, Wrench, and those of t-test and Wilcoxon rank sum test, both applied after counts were normalized by eBay.

Fig. 2
Comparison of recall and precision with data from DM across different θ. To detect differentially abundant taxa, we simulated 100 data sets from the DM model with β=0.25 and θ∈{0.05,0.1,0.15,0.2,0.25,0.3}. (a, b) Recall of t-test and Wilcoxon rank sum test with various normalization methods. (c, d) Recall and precision of DESeq2, ANCOM, metagenomeSeq, Wrench, and those of t-test and Wilcoxon rank sum test, both applied after counts were normalized by eBay.

Fig. 3
An example of a binary tree with 50 leaves and 49 internal nodes.

Fig. 4
Comparison of recall and precision with data generated from the gamma-Poisson model across different δ. (a, b) Recall and precision of DESeq2, ANCOM, metagenomeSeq, Wrench, and that of t-test, which was applied after counts were normalized by ALDEx2 or eBay.

Fig. 5
Comparison of recall and precision with data generated from the zero-inflated log-normal model across different δ. (a, b) Recall and precision of DESeq2, ANCOM, metagenomeSeq, Wrench, and that of t-test, which was applied after counts were normalized by ALDEx2 or eBay.

Fig. 6
Differentially abundant bacterial species between healthy children and children with SAM. (a) Visualization of set intersections among differential abundance testing methods in Table 2. (b) The number of matches between the top K taxa identified by random forests and the top K differentially abundant taxa detected by various testing methods. metaSeq: metagenomeSeq.

Fig. 7
Differentially abundant bacterial species between normal weight and obese individuals. Visualization of set intersections among differential abundance testing methods in Table 2.
