Review
PLoS Comput Biol. 2022 Sep 8;18(9):e1010467.
doi: 10.1371/journal.pcbi.1010467. eCollection 2022 Sep.

Investigating differential abundance methods in microbiome data: A benchmark study

Marco Cappellato et al. PLoS Comput Biol.

Abstract

The development of increasingly efficient and cost-effective high-throughput DNA sequencing techniques has made it easier to study complex microbial systems. Recently, researchers have shown great interest in studying the microorganisms that characterise different ecological niches. Differential abundance analysis aims to find differences in the abundance of each taxon between two classes of subjects or samples, assigning a significance value to each comparison. Several bioinformatic methods have been developed specifically for this purpose, taking into account the challenges of microbiome data, such as sparsity, differences in sequencing depth between samples, and compositionality. Differential abundance analysis has led to important conclusions in different fields, from health to the environment. However, the lack of a known biological truth makes it difficult to validate the results obtained. In this work we exploit metaSPARSim, a microbial sequencing count data simulator, to simulate data with differentially abundant features between experimental groups. We perform a complete comparison of recently developed and established methods on a common benchmark, paying particular attention to the reliability of both the simulated scenarios and the evaluation metrics. The performance overview covers numerous scenarios, studying the effect on the methods' results of the main covariates: sample size, percentage of differentially abundant features, sequencing depth, feature variability, normalisation approach and ecological niche. Overall, we find that the methods show good control of the type I error and, generally, also of the false discovery rate at high sample size, while recall seems to depend on the dataset and sample size.
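
Because the differentially abundant (DA) features are known in simulated data, the metrics named in the abstract can be computed directly against that truth. The following is a minimal sketch under the standard definitions of false positive rate, false discovery rate and recall; the function name `da_metrics` and its arguments are hypothetical and are not taken from the study's code.

```python
def da_metrics(called, truth, n_features):
    """Compare features called DA by a method with the simulated truth.

    called     : iterable of feature ids the method reports as DA
    truth      : iterable of feature ids simulated as DA
    n_features : total number of features tested
    Returns a dict with FPR, FDR and recall; a metric is None when its
    denominator is zero (e.g. FDR is undefined if nothing is called).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true DA features that were called
    fp = len(called - truth)   # non-DA features that were called
    fn = len(truth - called)   # true DA features that were missed
    tn = n_features - tp - fp - fn

    def ratio(num, den):
        return num / den if den else None

    return {
        "FPR": ratio(fp, fp + tn),     # type I error rate
        "FDR": ratio(fp, tp + fp),     # share of calls that are false
        "recall": ratio(tp, tp + fn),  # share of true DA features found
    }


example = da_metrics(called={0, 1, 2}, truth={1, 2, 3, 4}, n_features=10)
```

In this toy case two of the three calls are correct and two of the four true DA features are found, so the FDR is 1/3 and the recall is 1/2. Returning `None` for an empty call set mirrors the fact that a run with no discoveries has no defined FDR.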

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Summary of notations.
The urn represents the real microbial population where different bacteria are represented by spheres of different colours. The sequencing process can be described by sampling without replacement with a limited number of extractions (i.e. the sequencing depth).
Fig 2. Distribution of mean abundance in log scale for the tooth dataset.
The dotted lines mark the abundance-level limits used for sampling the DA features: low (in black), medium (in blue) and high (in red).
Fig 3. False Positive Rate (FPR) of each differential abundance method for each dataset considered in the comparison in the scenario without simulated DA features.
In each set of boxes corresponding to the dataset, tools are on rows, while different sample size (SS) values are on columns. The FPR values are averaged over the 50 simulations and the bars show the standard error. The ANCOM** label refers to the method run without performing the underlying FDR adjustment.
Fig 4. False Discovery Rate (FDR) of each differential abundance method for each dataset considered in the comparison in the main scenario with simulated DA features.
In each set of boxes corresponding to the dataset, different percentages (P) of simulated DA features are on rows, while different sample size (SS) values are on columns. The FDR values are averaged over the 50 simulations and the bars show the standard error. The number of runs that provide a defined value of FDR is shown at the beginning of the bars.
Fig 5. Mean Recall (on y axis) and FDR (on x axis) of each differential abundance method for each dataset considered in the comparison in the main scenario with simulated DA features.
In each set of boxes corresponding to the dataset, different percentages (P) of simulated DA features are on rows, while different sample size (SS) values are on columns. The recall values are averaged over the 50 simulations and the bars show the standard error.
Fig 6. Area Under Precision-Recall curves (AUPR) of each differential abundance method for each dataset considered in the comparison in the main scenario with simulated DA features.
In each set of boxes corresponding to the dataset, different percentages (P) of simulated DA features are on rows, while different sample size (SS) values are on columns. The AUPR values are averaged over the 50 simulations and the bars show the standard error.
Fig 7. Recall of each differential abundance method for each dataset considered in the comparison in simulations with reduced variability.
In each set of boxes corresponding to the dataset, different percentages (P) of simulated DA features are on rows, while different sample size (SS) values are on columns. The recall values are averaged over the 50 simulations and the bars show the standard error.
Fig 8. FDR of each differential abundance method for each dataset considered in the comparison in simulations with reduced variability.
In each set of boxes corresponding to the dataset, different percentages (P) of simulated DA features are on rows, while different sample size (SS) values are on columns. The FDR values are averaged over the 50 simulations and the bars show the standard error. The difference in the mean FDR between the scenario with reduced variability and the main scenario with simulated DA features is shown at the beginning of the bars.
Fig 9. Recall of each differential abundance method for each dataset considered in the comparison in the scenario with simulated DA features and θ = 0.
In each set of boxes corresponding to the dataset, different percentages (P) of simulated DA features are on rows, while different sample size (SS) values are on columns. The recall values are averaged over the 50 simulations and the bars show the standard error.
Fig 10. Mean FDR difference [%] between each differential abundance method and its GMPR normalised version for the tooth dataset in the scenario with simulated DA features.
Different percentages (P) of simulated DA features are on rows, while different sample size (SS) values are on columns. Numbers at the beginning of each row correspond to the FDR values obtained with default normalisation, while the symbol (*) indicates that the Wilcoxon unpaired statistical test is significant.
Fig 11. Mean Recall difference [%] between each differential abundance method and its GMPR normalised version for the tooth dataset in the scenario with simulated DA features.
Different percentages (P) of simulated DA features are on rows, while different sample size (SS) values are on columns. Numbers at the beginning of each row correspond to the Recall values obtained with default normalisation, while the symbol (*) indicates that the Wilcoxon paired statistical test is significant.
Fig 12. Overall performance of each DA method.
In each set of boxes corresponding to different sample size (SS) values, Precision, NA_perc (percentage of available precision), Recall and pAUPR scores are shown for each dataset in columns. Methods (on rows) are ranked based on Precision values across all the SS scenarios and then based on Recall in case of ties. The legend below the boxes explains the threshold used to assign the overall score for each metric.
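
The urn description in the caption of Fig 1 (sequencing as sampling without replacement, with the number of extractions equal to the sequencing depth) can be illustrated with a small sketch. This is only an illustration of the urn idea, not the metaSPARSim implementation; the function name `sequence_sample`, the taxon labels and the population sizes are all hypothetical.

```python
import random
from collections import Counter


def sequence_sample(population, depth, seed=None):
    """Draw `depth` reads without replacement from an urn.

    population : dict mapping taxon name -> number of spheres in the urn
    depth      : number of extractions (the sequencing depth)
    seed       : optional seed for reproducibility
    Returns a dict mapping every taxon to its observed read count.
    """
    # Build the urn explicitly: one entry per sphere.
    urn = [taxon for taxon, n in population.items() for _ in range(n)]
    if depth > len(urn):
        raise ValueError("sequencing depth exceeds the population size")
    rng = random.Random(seed)
    reads = rng.sample(urn, depth)  # sampling without replacement
    counts = Counter(reads)
    # Report zeros explicitly: rare taxa may not be drawn at all.
    return {taxon: counts.get(taxon, 0) for taxon in population}


population = {"taxon_A": 50, "taxon_B": 30, "taxon_C": 20}
observed = sequence_sample(population, depth=40, seed=1)
```

The observed counts always sum to the chosen depth, so they carry information about relative rather than absolute abundance. A taxon with few spheres in the urn can receive a count of zero at low depth, which is one source of the sparsity mentioned in the abstract.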

Grants and funding

This work has been supported by the SEED Project "tRajectoriEs of baCtErial NeTwoRks from hEalthy to disease state and back (RECENTRE)" funded by the Department of Information Engineering of the University of Padova, Grant no. DI_C_BIRD2020_01 (BDC). G.B. was funded by PON 'Ricerca e Innovazione' 2014-2020. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.