Commun Biol. 2024 Sep 6;7(1):1094.
doi: 10.1038/s42003-024-06724-2.

AutoFocus: a hierarchical framework to explore multi-omic disease associations spanning multiple scales of biomolecular interaction

Annalise Schweickart et al. Commun Biol. 2024.

Abstract

Recent advances in high-throughput measurement technologies have enabled the analysis of molecular perturbations associated with disease phenotypes at the multi-omic level. Such perturbations can range in scale from fluctuations of individual molecules to entire biological pathways. Data-driven clustering algorithms have long been used to group interactions into interpretable functional modules; however, these modules are typically constrained to a fixed size or statistical cutoff. Furthermore, modules are often analyzed independently of their broader biological context. Consequently, such clustering approaches limit the ability to explore functional module associations with disease phenotypes across multiple scales. Here, we introduce AutoFocus, a data-driven method that hierarchically organizes biomolecules and tests for phenotype enrichment at every level within the hierarchy. As a result, the method allows disease-associated modules to emerge at any scale. We evaluated this approach using two datasets. First, we explored associations of biomolecules from the multi-omic QMDiab dataset (n = 388) with the well-characterized type 2 diabetes phenotype. Second, we used the ROS/MAP Alzheimer's disease dataset (n = 500), consisting of high-throughput measurements of brain tissue, to explore modules associated with multiple Alzheimer's disease-related phenotypes. Our method identifies modules that are multi-omic, span multiple pathways, and vary in size. We provide an interactive tool to explore this hierarchy at different levels and probe enriched modules, empowering users to examine the full hierarchy, delve into biomolecular drivers of disease phenotype within a module, and incorporate functional annotations.

Conflict of interest statement

The authors declare the following competing interests: J.K. holds equity in Chymia LLC, owns intellectual property in PsyProtix, serves as an advisor for celeste, and is a co-founder of iollo. R.K-D. is an inventor on a series of patents on the use of metabolomics for the diagnosis and treatment of CNS diseases and holds equity in Metabolon Inc., Chymia LLC, and PsyProtix.

Figures

Fig. 1. AutoFocus method overview.
a Conceptual depiction of applying “focus” to the biological process of carbon metabolism at different hierarchical levels. b Multiple molecular datasets with biomolecules from the same n samples are concatenated into a single matrix, accompanied by sample phenotype information, p. c Correlation coefficients between molecules are calculated to generate a correlation matrix. d Correlation coefficients are converted to distances to create a hierarchical tree of biomolecules. e Biomolecules are univariately correlated with the phenotype of interest and filtered for statistical significance. f Enrichment “peaks” are detected by performing an enrichment analysis of the “leaves” descending from each internal node, i.e., the number of significantly correlated molecules in the respective cluster. g Functional annotation and module driver analysis are performed on each enriched module.
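Steps c to f of the overview can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the distance 1 - |r|, the average-linkage rule, the definition of a "peak" as the top-most node whose fraction of significant leaves meets a threshold, the toy profiles `m1` to `m5`, and the hand-picked `significant` set (standing in for the univariate test of step e) are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def leaves(node):
    """Leaf names under a node; a leaf is a str, an internal node a (left, right) tuple."""
    return [node] if isinstance(node, str) else leaves(node[0]) + leaves(node[1])

def build_tree(profiles):
    """Agglomerative clustering, average linkage on 1 - |r| (assumed distance)."""
    dist = {frozenset(p): 1 - abs(pearson(profiles[p[0]], profiles[p[1]]))
            for p in combinations(profiles, 2)}

    def linkage(a, b):
        pairs = [(x, y) for x in leaves(a) for y in leaves(b)]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    nodes = list(profiles)
    while len(nodes) > 1:
        a, b = min(combinations(nodes, 2), key=lambda p: linkage(*p))
        nodes = [n for n in nodes if n not in (a, b)] + [(a, b)]
    return nodes[0]

def enriched_peaks(node, significant, threshold):
    """Top-most internal nodes whose fraction of significant leaves >= threshold."""
    if isinstance(node, str):
        return []
    members = leaves(node)
    fraction = sum(m in significant for m in members) / len(members)
    if fraction >= threshold:
        return [(members, fraction)]
    return (enriched_peaks(node[0], significant, threshold)
            + enriched_peaks(node[1], significant, threshold))

# Toy data: five hypothetical molecules measured on the same eight samples.
profiles = {
    "m1": [1, 1, 1, 1, -1, -1, -1, -1],
    "m2": [3, 3, 1, 1, -1, -1, -3, -3],
    "m3": [2, 2, 0, 0, 0, 0, -2, -2],
    "m4": [1, -1, 1, -1, 1, -1, 1, -1],
    "m5": [3, -1, 1, -3, 3, -1, 1, -3],
}
significant = {"m1", "m2", "m3", "m5"}  # assumed result of step e

tree = build_tree(profiles)
for threshold in (0.7, 0.9):
    print(threshold, enriched_peaks(tree, significant, threshold))
```

With the looser threshold the whole tree qualifies as one module; with the stricter threshold only the tightly correlated sub-cluster does, which is the sense in which modules can emerge at different heights of the hierarchy.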
Fig. 2. Correlation values within and across datasets.
a Proportion of significant correlations between biomolecules within and across datasets. For every dataset, the proportion of significant correlation coefficients within the dataset is substantially larger than across datasets. Consequently, statistical methods that depend on correlations will be biased towards intra-dataset interactions in a multi-omics setting. b Example correlations between two molecules measured on the same blood samples using two similar metabolomics platforms, Metabolon Plasma HD2 and Metabolon Plasma HD4. Valine on the HD2 platform correlated more strongly with leucine measured on the same platform than with valine on the HD4 platform. This further illustrates the tendency for stronger correlations within a dataset than between datasets. c Dataset distribution in the correlation-based hierarchical structure formed on the QMDiab dataset. Strong intra-dataset correlations can be seen for lipids (brown) and, to a lesser extent, for proteomics (light green), as these two datasets have dense regions where they segregate from the other -omics datasets, which are otherwise thought to be well integrated.
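The within-versus-across comparison in panel a can be sketched as a simple counting exercise. The code below is only an illustration under stated assumptions: the |r| cutoff stands in for a proper significance test, and the dataset names `platform_A` and `platform_B`, the feature names, and the values are hypothetical toy data.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def significant_fraction(datasets, cutoff):
    """Fraction of feature pairs with |r| >= cutoff, within vs across datasets."""
    counts = {"within": [0, 0], "across": [0, 0]}  # [passing pairs, total pairs]
    features = [(ds, values) for ds, feats in datasets.items()
                for values in feats.values()]
    for (ds1, v1), (ds2, v2) in combinations(features, 2):
        kind = "within" if ds1 == ds2 else "across"
        counts[kind][0] += abs(pearson(v1, v2)) >= cutoff
        counts[kind][1] += 1
    return {k: (hits / total if total else 0.0) for k, (hits, total) in counts.items()}

# Toy data: two hypothetical platforms measured on the same eight samples.
datasets = {
    "platform_A": {"a1": [1, 1, 1, 1, -1, -1, -1, -1],
                   "a2": [3, 3, 1, 1, -1, -1, -3, -3]},
    "platform_B": {"b1": [1, -1, 1, -1, 1, -1, 1, -1],
                   "b2": [3, -1, 1, -3, 3, -1, 1, -3]},
}
print(significant_fraction(datasets, cutoff=0.5))
```

In this toy case every within-platform pair passes the cutoff and no cross-platform pair does, an exaggerated version of the bias the caption describes.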
Fig. 3. AutoFocus on the QMDiab dataset.
The dataset included a total of 388 samples and 5135 biomolecules from 12 datasets: 5 metabolomics platforms on plasma, 2 on urine, and 1 on saliva, 3 blood glycomics datasets, and 1 blood proteomics dataset. a View of the full hierarchical structure created from the QMDiab dataset. Magenta circles at the bottom of the tree indicate significant molecules; circles within the tree indicate modules that passed the enrichment threshold. Significant molecules were dispersed throughout the leaves of the tree, and enriched modules were scattered throughout the hierarchy at a wide range of heights. The high-density region of significant molecules towards the right corresponds to the largest enriched module at the highest height. Below is a zoomed view of this module, with the left sub-tree in yellow and the right sub-tree in pink. b Pie charts of the dataset and pathway makeup of the two largest modules along with their size and significant-node enrichment fraction. Pathway annotations were only available for the metabolites measured by Metabolon. c Confounder-corrected mixed graphical model of the molecules in the largest module together with the phenotype. The zoomed-in view shows the nodes with edges to the type 2 diabetes (T2D) phenotype, which include 1,5-AG in saliva, ornithine in urine, and the CXCL12 protein, along with the confounder age and 2 unknown molecules. As these molecules are directly connected to the T2D phenotype, we mark them as statistical “drivers” of the disease in this module.
Fig. 4. Results of running the AutoFocus method on the ROS/MAP dataset.
The dataset included a total of 500 samples, which contained 8193 biomolecules from a metabolomics platform and a proteomics platform performed on post-mortem brain tissue. a View of the full hierarchical structure created from the ROS/MAP dataset with two phenotypes annotated and dataset distribution below. Magenta circles represent the neurofibrillary tangles phenotype, green circles represent cognitive decline, and orange circles are overlaps between the two. Significant molecules are dispersed densely throughout the tree, and enriched modules are scattered throughout the hierarchy at a large range of heights. b Zoomed-in view of a metabolomics module enriched for significant hits associated with cognitive decline. This module contained metabolites related to oxidative stress and lipid peroxidation. c Zoomed-in view of the largest module found in the dataset, which was enriched for metabolites and proteins significantly associated with neurofibrillary tangles. d Zoomed-in view of the largest module enriched for both phenotypes, with the left sub-tree (yellow) enriched for mitochondrial proteins and the right sub-tree (pink) enriched for proteins related to synaptic vesicle exocytosis and inhibitory neurotransmission.
Fig. 5. Comparison of AutoFocus with MEGENA, WGCNA, and MoDentify.
Highest Jaccard similarities of clusters from MoDentify (a), MEGENA (b), and WGCNA (c) with clusters from the AutoFocus hierarchy. d Proportion of significantly associated molecules in associated clusters found by the four methods. AutoFocus’ proportion increases with a more stringent threshold, comfortably surpassing the proportions in WGCNA, MoDentify, and MEGENA beyond a threshold of 0.35.
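The "highest Jaccard similarity" in panels a to c can be read as a best-match score: for each cluster from another method, take the maximum Jaccard index over all clusters in the hierarchy. The sketch below shows that calculation on hypothetical clusters; the names `external_clusters` and `hierarchy_clusters` and their members are assumptions for illustration, not data from the paper.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_matches(external_clusters, hierarchy_clusters):
    """For each external cluster, its highest Jaccard index against the hierarchy."""
    return [max(jaccard(c, h) for h in hierarchy_clusters) for c in external_clusters]

# Hypothetical clusters: every internal node of a small hierarchy,
# and two clusters reported by some other method.
hierarchy_clusters = [
    {"m2", "m3"},
    {"m4", "m5"},
    {"m1", "m2", "m3"},
    {"m1", "m2", "m3", "m4", "m5"},
]
external_clusters = [{"m1", "m2"}, {"m4", "m5"}]

print(best_matches(external_clusters, hierarchy_clusters))
```

Because the hierarchy contains clusters at every scale, an external cluster of any size has a chance of finding a close match, which is what makes this comparison informative.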
