Review
Brief Bioinform. 2022 Nov 19;23(6):bbac435.
doi: 10.1093/bib/bbac435.

A comprehensive survey of the approaches for pathway analysis using multi-omics data integration

Zeynab Maghsoudi et al. Brief Bioinform.

Abstract

Pathway analysis has been widely used to detect pathways and functions associated with complex disease phenotypes. The proliferation of this approach is due to the better interpretability of its results and its higher statistical power compared with gene-level statistics. A plethora of pathway analysis methods that utilize a multi-omics setup, rather than just transcriptomics or proteomics, have recently been developed to discover novel pathways and biomarkers. Since multi-omics gives multiple views into the same problem, different approaches are employed in aggregating these views into a comprehensive biological context. As a result, a variety of novel hypotheses regarding disease ideation and treatment targets can be formulated. In this article, we review 32 such pathway analysis methods developed for multi-omics and multi-cohort data. We discuss their availability and implementation, assumptions, supported omics types and databases, pathway analysis techniques and integration strategies. We also provide a comprehensive assessment of each method's practicality and a thorough discussion of the strengths and drawbacks of each technique. The main objective of this survey is to provide a thorough examination of existing methods to assist potential users and researchers in selecting suitable tools for their data and analysis purposes, while highlighting outstanding challenges in the field that remain to be addressed in future development.

Keywords: integrative pathway analysis; multi-cohort analysis; multi-omics integration; pathway graph transformation.

Figures

Figure 1
Timeline of the approaches developed for pathway analysis and multi-omics integration.
Figure 2
The high-level workflow of the surveyed approaches for integrative pathway analysis. The modules commonly present in every analysis pipeline are depicted as solid-line boxes. The dashed boxes are optional modules. The analysis process starts with processing input data (multiple multi-omics datasets). The methods first perform an integrity check to ensure the consistency among the input matrices and then map the gene/compound IDs to supported IDs of pathway databases. After pre-analysis, the methods perform differential analysis at the gene-level and then identify significant pathways using statistical hypothesis testing. The output often includes pathway P-values, enrichment scores and visualization plots.
Figure 3
Overall pipeline of gene-level, P-value-based integrative approaches. The input includes multi-omics and/or multi-cohort data that compare two phenotypes. Methods in this category first analyze each readout independently to obtain the gene-level P-values and effect sizes (e.g. log fold-change). For each gene, the methods combine the P-values and effect sizes across multiple readouts to obtain the summary P-value and effect size of the gene. Finally, these approaches perform functional analysis using the summary P-values and statistics to identify pathways that are significantly different between the two phenotypes. The output of these methods typically includes the P-values and enrichment scores of the pathways.
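The gene-level pipeline in the Figure 3 caption can be illustrated with a minimal sketch. It assumes Fisher's method for combining per-gene P-values across readouts and a hypergeometric over-representation test for the functional analysis step; the caption names neither technique, so both are stand-ins, and the gene names, P-values, pathway membership and 0.05 cutoff are hypothetical toy values.

```python
import math


def fisher_combine(pvalues):
    """Combine independent P-values with Fisher's method (an assumed choice).

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom; for even degrees of freedom the survival function
    has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def hypergeom_tail(hits, draws, successes, population):
    """P(X >= hits) when drawing `draws` genes without replacement."""
    upper = min(draws, successes)
    numerator = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(hits, upper + 1)
    )
    return numerator / math.comb(population, draws)


# Hypothetical gene-level P-values from two readouts (e.g. two omics layers).
gene_pvalues = {
    "G1": [0.001, 0.004],
    "G2": [0.03, 0.02],
    "G3": [0.60, 0.45],
    "G4": [0.80, 0.90],
    "G5": [0.002, 0.20],
    "G6": [0.50, 0.70],
}
pathway = {"G1", "G2", "G5"}  # hypothetical pathway membership

# Step 1: one summary P-value per gene across readouts.
summary = {gene: fisher_combine(ps) for gene, ps in gene_pvalues.items()}

# Step 2: over-representation of the pathway among significant genes.
significant = {gene for gene, p in summary.items() if p < 0.05}
pathway_p = hypergeom_tail(
    hits=len(significant & pathway),
    draws=len(significant),
    successes=len(pathway),
    population=len(summary),
)
print(f"pathway P-value: {pathway_p:.3f}")
```

Fisher's method assumes the readouts are independent; methods that model correlation between omics layers would need a different combination rule.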
Figure 4
Overall pipeline of pathway-level, P-value-based integrative approaches. These methods first analyze each readout independently to calculate the P-values and statistics of each pathway in each readout. Next, the obtained P-values and statistics across all input datasets are combined to obtain the summary P-value and effect size for each pathway.
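The pathway-level combination step in the Figure 4 caption can be sketched as follows. This is a minimal illustration that assumes a weighted Stouffer (inverse-normal) combination; the caption does not say which combination rule the surveyed methods use, and the pathway names, P-values and weights are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def stouffer_combine(pvalues, weights=None):
    """Combine one-sided P-values with the weighted Stouffer method."""
    if weights is None:
        weights = [1.0] * len(pvalues)
    eps = 1e-15  # keep P-values strictly inside (0, 1) for inv_cdf
    z_scores = [
        _STD_NORMAL.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps)) for p in pvalues
    ]
    z = sum(w * zi for w, zi in zip(weights, z_scores))
    z /= math.sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(z)


# Hypothetical per-pathway P-values, one per readout (dataset or omics layer).
pathway_pvalues = {
    "Pathway A": [0.01, 0.04, 0.20],
    "Pathway B": [0.50, 0.70, 0.35],
}
# Hypothetical weights, e.g. the square root of each readout's sample size.
weights = [math.sqrt(n) for n in (40, 25, 60)]

combined = {
    name: stouffer_combine(ps, weights) for name, ps in pathway_pvalues.items()
}
for name, p in combined.items():
    print(f"{name}: combined P-value = {p:.4g}")
```

Weighting by sample size lets larger cohorts contribute more to the summary P-value; with equal weights the function reduces to the unweighted Stouffer method.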
Figure 5
Overall pipeline of graph-transformation-based integrative approaches. These methods first construct pathway networks and then analyze each readout independently to obtain the summary statistics for genes and other omics entities. Finally, they perform graph-based analysis to calculate the P-value and network score for each pathway.
Figure 6
Overall pipeline of machine-learning-based approaches. For a given pathway, these methods filter the multi-omics data to keep only genes belonging to the pathway. Next, these methods classify each sample using the expression data and assess the accuracy using the area under the receiver operating characteristic curve (AUC). The P-value of the pathway is calculated by comparing the obtained AUC to its empirical distribution constructed under the null.
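The AUC-based test in the Figure 6 caption can be sketched under simplifying assumptions. Here the "classifier" is a stand-in that scores each sample by its mean expression over the pathway genes, and the null distribution is built by permuting phenotype labels; the surveyed methods may use other classifiers, cross-validation or null models. All data and names below are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC: probability that a case outscores a control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pathway_auc_pvalue(expression, labels, pathway_genes, n_perm=1000, seed=0):
    """Return (observed AUC, empirical P-value) for one pathway."""
    # Filter to pathway genes and score each sample (stand-in classifier).
    scores = [
        sum(sample[g] for g in pathway_genes) / len(pathway_genes)
        for sample in expression
    ]
    observed = auc(scores, labels)

    # Empirical null: recompute the AUC after shuffling phenotype labels.
    rng = random.Random(seed)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the empirical P-value above zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical expression data: 5 cases (label 1) then 5 controls (label 0).
expression = [
    {"G1": 5.1, "G2": 4.8, "G9": 1.0}, {"G1": 5.5, "G2": 5.0, "G9": 2.0},
    {"G1": 4.9, "G2": 5.3, "G9": 1.5}, {"G1": 5.2, "G2": 4.7, "G9": 0.5},
    {"G1": 5.8, "G2": 5.1, "G9": 1.2}, {"G1": 2.1, "G2": 2.4, "G9": 1.1},
    {"G1": 1.9, "G2": 2.0, "G9": 1.8}, {"G1": 2.5, "G2": 1.7, "G9": 0.9},
    {"G1": 2.2, "G2": 2.3, "G9": 1.4}, {"G1": 1.8, "G2": 2.6, "G9": 1.3},
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

observed_auc, p_value = pathway_auc_pvalue(expression, labels, ["G1", "G2"])
print(f"AUC = {observed_auc:.2f}, empirical P-value = {p_value:.4f}")
```

The mean-expression score only detects pathways that are more highly expressed in cases; a trained classifier evaluated on held-out samples would be needed to capture other patterns.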
Figure 7
Assessment of the 32 surveyed methods in terms of validation, stability, installation, user friendliness, documentation and tutorial. The score of each metric ranges from one to five. Each metric has a different color. The methods are sorted by average score in ascending order. The horizontal axis shows the average score for each method. There are 12 methods that have an average score above 4.0: ReactomeGSA, ActivePathways, PaintOmics 3, mitch, BLMA, PathwayPCA, Mergeomics, iODA, Subpathway-GM, multiGSEA, CPA and MAPE.
Figure 8
Three core modules of the 32 surveyed methods, their pros and cons, and overall performance score (from one to five). The network construction module represents all activities performed by each method for expanding/transforming the pathway annotation graph. The statistics computation module includes techniques employed by each method for computing statistics at the pathway-level. The score combination module includes each method’s strategy in combining the computed statistics at the gene or pathway-level. Most methods combine the statistics at the pathway-level, except those in the gene-level integration category. Methods with an asterisk (*) also support pathway-level combination.
