Review
Brief Bioinform. 2024 Jan 22;25(2):bbae014.
doi: 10.1093/bib/bbae014.

Network propagation for GWAS analysis: a practical guide to leveraging molecular networks for disease gene discovery

Giovanni Visonà et al. Brief Bioinform.

Abstract

Motivation: Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remain challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights into disease genes.

Results: We present an overview of the design choices and pitfalls that prove crucial in the application of network propagation methods to GWAS summary statistics. We highlight general trends from the literature and present benchmark experiments that expand on these insights, using three diseases and five molecular networks as a case study. We verify that, if the GWAS summary statistics are of sufficient quality, gene-level scores based on GWAS P-values offer advantages over a set of 'seed' disease genes that are not weighted by their associated P-values. Beyond that, the size and the density of the networks prove to be important factors to consider. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.
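The contrast between unweighted seed genes and P-value-based gene scores can be illustrated with a minimal sketch of the two starting vectors. The function names, the uniform seed weighting and the use of -log10(P) as the gene score are assumptions made for illustration; the abstract does not specify the exact transformation used in the benchmark.

```python
import math

def seed_restart_vector(genes, seed_genes):
    """Unweighted seeds: every seed gene present in the network gets the same weight."""
    seeds = [g for g in genes if g in seed_genes]
    if not seeds:
        raise ValueError("no seed gene is present in the network")
    weight = 1.0 / len(seeds)
    return {g: (weight if g in seed_genes else 0.0) for g in genes}

def pvalue_restart_vector(gene_pvalues, genes, floor=1e-300):
    """Weighted scores: weight proportional to -log10(P) (assumed transformation).

    Genes without a gene-level P-value receive weight 0.
    P-values are clamped at `floor` to avoid log(0).
    """
    raw = {
        g: -math.log10(max(gene_pvalues[g], floor)) if g in gene_pvalues else 0.0
        for g in genes
    }
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all gene scores are zero")
    return {g: score / total for g, score in raw.items()}

# Hypothetical gene identifiers and P-values, for illustration only.
genes = ["G1", "G2", "G3"]
print(seed_restart_vector(genes, {"G1", "G2"}))
print(pvalue_restart_vector({"G1": 1e-8, "G2": 1e-2}, genes))
```

With the seed approach, G1 and G2 start with equal weight; with the score approach, G1 carries most of the starting mass because its P-value is much smaller.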

Keywords: GWAS; disease gene; molecular network; network propagation.

Figures

Figure 1
Overview of the workflow for the application of network propagation methods to GWAS summary statistics. The analysis of GWAS summary statistics begins with the selection of a methodology to map variants to protein-coding genes. The P-values for the variants associated with each gene are aggregated to generate gene-level P-values. The scores are overlaid on a selected molecular network, and the information is diffused with a suitable propagation algorithm. A selection criterion is then employed to obtain sets of candidate disease genes from the propagated information. Each of the steps involved presents important design choices that affect the results obtained through this approach.
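The propagation step of this workflow can be sketched as a random walk with restart (RWR) computed by power iteration. This is a minimal illustration on a toy network, not the benchmark implementation: the function name, the column normalisation of the adjacency matrix and the convergence settings are assumptions, and `alpha` denotes the probability of continuing the walk.

```python
import numpy as np

def random_walk_with_restart(adjacency, restart, alpha=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - alpha) * p0 + alpha * W @ p until convergence.

    adjacency: symmetric (n, n) array for an undirected network.
    restart:   non-negative starting scores (seed indicator or gene-level scores).
    alpha:     probability of continuing the walk; 1 - alpha is the restart probability.
    """
    A = np.asarray(adjacency, dtype=float)
    p0 = np.asarray(restart, dtype=float)
    degree = A.sum(axis=0)
    if np.any(degree == 0):
        raise ValueError("isolated nodes must be removed before propagation")
    W = A / degree            # column-normalised: W[i, j] = P(step from j to i)
    p0 = p0 / p0.sum()        # restart distribution sums to 1
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - alpha) * p0 + alpha * (W @ p)
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy path network 0 - 1 - 2 - 3 with all starting mass on node 0.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
scores = random_walk_with_restart(A, [1, 0, 0, 0], alpha=0.5)
print(scores)  # propagated scores decrease with distance from node 0
```

Ranking genes by the propagated scores and keeping the top entries is one simple selection criterion; other criteria are possible.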
Figure 2
Comparison of RWRs based on gene-level P-values versus the use of seed genes. (A) Range of performance across diseases and networks for different values of the probability of continuing the RW. The use of seed genes is more robust to hyperparameter tuning, but gene scores outperform seed genes when this probability is set close to 0. The shaded areas represent the 95% confidence interval. (B) The combination ASD-PCNet proves to be the only outlier where the use of seed genes clearly outperforms the P-value-based scores. (C) The performance of the RWRs shows considerable variability across disease-network combinations. The ProteomeHD network, in particular, shows poor performance, likely due to the small number of available genes. (D) Results of the simulation of RWRs starting from asthma seed genes using the STRING network, for two example values of the continuation probability. For each gene, we performed RWRs for 100 000 restarts and considered the shortest-path distance between the starting gene and the termination gene, filtering out the walks that terminate on the starting gene. The resulting histogram shows the fraction of walks that terminate on nodes that are one, two or three hops away from the starting gene. This distribution can be used as a criterion to select the continuation probability based on the desired level of exploration of the network.
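The simulation described in panel (D) can be approximated with a short Monte Carlo sketch. The details here are assumptions: each walk starts at one gene, continues to a uniformly chosen neighbour with probability `alpha`, and otherwise terminates; the graph is a toy adjacency list rather than the STRING network, and the function names are hypothetical.

```python
import random
from collections import Counter, deque

def hop_distances(graph, source):
    """Breadth-first search: shortest-path distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def termination_hop_fractions(graph, start, alpha, n_walks=100_000, seed=0):
    """Fraction of walks ending 1, 2, 3, ... hops from `start`.

    Walks that terminate on the starting node are filtered out, as in the caption.
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must be in [0, 1) so that every walk terminates")
    rng = random.Random(seed)
    dist = hop_distances(graph, start)
    counts = Counter()
    for _ in range(n_walks):
        node = start
        while rng.random() < alpha:        # continue the walk with probability alpha
            node = rng.choice(graph[node])
        if node != start:
            counts[dist[node]] += 1
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())} if total else {}

# Toy path network A - B - C - D - E (hypothetical node names).
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
for alpha in (0.2, 0.8):
    print(alpha, termination_hop_fractions(graph, "A", alpha, n_walks=20_000))
```

Comparing the two histograms shows how a larger continuation probability shifts termination mass toward nodes farther from the starting gene.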
Figure 3
Network properties affect the result of propagation algorithms. (A) Best performance of the network propagation method for each network, plotted against network size. An upward trend is apparent, suggesting that larger networks are beneficial for this analysis. (B) The performance of the network propagation methods for different densities of connections displays a peculiar pattern, suggesting that networks that are too sparse or that include too many connections may hinder the use of network propagation.
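The two network properties compared in this figure can be computed from an edge list as follows. This sketch assumes the standard density of a simple undirected graph, 2E / (N(N - 1)); the caption does not state which definition of density was used, and the function name is hypothetical.

```python
def network_size_and_density(edges):
    """Return (number of nodes, density) for an undirected edge list.

    Self-loops are ignored and duplicate or reversed edges are counted once.
    Density is assumed to be 2E / (N * (N - 1)).
    """
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}
    nodes = set()
    for edge in unique_edges:
        nodes.update(edge)
    n = len(nodes)
    if n < 2:
        raise ValueError("at least two connected nodes are required")
    return n, 2 * len(unique_edges) / (n * (n - 1))

# Toy example: a triangle plus one pendant node (hypothetical gene names).
edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G1"), ("G3", "G4"), ("G1", "G2")]
print(network_size_and_density(edges))
```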
Figure 4
Comparison of the performance of single-network RWRs with the two listed Avg methods and with RWRs on a multilayer network composed of the five gene networks, represented as mAP@K with 95% confidence interval. Avg. Rank appears to offer robust performance even for high values of the RW continuation probability, which enables the exploration of genes farther from known disease genes.
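A rank-averaging ensemble and the AP@K metric underlying mAP@K can be sketched as below. Several choices are assumptions rather than details given in the caption: genes missing from a network receive that network's worst rank plus one, ties are broken alphabetically, and AP@K is normalised by min(K, number of true disease genes). mAP@K would then be the mean of AP@K over the evaluation runs.

```python
def average_rank(network_scores):
    """Combine per-network propagated scores by averaging ranks (1 = best).

    network_scores: list of dicts mapping gene -> propagated score, one per network.
    Returns the genes ordered from best to worst average rank.
    """
    genes = set().union(*network_scores)
    totals = {g: 0.0 for g in genes}
    for scores in network_scores:
        ordered = sorted(scores, key=lambda g: (-scores[g], g))
        ranks = {g: i + 1 for i, g in enumerate(ordered)}
        missing_rank = len(ordered) + 1   # assumed penalty for genes absent from a network
        for g in genes:
            totals[g] += ranks.get(g, missing_rank)
    n = len(network_scores)
    return sorted(genes, key=lambda g: (totals[g] / n, g))

def average_precision_at_k(ranked_genes, true_genes, k):
    """AP@K: mean of precision@i over the positions i <= K that hold a true disease gene."""
    hits, total = 0, 0.0
    for i, gene in enumerate(ranked_genes[:k], start=1):
        if gene in true_genes:
            hits += 1
            total += hits / i
    denom = min(k, len(true_genes))
    return total / denom if denom else 0.0

# Hypothetical propagated scores from two networks.
net1 = {"A": 0.9, "B": 0.5, "C": 0.1}
net2 = {"A": 0.8, "B": 0.7, "D": 0.2}
ranking = average_rank([net1, net2])
print(ranking)
print(average_precision_at_k(ranking, {"A", "C"}, k=3))
```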
