PLoS Genet. 2013;9(10):e1003770.
doi: 10.1371/journal.pgen.1003770. Epub 2013 Oct 3.

Integrated enrichment analysis of variants and pathways in genome-wide association studies indicates central role for IL-2 signaling genes in type 1 diabetes, and cytokine signaling genes in Crohn's disease

Peter Carbonetto et al. PLoS Genet. 2013.

Abstract

Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are enriched for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and "Measles" pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes RAF1, MAPK14, and FYN) constitute novel putative T1D loci for further study.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Diseases show a wide range of support for enrichment of disease associations in pathways.
Each row shows the pathway with the largest BF for enrichment of disease associations among 3158 candidate gene sets. Columns left to right: (1) disease; (2) enriched pathway; (3) pathway database, and repository where pathway is retrieved if different from database; (4) BF for hypothesis that disease associations are enriched among SNPs assigned to pathway; (5) posterior probability of enrichment hypothesis; (6) number of genes assigned to pathway; (7) number of SNPs near these genes. Abbreviations used in figure: PID = NCI Nature Pathway Interaction Database, BS = NCBI BioSystems, PC = Pathway Commons. Databases and database identifiers for pathways listed here: “Transport of connexons to the plasma membrane” (Reactome 11050, PC); “Tumor suppressor Arf inhibits ribosomal biogenesis” (BioCarta); “Cytokine signaling in immune system” (Reactome 75790, BS 366171); “Alanine biosynthesis” (PANTHER P02724); “Measles” (KEGG hsa05162, BS 213306); “IL2-mediated signaling events” (PID il2_1pathway, BS 137976); “Incretin synthesis, secretion, and inactivation” (Reactome 23974, PC). *Null and enrichment hypotheses for RA and T1D include enrichment of disease associations in MHC, in which SNPs within MHC are enriched at a different level than non-MHC SNPs in pathway; [formula] and 4.6 for RA and T1D, respectively. Number of genes/SNPs for RA and T1D count only non-MHC genes assigned to pathway. **Illustrative posterior probability assuming a “conservative” prior (see text).
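Columns (4) and (5) of this figure are linked by Bayes' rule: posterior odds equal the Bayes factor times the prior odds. The sketch below illustrates that conversion for a simple two-hypothesis comparison (enrichment versus no enrichment). The function name and the example prior are assumptions made for illustration; the prior actually used for the figure is described in the paper's text and is not reproduced here.

```python
def posterior_probability(bayes_factor: float, prior_probability: float) -> float:
    """Posterior probability of a hypothesis given its Bayes factor against the null.

    Assumes exactly two competing hypotheses, so that
    posterior odds = Bayes factor * prior odds.
    """
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must lie strictly between 0 and 1")
    if bayes_factor < 0.0:
        raise ValueError("bayes_factor must be non-negative")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical numbers: a BF of 100 under a skeptical 1% prior.
    print(posterior_probability(100.0, 0.01))
```

The example shows why a large BF does not by itself guarantee a high posterior probability: under a sufficiently skeptical prior, the posterior can remain modest.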
Figure 2. Top-ranked candidate pathways for enrichment of disease associations in CD, RA, T1D and T2D.
Refer to Figure 1 for legend, abbreviations, and meaning of asterisk (*). Two right-most columns show posterior mean and 95% credible interval of genome-wide log-odds ([formula]) and log-fold enrichment ([formula]) given that pathway is enriched ([formula]). Note that enrichment level is defined on log-scale (eq. 2), so [formula] indicates enrichment. Credible interval is smallest interval about mean that contains parameter with 95% posterior probability, calculated to nearest 0.1 using a numerical approximation. Database identifiers for pathways not previously mentioned: “IL23-mediated signaling events” (PID il23pathway, PC); “IL12-mediated signaling events” (PID il12_2pathway, PC); “Immune system” (Reactome 6900, BS 106386); “Release of eIF4E” (Reactome 6836, PC); “Synthesis, secretion, and inactivation of glucagon-like peptide-1” (Reactome 24019, PC); “Id signaling pathway” (WikiPathways WP53, BS 198871). See Figure S1 for more gene set enrichment results.
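The credible interval described in this caption can be approximated on a grid of parameter values. The sketch below is one possible reading, not the authors' implementation: it assumes the posterior is available as weights on a grid, takes "about mean" to mean symmetric about the posterior mean, and widens the interval in steps of 0.1 until it holds at least 95% of the posterior mass. The function name and the toy posterior are hypothetical.

```python
def credible_interval_about_mean(values, weights, level=0.95, step=0.1):
    """Smallest interval centered on the posterior mean with mass >= level.

    values  -- grid points for the parameter
    weights -- unnormalized posterior weights at those grid points
    The half-width grows in multiples of `step` (assumed resolution of 0.1).
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(v * p for v, p in zip(values, probs))
    span = max(abs(v - mean) for v in values)  # half-width that covers the grid
    k = 0
    while True:
        radius = k * step
        # Small tolerance so grid points on the boundary are counted.
        mass = sum(p for v, p in zip(values, probs)
                   if abs(v - mean) <= radius + 1e-9)
        if mass >= level or radius > span:
            return mean, (mean - radius, mean + radius)
        k += 1


if __name__ == "__main__":
    # Toy posterior on a 0.1-spaced grid (made-up numbers).
    grid = [0.0, 0.1, 0.2, 0.3, 0.4]
    post = [0.02, 0.20, 0.56, 0.20, 0.02]
    mean, (low, high) = credible_interval_about_mean(grid, post)
    print(round(mean, 1), round(low, 1), round(high, 1))
```

A symmetric interval about the mean is easy to compute but is generally wider than a highest-posterior-density interval when the posterior is skewed.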
Figure 3. Scatterplots showing the posterior probability that a region contains disease risk variants, given different enrichment hypotheses.
Each point corresponds to a small region of the genome containing 50 SNPs. Posterior probabilities on vertical axis for CD, RA and T1D are conditioned on enrichment of pathway with largest BF (Figure 1). For T2D, since no single pathway stands out in ranking (Figures 2 and S1), the posterior probability along vertical axis is obtained by averaging over top 5 pathways (see Methods). Points highlighted in red correspond to segments overlapping SNPs assigned to the enriched pathway (for T2D, at least 1 out of 5 top pathways). In RA and T1D, 50-SNP segments overlapping the MHC are drawn as open circles (SNPs in these segments are not assigned to the pathway). Overlapping segments sharing the same association signal are not shown. Some segments are labeled by gene(s) in pathway and/or most credible gene of interest based on prior studies (most credible gene is shown in parentheses if different from pathway gene). Asterisk (*) indicates an appreciable increase in the probability of a disease association, and this association is validated by other GWAS for same disease (see Table 1).
Figure 4. Variants in non-MHC disease regions revealed by enriched pathways have smaller effects on disease risk.
Each point in scatterplot corresponds to a 50-SNP segment outside the MHC for which [formula]. Filled circles correspond to selected regions containing disease risk factors without feedback from enriched pathways ([formula]); open circles correspond to selected regions conditioned on enrichment ([formula] and [formula]). For each segment, minor allele frequency and posterior mean additive effect of minor allele count on log-odds of disease (“log-odds ratio”) are taken from SNP in segment with highest probability of being included in multi-marker model.
Figure 5. Enrichment hypotheses with multiple enriched pathways show increased support from data.
Each row gives pathway, or combination of 2 or 3 pathways, with largest BF for enrichment of disease associations. See Figure 1 for legend and abbreviations used. All enrichment hypotheses for RA and T1D shown here also include enrichment of the MHC, allowing for a different level of enrichment within the MHC. Unlike the BFs in Figures 1 and 2, BFs here are all defined relative to null hypothesis of no enrichment, so that they can be easily compared. Counts of genes and SNPs only include those that are not already assigned to other enriched pathways; for example, 37 genes belong to the IL-23 pathway, and of those 15 are already cytokine signaling genes, so inclusion of IL-23 signaling adds 22 more genes. Databases and database identifiers for pathways in this figure: “IL2-mediated signaling events” (PID il2_1pathway, BS 137976); “ErbB receptor signaling network” (PID erbb_network_pathway, BS 138016); “Inositol pyrophosphates biosynthesis” (HumanCyc 6369, PC); “Measles” (KEGG hsa05162, BS 213306); “Wnt” (Cancer Cell Map, PC); “Cytokine signaling in immune system” (Reactome 75790, BS 366171); “IL23-mediated signaling events” (PID il23pathway, BS 138000); “Methionine salvage pathway” (Reactome 75881, BS 366245).
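The gene counts in this figure follow a set-difference rule: a pathway added to a combination contributes only the genes not already assigned to pathways included earlier. The sketch below reproduces the caption's arithmetic (37 IL-23 genes, 15 shared with cytokine signaling, 22 newly added) using placeholder gene identifiers; the identifiers and the size of the cytokine signaling set are invented for illustration only.

```python
def newly_added_genes(pathway_genes, already_enriched_genes):
    """Genes in a pathway that are not already assigned to enriched pathways."""
    return set(pathway_genes) - set(already_enriched_genes)


if __name__ == "__main__":
    # Placeholder IDs: 37 genes in the IL-23 pathway, the first 15 of which
    # also belong to the (arbitrarily sized) cytokine signaling set.
    il23_genes = {f"gene_{i}" for i in range(37)}
    cytokine_genes = ({f"gene_{i}" for i in range(15)}
                      | {f"other_{i}" for i in range(100)})
    added = newly_added_genes(il23_genes, cytokine_genes)
    print(len(added))  # 22, matching the caption's example
```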
Figure 6. Summary of pathways used in the analysis.
Chart on left shows number of unique gene sets obtained from the following pathway databases, included in this order: Reactome, Kyoto Encyclopedia of Genes and Genomes (KEGG), BioCarta (www.biocarta.com), HumanCyc, NCI Nature Pathway Interaction Database (PID), WikiPathways, PANTHER and Cancer Cell Map (cancer.cellmap.org). The majority of these pathways are retrieved from the Pathway Commons (PC) and NCBI BioSystems repositories. We include gene sets from both repositories when gene sets from same pathway differ (see Supplementary Materials). We include two additional gene sets for “classical” and “extended” MHC. Right-hand chart shows gains in gene coverage by including additional databases in the analysis, where “gene coverage” is defined as any genes in reference sequence that are assigned to at least one pathway. From the total of 3160 gene sets (including MHC and xMHC), we achieve coverage of 39% of genes in reference sequence (see Figure S6).
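The coverage gains in the right-hand chart can be thought of as a running union: databases are added in order, and after each one the fraction of reference genes assigned to at least one pathway is recomputed. The sketch below illustrates this with toy data; the database names, gene identifiers, and function name are hypothetical and do not come from the paper.

```python
def coverage_gains(reference_genes, databases):
    """Cumulative gene coverage as databases are added in order.

    reference_genes -- iterable of all genes in the reference sequence
    databases       -- ordered list of (name, list_of_gene_sets) pairs
    Returns a list of (name, cumulative_fraction_covered).
    """
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference_genes must not be empty")
    covered = set()
    gains = []
    for name, gene_sets in databases:
        for gene_set in gene_sets:
            # Count only genes that appear in the reference sequence.
            covered |= reference & set(gene_set)
        gains.append((name, len(covered) / len(reference)))
    return gains


if __name__ == "__main__":
    reference = [f"g{i}" for i in range(1, 11)]  # 10 toy reference genes
    databases = [
        ("db_A", [{"g1", "g2"}, {"g2", "g3"}]),
        ("db_B", [{"g3", "g4", "not_in_reference"}]),
    ]
    for name, fraction in coverage_gains(reference, databases):
        print(name, fraction)
```

Because coverage is cumulative, the gain attributed to each database depends on the order in which databases are included.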
Figure 7. Top four BFs in CD for each setting of [formula].
In each case, the 3 largest BFs correspond, in order, to Cytokine signaling in immune system, IL23-mediated signaling events, and IL12-mediated signaling events (these are the top 3 pathways for CD in Figure 2). Pathway with fourth largest BF differs across settings of [formula].
