Comparative Study
BMC Genomics. 2013;14 Suppl 3(Suppl 3):S10.
doi: 10.1186/1471-2164-14-S3-S10. Epub 2013 May 28.

GWIS--model-free, fast and exhaustive search for epistatic interactions in case-control GWAS

Benjamin Goudey et al. BMC Genomics. 2013.

Abstract

Background: It has been hypothesized that multivariate analysis and systematic detection of epistatic interactions between explanatory genotyping variables may help resolve the problem of "missing heritability" currently observed in genome-wide association studies (GWAS). However, even the simplest bivariate analysis is still held back by significant statistical and computational challenges that are often addressed by reducing the set of analysed markers. Theoretically, it has been shown that combinations of loci may exist that show weak or no effects individually, but show significant (even complete) explanatory power over phenotype when combined. Reducing the set of analysed SNPs before bivariate analysis could easily omit such critical loci.

Results: We have developed an exhaustive bivariate GWAS analysis methodology that yields a manageable subset of candidate marker pairs for subsequent analysis using other, often more computationally expensive techniques. Our model-free filtering approach is based on classification using ROC curve analysis, an alternative to much slower regression-based modelling techniques. Exhaustive analysis of studies containing approximately 450,000 SNPs and 5,000 samples requires only 2 hours using a desktop CPU or 13 minutes using a GPU (Graphics Processing Unit). We validate our methodology with analysis of simulated datasets as well as the seven Wellcome Trust Case-Control Consortium (WTCCC) datasets, which represent a wide range of real-life GWAS challenges. We have identified SNP pairs that have considerably stronger association with disease than their component SNPs, which often show negligible effect univariately. When compared against previously reported results in the literature, our methods re-detect most significant SNP pairs and additionally detect many pairs absent from the literature that show strong association with disease. The high overlap suggests that our fast analysis could substitute for some slower alternatives.
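To make ROC-based pair filtering concrete, the sketch below enumerates every SNP pair, tabulates Cases and Controls in the nine joint genotype cells, and traces an ROC curve by admitting cells in decreasing order of case-to-control ratio. It is a minimal illustration under stated assumptions, not the GWIS implementation: genotypes are assumed to be coded 0/1/2, labels 1 = case and 0 = control, and scoring each pair by the area under its ROC curve is a stand-in for the paper's own filters. All function names are hypothetical.

```python
from itertools import combinations

# Hypothetical helpers: a sketch of exhaustive, ROC-based SNP-pair scoring.
# Assumes genotypes coded 0/1/2 and labels coded 1 = case, 0 = control,
# with at least one case and one control present.


def genotype_table(g1, g2, labels):
    """Return {(a, b): [controls, cases]} for the 9 joint genotype cells."""
    table = {(a, b): [0, 0] for a in range(3) for b in range(3)}
    for a, b, y in zip(g1, g2, labels):
        table[(a, b)][y] += 1
    return table


def roc_points(table):
    """ROC points (false-positive rate, true-positive rate) obtained by
    admitting genotype cells in decreasing order of case/control ratio."""
    n_ctrl = sum(c[0] for c in table.values())
    n_case = sum(c[1] for c in table.values())

    def ratio(cell):
        ctrl, case = cell
        if ctrl == 0:
            return float("inf") if case > 0 else 0.0
        return (case / n_case) / (ctrl / n_ctrl)

    points, fp, tp = [(0.0, 0.0)], 0, 0
    for ctrl, case in sorted(table.values(), key=ratio, reverse=True):
        fp, tp = fp + ctrl, tp + case
        points.append((fp / n_ctrl, tp / n_case))
    return points


def auc(points):
    """Trapezoidal area under an ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def exhaustive_pair_scan(genotypes, labels, top=5):
    """Score every SNP pair by ROC AUC (an assumed stand-in score) and
    return the `top` highest-scoring (auc, i, j) tuples."""
    scored = [(auc(roc_points(genotype_table(genotypes[i], genotypes[j], labels))), i, j)
              for i, j in combinations(range(len(genotypes)), 2)]
    scored.sort(reverse=True)
    return scored[:top]


# Toy XOR-style data: SNPs 0 and 1 are individually uninformative,
# but jointly separate Cases from Controls perfectly.
g0 = [0, 0, 1, 1, 0, 0, 1, 1]
g1 = [0, 1, 0, 1, 0, 1, 0, 1]
g2 = [2, 2, 2, 2, 0, 0, 0, 0]
labels = [(a + b) % 2 for a, b in zip(g0, g1)]
print(exhaustive_pair_scan([g0, g1, g2], labels, top=1))  # [(1.0, 0, 1)]
```

The toy data mirrors the abstract's point: each SNP alone carries no signal, so a univariate pre-filter would discard both, while the exhaustive pairwise scan ranks the interacting pair first.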

Conclusions: We demonstrate that the proposed methodology is robust, fast and capable of exhaustive search for epistatic interactions using a standard desktop computer. First, our implementation is significantly faster than comparable algorithms, judging by timings reported in the literature, especially as our method allows multiple statistical filters to be applied simultaneously with low computing-time overhead. Second, for some diseases, we have identified hundreds of SNP pairs that pass formal multiple-testing (Bonferroni) correction and could form a rich source of hypotheses for follow-up analysis.
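The scale of the multiple-testing problem follows from simple arithmetic: approximately 450,000 SNPs give on the order of 10^11 distinct pairs. The sketch below computes the pair count and the corresponding Bonferroni cutoff on the −log10 p scale. The significance level α = 0.05 is an assumption (the abstract does not state the level used), and the function names are hypothetical.

```python
import math

# Sketch: Bonferroni correction for an exhaustive pairwise SNP scan.
# alpha = 0.05 is an assumed default, not a value taken from the paper.


def n_snp_pairs(n_snps):
    """Number of distinct SNP pairs tested in an exhaustive bivariate scan."""
    return math.comb(n_snps, 2)


def bonferroni_neg_log10_cutoff(n_tests, alpha=0.05):
    """-log10 of the per-test p-value cutoff alpha / n_tests."""
    return math.log10(n_tests / alpha)


m = n_snp_pairs(450_000)
print(m)                                         # 101249775000
print(round(bonferroni_neg_log10_cutoff(m), 2))  # 12.31
```

A pair therefore needs a p-value below roughly 10^-12.3 to survive correction at α = 0.05; setting `alpha=1.0` reduces the cutoff to log10 of the number of pairs.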

Availability: A web-based version of the software used for this analysis is available at http://bioinformatics.research.nicta.com.au/gwis.


Figures

Figure 1
Strength of SNPs individually and as pairs, and the frequency of SNPs appearing in pairs detected in RA by χ² and DSS. The Manhattan plots (a, c) show the location and p-values of univariately significant SNPs (blue) and bivariately significant SNP pairs (green); the subset of SNP pairs that are also significant according to GSS is marked with a red circle. Each SNP pair generates two points. The frequency plots (b, d) show the number of reported pairs in which each SNP appears. The Manhattan plot for χ² (a) shows that almost all reported pairs fall in two distinct bands across the genome. The frequency plot (b) shows that these pairs all involve one of the two most significant SNPs from univariate analysis (highlighted), so the majority of them are unlikely to be epistatic. Manhattan plot (c) shows that the DSS filter eliminates the banding pattern seen for χ², and frequency plot (d) shows that a greater number of unique SNPs are present in the detected pairs. Note that in Manhattan plot (c) the p-values for univariate association come from the SS test, as DSS applies only to pairs. No pairwise Bonferroni line is shown because DSS is a heuristic rather than a calibrated p-value.
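The banding diagnosis in the χ² panels rests on counting how often each SNP recurs across reported pairs. A minimal sketch, assuming pairs are given as tuples of SNP identifiers (the identifiers and function names below are hypothetical): if most pairs contain one of a handful of univariately strong SNPs, the pairs more likely reflect those main effects than epistasis.

```python
from collections import Counter

# Sketch of the frequency-plot computation: how many reported pairs each SNP
# appears in. SNP identifiers below are hypothetical placeholders.


def snp_pair_frequency(pairs):
    """Count, for each SNP, the number of reported pairs that contain it."""
    counts = Counter()
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    return counts


def fraction_involving(pairs, hub_snps):
    """Fraction of pairs containing at least one of `hub_snps`
    (e.g. the most significant SNPs from univariate analysis)."""
    hubs = set(hub_snps)
    hits = sum(1 for a, b in pairs if a in hubs or b in hubs)
    return hits / len(pairs)


pairs = [("snpA", "snp1"), ("snpA", "snp2"), ("snpB", "snp3"), ("snp4", "snp5")]
print(snp_pair_frequency(pairs).most_common(1))     # [('snpA', 2)]
print(fraction_involving(pairs, ["snpA", "snpB"]))  # 0.75
```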
Figure 2
Odds ratio (OR) vs. "critical sens/spec" of detected pairs in the seven WTCCC datasets, from our study and as reported in the literature to date. "Critical sens/spec" is the sensitivity for contributing genotypes (log2 OR > 0) or the specificity for protective genotypes (log2 OR < 0). We show all pairs from the seven WTCCC datasets reported by GWIS or in the full set of previous literature results that we have compiled. Each pair is represented by a point whose style indicates which method(s) reported it; results from GBOOST, an implementation of the log-linear regression method, are indicated by circles. Nine pairs in the literature pass the formal Bonferroni threshold for the gain test, $\mathrm{flt}_{GSS} > \log_{10}\binom{459{,}012}{2} \approx 11$, but were not detected by GWIS (black diamonds); literature pairs that did not pass this formal requirement are marked by cyan diamonds. Few pairs were detected only by GBOOST (empty green circles). A substantial number of pairs with high odds ratios and coverage were detected only by GWIS (red dots with no surrounding green circle), while many more were detected both in the literature and by GWIS (blue dots). The leftmost vertical dotted line marks the formal minimum requirement of critical sens/spec ≥ 2%, while the horizontal dotted lines mark log2 OR = ±1, corresponding to OR = 2 and OR = 1/2, respectively.
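Both axes of this plot can be computed for any set of genotype calls once the fraction of Cases (p1) and Controls (p0) it covers is known. A minimal sketch under that assumption: the sign of log2 OR decides whether the calls are contributing or protective, and the critical value is then the Case coverage or the Control coverage, respectively. Reading "specificity" for protective calls as the share of Controls covered is an assumption, and the function names are hypothetical.

```python
import math

# Sketch: classify a set of genotype calls as contributing or protective and
# check the "critical sens/spec >= 2%" requirement. p1 and p0 are the
# fractions of Cases and Controls covered by the calls (assumed 0 < p < 1).
# Using Control coverage as the protective-side value is an assumption.


def log2_odds_ratio(p1, p0):
    return math.log2((p1 / (1 - p1)) / (p0 / (1 - p0)))


def critical_sens_spec(p1, p0):
    """Case coverage for contributing calls, Control coverage for protective ones."""
    return p1 if log2_odds_ratio(p1, p0) > 0 else p0


def meets_min_coverage(p1, p0, minimum=0.02):
    return critical_sens_spec(p1, p0) >= minimum


# Calls covering 5.79% of Cases and 0.24% of Controls (values from Figure 4):
print(log2_odds_ratio(0.0579, 0.0024), meets_min_coverage(0.0579, 0.0024))
# Hypothetical calls covering 1.5% of Cases and 0.5% of Controls:
print(meets_min_coverage(0.015, 0.005))  # False
```

The second example has OR ≈ 3, above the OR = 2 reference line, yet it fails the 2% coverage requirement because too few Cases carry the genotype.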
Figure 3
Power of χ² and the proposed DSS heuristic to detect an epistatic pair in simulated data. DSS results are shown as solid lines and χ² results as dashed lines; lines of the same colour represent results from the two statistics on the same simulated data. The effects of varying heritability, sample size (200, 400, 800, 1600) and minor allele frequency (0.2, 0.4) are shown. Each data point shows the mean power over 500 randomly generated datasets. Across all parameter configurations, DSS demonstrated higher power than χ² to detect the interacting pair of SNPs. False-positive rates for both tests (not shown) were very low and grew linearly with the number of samples (individuals).
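For reference, a χ² baseline for a SNP pair can be computed from the 2 × 9 table of Cases and Controls over the joint genotype cells, and power in a simulation study is the fraction of datasets in which the planted pair is detected. A minimal sketch, assuming this 2 × 9 form of the test (the caption does not specify it); converting the statistic to a p-value is left to a χ² survival function such as `scipy.stats.chi2.sf`. Function names are hypothetical.

```python
# Sketch: Pearson chi-square statistic for a 2 x k contingency table
# (Cases vs Controls across the k joint genotype cells of a SNP pair),
# plus an empirical power estimate over simulated datasets.
# Assumes both Cases and Controls are present; empty cells are skipped.


def chi2_2xk(cases, controls):
    """Return (chi-square statistic, degrees of freedom)."""
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    stat, nonempty = 0.0, 0
    for a, b in zip(cases, controls):
        col = a + b
        if col == 0:
            continue  # an empty cell contributes nothing and no degree of freedom
        nonempty += 1
        for observed, row_total in ((a, n_case), (b, n_ctrl)):
            expected = row_total * col / n
            stat += (observed - expected) ** 2 / expected
    return stat, nonempty - 1


def empirical_power(detected):
    """Fraction of simulated datasets in which the planted pair was detected."""
    detected = list(detected)
    return sum(detected) / len(detected)


print(chi2_2xk([10, 0], [0, 10]))                    # (20.0, 1)
print(empirical_power([True] * 450 + [False] * 50))  # 0.9
```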
Figure 4
Example of a pair of individually insignificant SNPs in HT data that, when combined, display both strong protective and contributing effects. Panel (a) shows prevalence mapping ROC curves for the pair (red) and the individual SNPs (blue, green). Panels (b) and (c) zoom into the protective (top-right 20%) and contributory (lower-left 20%) areas, respectively. Panel (d) shows selected statistics for the pair; its nine rows correspond to the possible genotype calls for the pair of SNPs. Columns are: fltSS, the sensitivity-specificity filter score; fltGSS, the gain filter score; OR, odds ratio; p0, percentage of Controls with the genotype call in this row; p1, percentage of Cases; spe%, specificity (%); sen%, sensitivity (%); RelRisk, relative risk for the genotype call := p1/p0; g1, g2, genotype calls for the pair of SNPs; prevPerm, prevalence permutation (see Additional File Section 1.2). The genotype call (1, 0) (row 1) segregates 5.79% of Cases with only 0.24% of Controls, resulting in an odds ratio of 25.73, fltSS = 37.81 and fltGSS = 33.83. Conversely, genotype calls {(1, 1), (2, 1)} (rows 8 and 9) cover 7.39% of Controls with only 0.41% of Cases, resulting in an odds ratio of 0.05. This combination is also highly significant, with fltSS = 39.74 and fltGSS = 35.9. The two points corresponding to these calls are highlighted with stars.
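The odds ratios quoted in this caption can be checked from the coverage percentages alone. A minimal sketch using the standard odds-ratio formula and the caption's definition RelRisk := p1/p0; the function names are hypothetical. From the rounded percentages the contributing call's odds ratio comes out near 25.5 rather than the reported 25.73; the small gap presumably reflects rounding of the published percentages (0.24% alone could lie anywhere from 0.235% to 0.245%).

```python
# Sketch: odds ratio and relative risk for a genotype call (or group of
# calls) from the fraction of Cases (p1) and Controls (p0) it covers.
# RelRisk follows the caption's definition p1 / p0.


def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def rel_risk(p1, p0):
    return p1 / p0


# Contributing call (1, 0): 5.79% of Cases, 0.24% of Controls.
print(odds_ratio(0.0579, 0.0024), rel_risk(0.0579, 0.0024))
# Protective calls {(1, 1), (2, 1)}: 0.41% of Cases, 7.39% of Controls.
print(round(odds_ratio(0.0041, 0.0739), 2))  # 0.05
```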
Figure 5
Illustration of the principles underlying the GSS (a) and DSS (b) filters.
