Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2014 Oct 30;10(10):e1004722.
doi: 10.1371/journal.pgen.1004722. eCollection 2014 Oct.

Integrating functional data to prioritize causal variants in statistical fine-mapping studies

Affiliations

Integrating functional data to prioritize causal variants in statistical fine-mapping studies

Gleb Kichaev et al. PLoS Genet. .

Abstract

Standard statistical approaches for prioritization of variants for functional testing in fine-mapping studies either use marginal association statistics or estimate posterior probabilities for variants to be causal under simplifying assumptions. Here, we present a probabilistic framework that integrates association strength with functional genomic annotation data to improve accuracy in selecting plausible causal variants for functional validation. A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics while allowing for multiple causal variants at any risk locus. We devise efficient algorithms that estimate the parameters of our model across all risk loci to further increase performance. Using simulations starting from the 1000 Genomes data, we find that our framework consistently outperforms the current state-of-the-art fine-mapping methods, reducing the number of variants that need to be selected to capture 90% of the causal variants from an average of 13.3 to 10.4 SNPs per locus (as compared to the next-best performing strategy). Furthermore, we introduce a cost-to-benefit optimization framework for determining the number of variants to be followed up in functional assays and assess its performance using real and simulation data. We validate our findings using a large scale meta-analysis of four blood lipids traits and find that the relative probability for causality is increased for variants in exons and transcription start sites and decreased in repressed genomic regions at the risk loci of these traits. Using these highly predictive, trait-specific functional annotations, we estimate causality probabilities across all traits and variants, reducing the size of the 90% confidence set from an average of 17.5 to 13.5 variants per locus in this data.

PubMed Disclaimer

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Illustration of model input.
PAINTOR is a statistical model for incorporating functional annotations on top of association statistics to ascribe probabilistic confidence of causality to the SNPs at the loci. Depicted here are two loci with functional annotations from three different cell lines/tissues and three different classes. Causal variants are enriched within the green annotation class while depleted from others. PAINTOR is designed to upweight (with probability mass) SNPs residing in the green annotation while down-weighting SNPs residing in the red annotation.
Figure 2
Figure 2. PAINTOR outperforms existing methodologies for fine-mapping.
We simulated datasets consisting of 10 K genotypes over one hundred 10 KB loci using three synthetic functional annotations randomly dispersed at fixed percentages (2.2%, 2.2%, 30.7%). SNPs falling within these annotations were enriched (9.5, 5.7, 3.65) times more with causal variants relative to unannotated SNPs. We fixed the variance explained by these loci to formula image and repeated the simulation 500 times. The top figure corresponds to the overall performance at causal loci (64 loci) with PAINTOR clearly achieving the greatest overall accuracy. The bottom figures correspond to loci with a single causal variant (an average of 34 per simulation) (left) or multiple causal variants (average of 30 per simulation) (right). At loci where there is one true causal variant, fgwas achieves greater accuracy than PAINTOR due to the fact that fgwas assumes the correct number of causal variants. We note that the version of PAINTOR that assumes a single causal variant yields very similar to fgwas at loci where the truth is of a single causal (both requiring 2.63 SNPs per locus to identify 90% of the causal variants.) However, at loci with multiple causal variants, the power of methods that assume a single causal is greatly deflated leading to PAINTOR's superior overall accuracy.
Figure 3
Figure 3. Accuracy of enrichment estimation for a synthetic annotation that contains 8-fold depletion to 8-fold enrichment of causal variants across simulations of fine-mapping data sets over 100 loci.
Using a background and a synthetic functional annotation at a frequency of 1/3 (formula image), we simulated with annotation effect sizes such that in expectation, we attained approximately 100 causal variants while maintaining enrichment at a fixed point. We used the standard simulation parameters, fixing the variance explained by these 100 loci to 0.25 and using formula image genotypes. We discarded simulations where fgwas failed to converge (see Methods). Displayed here are the mean inferred Log2 enrichment estimates (formula image 1 SD) that were conducted over 500 independent simulations at each enrichment level.
Figure 4
Figure 4. Thresholding on posterior probabilities provides a principled way to assess utility.
We demonstrate how utility curves are optimized by selecting SNPs that achieve a minimum posterior probability threshold at various benefit-to-cost ratios (R). The total number of SNPs selected at the maximum utility for R =  (1.25, 1.5, 2, 5, 10, 20) is (29.8, 39.2, 52.4, 119.1, 221.4, 405.4) which identifies approximately (29.8, 35.6, 43.4, 65.33, 79.9, 91.8) causal variants.

Similar articles

Cited by

References

    1. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, et al. (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences 106: 9362–9367. - PMC - PubMed
    1. Meyer KB, OReilly M, Michailidou K, Carlebur S, Edwards SL, et al. (2013) Fine-scale mapping of the fgfr2 breast cancer risk locus: putative functional variants differentially bind foxa1 and e2f1. The American Journal of Human Genetics 93: 1046–1060. - PMC - PubMed
    1. Kote-Jarai Z, Saunders EJ, Leongamornlert DA, Tymrakiewicz M, Dadaev T, et al. (2013) Fine-mapping identifies multiple prostate cancer risk loci at 5p15, one of which associates with tert expression. Human molecular genetics 22: 2520–2528. - PMC - PubMed
    1. Wu Y, Waite LL, Jackson AU, Sheu WH, Buyske S, et al. (2013) Trans-ethnic fine-mapping of lipid loci identifies population-specific signals and allelic heterogeneity that increases the trait variance explained. PLoS genetics 9: e1003379. - PMC - PubMed
    1. Maller JB, McVean G, Byrnes J, Vukcevic D, Palin K, et al. (2012) Bayesian refinement of association signals for 14 loci in 3 common diseases. Nature genetics 44: 1294–1301. - PMC - PubMed

Publication types