Bioinformatics. 2018 Aug 15;34(16):2797-2807.
doi: 10.1093/bioinformatics/bty204.

Pleiotropic mapping and annotation selection in genome-wide association studies with penalized Gaussian mixture models

Ping Zeng et al. Bioinformatics.

Abstract

Motivation: Genome-wide association studies (GWASs) have identified many genetic loci associated with complex traits. A substantial fraction of these identified loci is associated with multiple traits, a phenomenon known as pleiotropy. Identification of pleiotropic associations can help characterize the genetic relationship among complex traits and can facilitate our understanding of disease etiology. Effective pleiotropic association mapping requires the development of statistical methods that can jointly model multiple traits together with genome-wide single nucleotide polymorphisms (SNPs).

Results: We develop a joint modeling method, which we refer to as the integrative MApping of Pleiotropic association (iMAP). iMAP models summary statistics from GWASs, uses a multivariate Gaussian distribution to account for phenotypic correlation, simultaneously infers genome-wide SNP association patterns using mixture modeling and has the potential to reveal causal relationships between traits. Importantly, iMAP integrates a large number of SNP functional annotations to substantially improve association mapping power and, with a sparsity-inducing penalty, can select informative annotations from a large, potentially non-informative set. To scale iMAP inference to association studies with hundreds of thousands of individuals and millions of SNPs, we develop an efficient expectation maximization algorithm based on an approximate penalized regression algorithm. With simulations and comparisons to existing methods, we illustrate the benefits of iMAP in terms of both high association mapping power and accurate estimation of genome-wide SNP association patterns. Finally, we apply iMAP to perform a joint analysis of 48 traits from 31 GWAS consortia together with 40 tissue-specific SNP annotations generated from the Roadmap Project.
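To make the mixture-modeling idea concrete, the following is a minimal sketch and not the iMAP implementation. It assumes that each SNP falls into one of four association categories (neither trait, first trait only, second trait only, both traits), that the pair of z-scores is bivariate Gaussian with unit noise variance and covariance `rho`, and that association with a trait adds a hypothetical variance `effect_var` to that trait's z-score. The function names, the parameter values and the variance-inflation form are all illustrative assumptions.

```python
import math


def bivariate_normal_pdf(z1, z2, var1, var2, cov):
    """Density of a zero-mean bivariate Gaussian at (z1, z2)."""
    det = var1 * var2 - cov * cov
    quad = (var2 * z1 * z1 - 2.0 * cov * z1 * z2 + var1 * z2 * z2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def posterior_categories(z1, z2, pi, rho=0.0, effect_var=(9.0, 9.0)):
    """Posterior probability of each association category for one SNP.

    pi maps "00", "10", "01", "11" to prior category proportions.
    rho is the assumed covariance of the two z-scores (hypothetical value).
    effect_var is the assumed extra variance under association (hypothetical).
    """
    extra = {
        "00": (0.0, 0.0),
        "10": (effect_var[0], 0.0),
        "01": (0.0, effect_var[1]),
        "11": (effect_var[0], effect_var[1]),
    }
    # Prior proportion times component likelihood for every category.
    weighted = {
        key: pi[key] * bivariate_normal_pdf(z1, z2, 1.0 + e1, 1.0 + e2, rho)
        for key, (e1, e2) in extra.items()
    }
    total = sum(weighted.values())
    return {key: value / total for key, value in weighted.items()}


if __name__ == "__main__":
    priors = {"00": 0.9, "10": 0.04, "01": 0.04, "11": 0.02}
    print(posterior_categories(5.0, 5.0, priors, rho=0.2))
```

In an expectation maximization scheme, posteriors of this kind would form the E-step, and the category proportions would be re-estimated from them in the M-step. How iMAP links annotations to those proportions is not specified in this abstract and is left out of the sketch.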

Availability and implementation: iMAP is freely available at http://www.xzlab.org/software.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Comparison of power in detecting associated SNPs by various methods in simulations. In all simulations, the proportion of pleiotropic causal SNPs varies from 0% to 100% (x-axis). Power is measured at a fixed FDR of 0.05. (A) Power of four methods (univariate analysis, gwas-pw, GPA and iMAP) in the setting where the two traits are positively correlated. For each method, the four boxplots at each pleiotropic proportion level correspond to four different phenotypic covariance values of 0, 0.2, 0.5 and 0.8, respectively. (B) Power gain of iMAP with respect to GPA computed based on panel (A). (C) Power of four methods (univariate analysis, gwas-pw, GPA and iMAP) in the presence of informative annotations. Variations of GPA and iMAP that incorporate a different number of annotations (0, 1, 2 and 4) are also presented. (D) Power gain of iMAP with respect to GPA computed based on panel (C). (E) Power of GPA and iMAP in the presence of four informative annotations and 100 noninformative annotations. Different variations of GPA and iMAP are considered: the naïve version does not incorporate any annotations; the full version includes all the annotations; the select version performs annotation selection; and the oracle version uses the four informative annotations. (F) Power gain of iMAP with respect to GPA computed based on panel (E)
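The phrase "power at a fixed FDR" can be read in more than one way; the caption does not say how the threshold was chosen. One common simulation convention, assumed here, ranks SNPs by a score, keeps the longest top-ranked list whose realized false discovery proportion stays at or below the target, and reports the fraction of truly causal SNPs recovered. The function name and the toy inputs below are hypothetical.

```python
def power_at_fdr(scores, is_causal, fdr=0.05):
    """Fraction of causal SNPs recovered at a fixed false discovery proportion.

    scores: larger values mean stronger evidence of association.
    is_causal: simulation truth, one boolean per SNP.
    Ties in scores are broken by input order (a simplification).
    """
    n_causal = sum(1 for flag in is_causal if flag)
    if n_causal == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    false_pos = 0
    best_true_pos = 0
    for i in order:
        if is_causal[i]:
            true_pos += 1
        else:
            false_pos += 1
        # Keep the largest list whose false discovery proportion is acceptable.
        if false_pos / (true_pos + false_pos) <= fdr:
            best_true_pos = true_pos
    return best_true_pos / n_causal


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    toy_truth = [True, True, False, True, False]
    print(power_at_fdr(toy_scores, toy_truth, fdr=0.05))
```

Because the truth labels are known only in simulations, this metric applies to the simulated comparisons and not to the real data application.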
Fig. 2.
Estimation accuracy for the proportions of different SNP association categories by different methods in simulations. Methods for comparison include gwas-pw, GPA and iMAP. Because four informative annotations are present, we considered variations of GPA and iMAP that incorporate a different number of annotations (0, 1, 2 and 4). The differences between the estimated values and the truth (y-axis) are computed for various settings where the proportion of pleiotropic causal SNPs varies from 0% to 100% (x-axis). Different quantities of interest are considered: (A) π11, the proportion of SNPs associated with both traits; (B) π10, the proportion of SNPs associated with only the first trait; (C) π11/(π11 + π10), the proportion of SNPs associated with the first trait that are also associated with the second trait; (D) estimated parameters for the informative annotations
Fig. 3.
Power comparison among methods for the 1128 trait pairs analyzed in the real data application. Methods for comparison include: univariate, gwas-pw, GPA-select, iMAP-naïve and iMAP. (A) Boxplots show the number of associated SNPs that pass the genome-wide significance threshold identified by various methods across 1128 trait pairs. The number of associated SNPs identified by iMAP for each trait pair is also plotted against that identified by (B) univariate, (C) gwas-pw, (D) GPA-select and (E) iMAP-naïve
Fig. 4.
Annotation selection in the real data application. (A) Estimated annotation effect sizes for the selected annotations across all analyzed trait pairs. The percentage of times each histone annotation is selected across these trait pairs is listed above the boxplots. (B) Relevance score for each annotation (rows) across all traits (columns). The relevance score quantifies the importance of an annotation for a particular trait of interest and is computed based on analysis of all trait pairs
Fig. 5.
Estimated probability that a SNP associated with one trait (y-axis) is also associated with the other trait (x-axis), for 48 trait pairs in the real data application. Results are based on iMAP that performs annotation selection among 40 annotations
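One natural reading of this conditional probability, stated here as an assumption and not as the authors' exact estimator, expresses it through the category proportions defined in Fig. 2. The symbol π01 does not appear in the captions; it is assumed by symmetry to denote the proportion of SNPs associated with only the second trait.

```latex
\Pr(\text{trait 2} \mid \text{trait 1}) = \frac{\pi_{11}}{\pi_{11} + \pi_{10}},
\qquad
\Pr(\text{trait 1} \mid \text{trait 2}) = \frac{\pi_{11}}{\pi_{11} + \pi_{01}}.
```

The two quantities differ whenever π10 and π01 differ, so under this reading the probability depends on which trait is conditioned on.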
