BMC Bioinformatics. 2022 Aug 30;23(1):359.
doi: 10.1186/s12859-022-04897-3.

A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies

Zhonghe Shao et al. BMC Bioinformatics. 2022.

Abstract

Background: Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-set methods, few comprehensive comparison studies have been previously performed to evaluate the effectiveness of these methods.

Results: We sought to fill this knowledge gap by conducting a comprehensive empirical comparison of 22 commonly used summary-statistics-based SNP-set methods. We showed that only seven methods could effectively control the type I error, and that these well-calibrated approaches had varying power under the simulation scenarios. Overall, we confirmed that the burden test was generally underpowered and that score-based variance component tests (e.g., the sequence kernel association test) were much more powerful under the polygenic genetic architecture in both common and rare variant association analyses. We further revealed that two linkage-disequilibrium-free P value combination methods (the harmonic mean P value method and the aggregated Cauchy association test) behaved very well under the sparse genetic architecture in simulations and in real-data applications to common and rare variant association analyses, as well as in expression quantitative trait loci weighted integrative analysis. We also assessed the scalability of these approaches by recording computational time and found that all of them could scale to biobank-scale data, although some might be relatively slow.

Conclusion: We hope that our findings offer useful guidance on how to choose appropriate multilocus association analysis methods in the post-GWAS era. All the SNP-set methods are implemented in the R package MCA, which is freely available at https://github.com/biostatpzeng/.

Keywords: Common and rare variant association study; Expression quantitative trait loci; Genome-wide association study; Integrative analysis; Multilocus method; P value combination method; SNP-set analysis; Summary statistics.
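The two linkage-disequilibrium-free P value combination methods named in the Results can be illustrated with a minimal sketch. It assumes the standard textbook forms of the two statistics: the aggregated Cauchy association test maps each P value through a tangent transform and reads the combined P value off the standard Cauchy tail, and the harmonic mean P value is the weighted harmonic mean of the individual P values. The function names `acat` and `harmonic_mean_p`, the equal default weights, and the example P values are all hypothetical choices for illustration; this is not the interface of the MCA package, and the raw harmonic mean shown here omits the asymptotic calibration that a full analysis would apply.

```python
import math


def _normalized_weights(pvalues, weights):
    """Validate inputs and return weights that sum to one (equal by default)."""
    if not pvalues:
        raise ValueError("at least one P value is required")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("P values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues) or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and match the P values")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must not all be zero")
    return [w / total for w in weights]


def acat(pvalues, weights=None):
    """Combined P value from the Cauchy combination of per-SNP P values."""
    w = _normalized_weights(pvalues, weights)
    # Each P value becomes a standard Cauchy variate under the null.
    statistic = sum(wi * math.tan((0.5 - p) * math.pi) for wi, p in zip(w, pvalues))
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


def harmonic_mean_p(pvalues, weights=None):
    """Raw weighted harmonic mean of per-SNP P values (uncalibrated)."""
    w = _normalized_weights(pvalues, weights)
    return 1.0 / sum(wi / p for wi, p in zip(w, pvalues))


if __name__ == "__main__":
    # Hypothetical per-SNP P values for one gene: one strong signal among nulls.
    gene_pvalues = [0.8, 0.35, 2e-6, 0.6, 0.12]
    print("ACAT:", acat(gene_pvalues))
    print("HMP :", harmonic_mean_p(gene_pvalues))
```

Both functions need only the per-SNP P values and optional weights, with no LD matrix, which is what makes this family of methods inexpensive at biobank scale. Both statistics are dominated by the smallest P values, which is consistent with the reported good performance under a sparse genetic architecture.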

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Fig. 1
Statistical analysis framework for the theoretical and application comparison of SNP-set based association methods with summary statistics
Fig. 2
Estimated power for the seven SNP-set methods under the sparse case with a significance level α of 10^-5. Here, PVE = 0.3%, 0.5% or 1% at the right side, the proportion of causal SNPs (prop) = 0.05, 0.20 or 0.50 on the top, and the total number of analyzed SNPs = 50, 200 or 500 on the x-axis. The power was estimated across 10^3 replications
Fig. 3
Estimated power for the seven SNP-set methods in the case of rare variant association study under the sparse case with a significance level α of 10^-5. Here, PVE = 0.3%, 0.5% or 1% at the right side, the proportion of causal SNPs = 0.05, 0.20 or 0.50 on the top, and the total number of analyzed SNPs = 50, 200 or 500 on the x-axis. The power was estimated across 10^3 replications
Fig. 4
(A) Estimated power for SNP-set methods under the polygenic TWAS framework with no horizontal pleiotropy. (B) Estimated power for SNP-set methods under the polygenic TWAS framework with horizontal pleiotropy. Here, θ = 0.1 or 0.2 at the right side, −log10(α) = 3, 4 or 5 on the top, and the total number of analyzed SNPs = 50, 200 or 500 on the x-axis. The power was estimated across 10^3 replications
Fig. 5
Upset plot illustrating the number of identified genes shared across distinct SNP-set methods for six psychiatric disorders (A), four plasma lipid traits (B), and nine immune-related diseases (C)
