  • Review Article

Statistical power and significance testing in large-scale genetic studies

Key Points

  • Significance testing, with appropriate multiple testing correction, is currently the most convenient method for summarizing the evidence for association between a disease and a genetic variant.

  • Inadequate statistical power increases not only the probability of missing genuine associations but also the probability that significant associations represent false-positive findings.

  • Statistical power declines rapidly with decreasing allele frequency and effect size, but it can be enhanced by increasing sample size and by selecting appropriate subjects (for example, cases with a positive family history and 'super normal' controls).

  • Exome sequencing studies can often identify the mutation responsible for a Mendelian disease by filtering out common variants, synonymous variants or variants that do not co-segregate with disease, and then assigning priority to the remaining variants using bioinformatic tools.

  • Adequate statistical power for rare-variant association analyses in complex diseases requires the aggregation of the effects of multiple rare variants within a defined portion of the genome (for example, a set of related genes).

  • Various computational tools are available for calculating the statistical power of genetic studies.
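The dependence of power on allele frequency, effect size, sample size and significance threshold summarized in the key points can be illustrated with a minimal sketch. It assumes a multiplicative (per-allele) odds ratio, Hardy–Weinberg equilibrium (so each subject contributes two independent alleles) and a two-sided 1-df allelic test evaluated with a normal approximation. The function names and the example design are hypothetical, and this is not the method of any particular power calculator.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    case_odds = odds_ratio * p_control / (1.0 - p_control)
    return case_odds / (1.0 + case_odds)


def allelic_test_power(p_control: float, odds_ratio: float,
                       n_cases: int, n_controls: int,
                       alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic (1-df) case-control test.

    Assumes HWE, so each subject contributes two independent alleles, and uses
    the pooled-variance normal approximation to the difference in proportions.
    """
    p_case = case_allele_freq(p_control, odds_ratio)
    alleles_case, alleles_ctrl = 2 * n_cases, 2 * n_controls
    p_pooled = (alleles_case * p_case + alleles_ctrl * p_control) / (alleles_case + alleles_ctrl)
    se = (p_pooled * (1.0 - p_pooled) * (1.0 / alleles_case + 1.0 / alleles_ctrl)) ** 0.5
    expected_z = (p_case - p_control) / se      # mean of the z statistic under H1
    z_crit = -STD_NORMAL.inv_cdf(alpha / 2.0)   # two-sided critical value
    # Probability that |Z| exceeds z_crit when Z ~ N(expected_z, 1).
    return STD_NORMAL.cdf(expected_z - z_crit) + STD_NORMAL.cdf(-expected_z - z_crit)


if __name__ == "__main__":
    # Hypothetical design: 5,000 cases and 5,000 controls, odds ratio 1.2.
    for freq in (0.3, 0.1, 0.05, 0.01):
        power = allelic_test_power(freq, 1.2, 5000, 5000)
        print(f"risk-allele frequency {freq:.2f}: power {power:.3f}")
```

With the odds ratio and sample size held fixed, power at the genome-wide threshold falls steeply as the risk allele becomes rarer; dedicated tools handle genotypic models, covariates and other designs more carefully than this approximation.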
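The aggregation of rare variants described in the key points can be sketched as a simple collapsing (burden) test: each subject is scored as a carrier if they carry a minor allele at any rare variant in the unit (for example, a gene), and carrier frequencies are compared between cases and controls with a 1-df Pearson chi-square test. This is a minimal illustration under those assumptions, with hypothetical function names; with very sparse counts an exact test would be preferable to the chi-square approximation.

```python
import math
from typing import List, Sequence, Tuple


def carrier_status(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """1 if a subject carries at least one minor allele across the unit's variants, else 0."""
    return [int(any(g > 0 for g in row)) for row in genotypes]


def collapsing_test(genotypes: Sequence[Sequence[int]],
                    is_case: Sequence[bool]) -> Tuple[float, float]:
    """Pearson chi-square (1 df) on a 2x2 table of case status by carrier status."""
    table = [[0, 0], [0, 0]]  # rows: case, control; columns: carrier, non-carrier
    for carrier, case in zip(carrier_status(genotypes), is_case):
        table[0 if case else 1][0 if carrier else 1] += 1
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0  # degenerate table carries no information
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

Collapsing trades resolution for power: individual variants are too rare to test one at a time, but their combined carrier frequency can differ detectably between cases and controls.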

Abstract

Significance testing was developed as an objective method for summarizing statistical evidence for a hypothesis. It has been widely adopted in genetic studies, including genome-wide association studies and, more recently, exome sequencing studies. However, significance testing in both genome-wide and exome-wide studies must adopt stringent significance thresholds to allow for multiple testing, and it is useful only when studies have adequate statistical power, which depends on the characteristics of the phenotype and the putative genetic variant, as well as the study design. Here, we review the principles and applications of significance testing and power calculation, including recently proposed gene-based tests for rare variants.
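The stringent thresholds mentioned in the abstract follow from controlling the family-wise error rate (FWER) across many tests. A minimal sketch, assuming the tests are independent (for correlated variants, m would be replaced by an effective number of independent tests): Bonferroni divides the target FWER by the number of tests m, and the Šidák correction solves 1 − (1 − α)^m = FWER exactly. The figure of one million tests used below is an illustrative assumption, chosen because it reproduces the familiar 5 × 10^−8 genome-wide threshold.

```python
import math


def fwer_independent(alpha_per_test: float, m: int) -> float:
    """FWER for m independent tests at level alpha_per_test when every null is true."""
    # 1 - (1 - alpha)^m, computed stably for tiny alpha via log1p/expm1.
    return -math.expm1(m * math.log1p(-alpha_per_test))


def bonferroni_threshold(fwer: float, m: int) -> float:
    """Per-test threshold guaranteeing FWER <= fwer under any dependence (union bound)."""
    return fwer / m


def sidak_threshold(fwer: float, m: int) -> float:
    """Per-test threshold giving FWER exactly equal to fwer for m independent tests."""
    return -math.expm1(math.log1p(-fwer) / m)


if __name__ == "__main__":
    m = 1_000_000  # illustrative number of independent tests
    print(f"Bonferroni: {bonferroni_threshold(0.05, m):.3e}")
    print(f"Sidak:      {sidak_threshold(0.05, m):.3e}")
    print(f"FWER with unadjusted 0.05 per test: {fwer_independent(0.05, m):.6f}")
```

The Bonferroni threshold is slightly smaller (more conservative) than the Šidák one, but the difference is negligible at genome-wide scale.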

Figure 1: Posterior probability of H0 given the critical significance level and the statistical power of a study, for different prior probabilities of H0.
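The relationship shown in Figure 1 can be outlined with Bayes' theorem. A minimal sketch, assuming that a finding is called significant when P ≤ α, so that a significant result arises with probability α under H0 and with probability equal to the power under H1; this is one standard formulation, not necessarily the exact computation behind the figure, and the prior used below is hypothetical.

```python
def posterior_prob_null(alpha: float, power: float, prior_null: float) -> float:
    """P(H0 | significant result) when 'significant' means P <= alpha."""
    false_positive = alpha * prior_null          # P(significant and H0 true)
    true_positive = power * (1.0 - prior_null)   # P(significant and H1 true)
    return false_positive / (false_positive + true_positive)


if __name__ == "__main__":
    # Hypothetical prior: 1 in 100,000 tested variants is truly associated.
    prior_null = 1.0 - 1e-5
    for power in (0.8, 0.5, 0.2):
        post = posterior_prob_null(5e-8, power, prior_null)
        print(f"power {power:.1f}: P(H0 | significant) = {post:.4f}")
```

Holding α and the prior fixed, lower power raises the posterior probability that a significant result is a false positive, which is the point made in the key points.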

Acknowledgements

This work was supported by The University of Hong Kong Strategic Research Theme on Genomics; Hong Kong Research Grants Council (HKRGC) General Research Funds 777511M, 776412M and 776513M; HKRGC Theme-Based Research Scheme T12-705/11 and T12-708/12-N; the European Community Seventh Framework Programme Grant on European Network of National Schizophrenia Networks Studying Gene–Environment Interactions (EU-GEI); and the US National Institutes of Health grants R01 MH099126 and R01 HG005827 (to S.M.P.). The authors thank R. Porsch and S.-W. Choi for technical assistance with the manuscript.

Author information

Corresponding author

Correspondence to Pak C. Sham.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Likelihoods

Probabilities (or probability densities) of observed data under an assumed statistical model as a function of model parameters.

Family-wise error rate

(FWER). The probability of at least one false-positive significant finding from a family of multiple tests when the null hypothesis is true for all the tests.

C-alpha test

A rare-variant association test based on the distribution of variants in cases and controls (that is, whether such a distribution has inflated variance compared with a binomial distribution).

Sequence kernel association test

(SKAT). A test based on score statistics for testing the association of rare variants from sequence data with either a continuous or a dichotomous trait.
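The C-alpha statistic defined in the glossary can be sketched as follows. For each variant, n copies of the minor allele are observed in total, y of them in cases; under the null hypothesis y follows a Binomial(n, p0) distribution, where p0 is the expected proportion of copies in cases (for example, the fraction of sampled chromosomes that come from cases). The statistic sums each variant's excess squared deviation, T = Σ[(y − np0)² − np0(1 − p0)], and is standardized by its null variance. This is a simplified sketch with a hypothetical function name: it uses an exact per-variant null variance and an upper-tail normal p-value (because the alternative of interest is inflated variance), whereas permutation is safer in small samples.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def c_alpha_test(case_counts: Sequence[int], total_counts: Sequence[int],
                 p0: float) -> Tuple[float, float, float]:
    """Return (T, Z, one-sided p-value) for the C-alpha overdispersion test."""
    t_stat = 0.0
    variance = 0.0
    q0 = 1.0 - p0
    for y, n in zip(case_counts, total_counts):
        mean, var = n * p0, n * p0 * q0
        t_stat += (y - mean) ** 2 - var
        # Null variance of this variant's term, summed over the Binomial(n, p0) pmf.
        variance += sum(
            ((u - mean) ** 2 - var) ** 2 * math.comb(n, u) * p0 ** u * q0 ** (n - u)
            for u in range(n + 1)
        )
    if variance == 0.0:
        raise ValueError("no informative variants (null variance is zero)")
    z = t_stat / math.sqrt(variance)
    # Upper tail: positive Z indicates more dispersion than the binomial allows.
    return t_stat, z, NormalDist().cdf(-z)
```

Because T squares each deviation, a mixture of risk variants (enriched in cases) and protective variants (enriched in controls) inflates T instead of cancelling out, which is what distinguishes this test from a simple burden test.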
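The core of SKAT, a weighted sum of squared per-variant score statistics, can also be sketched. This simplified version assumes no covariates, so the null-model residuals are simply y − ȳ for either a binary or a quantitative trait. It weights each variant by the Beta(1, 25) density evaluated at its minor allele frequency (the commonly used default), and it replaces the analytical p-value with a permutation p-value. The published test instead fits a covariate-adjusted null model and obtains p-values from a mixture of chi-square distributions, so this sketch is illustrative only; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def beta_1_25_weights(genotypes: Sequence[Sequence[int]]) -> List[float]:
    """Beta(1, 25) density at each variant's MAF: 25 * (1 - maf) ** 24."""
    n_subjects = len(genotypes)
    weights = []
    for j in range(len(genotypes[0])):
        freq = sum(row[j] for row in genotypes) / (2.0 * n_subjects)
        maf = min(freq, 1.0 - freq)
        weights.append(25.0 * (1.0 - maf) ** 24)
    return weights


def skat_q(genotypes, phenotype, weights) -> float:
    """Q = sum_j (w_j * S_j)^2 with score S_j = sum_i G_ij * (y_i - mean(y))."""
    y_bar = sum(phenotype) / len(phenotype)
    residuals = [y - y_bar for y in phenotype]
    q = 0.0
    for j, w in enumerate(weights):
        score = sum(row[j] * r for row, r in zip(genotypes, residuals))
        q += (w * score) ** 2
    return q


def skat_permutation_test(genotypes, phenotype, n_perm=999, seed=1) -> Tuple[float, float]:
    """Observed Q and a permutation p-value obtained by shuffling the phenotype."""
    weights = beta_1_25_weights(genotypes)
    observed = skat_q(genotypes, phenotype, weights)
    rng = random.Random(seed)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if skat_q(genotypes, shuffled, weights) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Permutation is convenient for a sketch but becomes expensive at exome-wide significance levels, where very small p-values would require an impractically large number of permutations; this is one reason the analytical approach matters in practice.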

About this article

Cite this article

Sham, P., Purcell, S. Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 15, 335–346 (2014). https://doi.org/10.1038/nrg3706

  • DOI: https://doi.org/10.1038/nrg3706
