Hum Genet. 2013 May;132(5):509-22.
doi: 10.1007/s00439-013-1266-7. Epub 2013 Jan 22.

Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy

Eric O Johnson et al. Hum Genet. 2013 May.

Abstract

A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias: a problem which has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not evidence such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2 % of SNPs: ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30 % of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.
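The distinction between the union and the intersection of genotyped SNP sets can be illustrated with plain set operations. This is only a minimal sketch: the SNP identifiers and the two array contents below are hypothetical placeholders, not data from the study, and the helper name `imputation_inputs` is an assumption introduced for illustration.

```python
# Hypothetical SNP identifiers; real arrays carry hundreds of thousands of markers.
array_a = {"rs1", "rs2", "rs3", "rs4", "rs5"}  # stands in for a denser array
array_b = {"rs2", "rs4", "rs5", "rs6"}         # stands in for a sparser array


def imputation_inputs(*arrays):
    """Return (union, intersection) of the genotyped SNP sets.

    union:        SNPs available on one or more arrays
    intersection: SNPs available on all arrays
    """
    union = set().union(*arrays)
    intersection = set.intersection(*arrays)
    return union, intersection


union, intersection = imputation_inputs(array_a, array_b)

# Share of all genotyped SNPs that remain when only the intersection is used
retained = len(intersection) / len(union)

# The abstract's arithmetic: 0.2 % spurious associations per million imputed SNPs
false_positives_per_million = round(0.002 * 1_000_000)
```

With the intersection as input, every study contributes the same genotyped SNPs to the imputation, which is the condition the abstract reports as removing the bias.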

Figures

Fig. 1
Genomic inflation factors (λgc; grey lines) and percentages of SNPs having spurious association (P < 1 × 10⁻⁶; black lines), by minor allele frequency (MAF), when combining studies genotyped on different Illumina BeadChip arrays (Human1M or HumanHap550 version 3). a–c European American subjects from SAGE were compared to PanScan subjects, and d–f African American subjects from SAGE were compared to iControl subjects. Three different SNP sets were assessed: a, d genotyped SNPs available on both arrays; b, e imputed SNPs based on the union of genotyped SNPs available on either array; and c, f imputed SNPs based on the intersection of genotyped SNPs available on both arrays. The number of SNPs with MAF >1 % and the overall λgc are shown in each plot.
Fig. 2
Genomic inflation factors (λgc; grey lines) and percentages of SNPs having spurious association (P < 1 × 10⁻⁶; black lines), by minor allele frequency (MAF), when combining studies genotyped on either the Illumina Human1M or Affymetrix 6.0 array. a–c European American and d–f African American subjects from SAGE (genotyped on Illumina 1M) were compared to subjects from the GAIN GWAS of Schizophrenia (genotyped on Affymetrix 6.0). Three different SNP sets were assessed: a, d genotyped SNPs available on both arrays; b, e imputed SNPs based on the union of genotyped SNPs available on either array; and c, f imputed SNPs based on the intersection of genotyped SNPs available on both arrays. The number of SNPs with MAF >1 % and the overall λgc are shown in each plot.
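The two summary statistics plotted in Figs. 1 and 2 can be computed from a list of association P values. The sketch below uses the conventional definition of the genomic inflation factor (median observed 1-df chi-square divided by its null median, about 0.455); the record does not say how the authors computed it, so treat this as an assumption. The function names and the example P values are hypothetical.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a 1-df chi-square under the null


def chi2_from_p(p):
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile; squaring drops the sign
    return z * z


def genomic_inflation(p_values):
    """Genomic inflation factor: median observed chi-square over the null median."""
    return median(chi2_from_p(p) for p in p_values) / CHI2_1DF_MEDIAN


def spurious_fraction(p_values, threshold=1e-6):
    """Fraction of SNPs whose P value falls below the spurious-association cut-off."""
    hits = sum(1 for p in p_values if p < threshold)
    return hits / len(p_values)


# Hypothetical null scan: P values spread evenly over (0, 1)
p_null = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(p_null)

# Hypothetical biased scan: 2 of 1,000 SNPs (0.2 %) pass the threshold
p_biased = [1e-8, 1e-7] + p_null[2:]
frac_biased = spurious_fraction(p_biased)
```

Under the null, the inflation factor should sit close to 1 and essentially no SNPs should pass the threshold; values well above those are the signature of bias that the figures track across MAF bins.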
Fig. 3
Average R² values in SAGE control subjects (genotyped on Illumina’s Human1M) to indicate overall quality across all imputed SNPs, when imputation was based on all genotyped SNPs or the intersection of genotyped SNPs with Affymetrix 6.0 or varying Illumina arrays (Human1M, HumanOmni1-Quad, Human660W, HumanHap550 version 1, and HumanHap300-Duo version 2 BeadChip). Results are shown across minor allele frequency (MAF) intervals of 1 % for all imputed SNPs with MAF >1 % on chromosome 22: a ~34,000 SNPs in European Americans and b ~43,000 SNPs in African Americans.
Fig. 4
Expected statistical power by level of imputation accuracy (average R²) for differing numbers of public controls added to the baseline design of 2,000 cases and 2,000 controls (blue diamond and blue dashed line). Power was estimated for detection of a SNP effect size of 1 % explained variance in the phenotype. The baseline model provided 81 % power to detect this effect size at a genome-wide significance of P = 5 × 10⁻⁸.
Fig. 5
Expected statistical power by imputation accuracy (average R²) for the baseline study design (2,000 cases and 2,000 controls: blue diamond and blue dashed line) and several alternatives focusing study recruitment and genotyping on increasing numbers of cases and relying on public controls under the constraint of maximal recruitment and genotyping of 4,000 individuals. The baseline model provided 81 % power to detect this effect size at a genome-wide significance of P = 5 × 10⁻⁸.
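The baseline power figure quoted in the captions can be approximated with a textbook 1-df calculation. This is a rough sketch, not the authors' procedure: it assumes the non-centrality parameter is N × R² × (variance explained), treats the 4,000 baseline subjects as a single balanced sample, and does not model the unbalanced designs with added public controls. The function name `power_1df` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_1df(n, var_explained, r2=1.0, alpha=5e-8):
    """Approximate power of a two-sided 1-df association test.

    Assumption: the test statistic is normal with mean sqrt(ncp) and unit
    variance, where ncp = n * r2 * var_explained. Imperfect imputation
    (r2 < 1) shrinks the effective sample size.
    """
    ncp = n * r2 * var_explained
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)  # two-sided critical value
    shift = sqrt(ncp)
    upper = _STD_NORMAL.cdf(shift - z_crit)     # P(Z > z_crit)
    lower = _STD_NORMAL.cdf(-shift - z_crit)    # P(Z < -z_crit), negligible here
    return upper + lower


# Baseline: 2,000 cases + 2,000 controls, 1 % explained variance, perfect accuracy
baseline = power_1df(n=4000, var_explained=0.01)

# Same design if the tested SNP were imputed with an average R² of 0.8
imputed = power_1df(n=4000, var_explained=0.01, r2=0.8)
```

Under these assumptions the baseline comes out at roughly 0.81, in line with the 81 % stated in the captions, and power drops as R² falls, which is the trade-off the two power figures explore.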
