Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy
- PMID: 23334152
- PMCID: PMC3628082
- DOI: 10.1007/s00439-013-1266-7
Abstract
A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias, a problem that has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not show such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2% of SNPs, or ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30% of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.
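The union and intersection inputs described in the abstract, and the arithmetic behind the false-positive count, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the array names, the rs identifiers, and the helper functions are hypothetical, and the "intersection fraction" is computed here as intersection size divided by union size, which is only one possible reading of "30% of the total SNPs genotyped".

```python
def snp_sets(arrays):
    """Return (union, intersection) of SNP IDs across genotyping arrays.

    `arrays` maps an array name to an iterable of SNP IDs (hypothetical data).
    """
    sets = [set(snps) for snps in arrays.values()]
    union = set().union(*sets)              # SNPs on one or more arrays
    intersection = set.intersection(*sets)  # SNPs on all arrays
    return union, intersection


def intersection_fraction(arrays):
    """Share of all genotyped SNPs (the union) that every array carries."""
    union, intersection = snp_sets(arrays)
    return len(intersection) / len(union)


def expected_false_positives(rate, n_imputed):
    """Number of spurious associations implied by a per-SNP rate."""
    return round(rate * n_imputed)


# Toy example with made-up SNP IDs; real arrays carry hundreds of thousands.
toy_arrays = {
    "array_A": ["rs1", "rs2", "rs3", "rs4", "rs5"],
    "array_B": ["rs2", "rs3", "rs5", "rs6"],
}

toy_union, toy_intersection = snp_sets(toy_arrays)
print(sorted(toy_union))         # candidate inputs under the union strategy
print(sorted(toy_intersection))  # inputs under the intersection strategy
print(intersection_fraction(toy_arrays))

# Rate reported in the abstract: 0.2% of SNPs, per million SNPs imputed.
print(expected_false_positives(0.002, 1_000_000))
```

Under the intersection strategy, only the SNPs in `toy_intersection` would be passed to the imputation software for every cohort, so each sample is imputed from the same genotyped inputs regardless of its array.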
Grants and funding
- U01 HG004438/HG/NHGRI NIH HHS/United States
- P01 CA087969/CA/NCI NIH HHS/United States
- P01 CA089392/CA/NCI NIH HHS/United States
- R33 DA027486/DA/NIDA NIH HHS/United States
- U01 HG004422/HG/NHGRI NIH HHS/United States
- R01 MH059587/MH/NIMH NIH HHS/United States
- HHSN268200782096C/HL/NHLBI NIH HHS/United States
- R01 MH059566/MH/NIMH NIH HHS/United States
- U01 CA049449/CA/NCI NIH HHS/United States
- R01 MH060879/MH/NIMH NIH HHS/United States
- R01 MH061675/MH/NIMH NIH HHS/United States
- R01 CA050385/CA/NCI NIH HHS/United States
- U01 HG004446/HG/NHGRI NIH HHS/United States
- R01 CA067262/CA/NCI NIH HHS/United States
- U01 MH046276/MH/NIMH NIH HHS/United States
- R01 DA025888/DA/NIDA NIH HHS/United States
- U01 MH079469/MH/NIMH NIH HHS/United States
- U01 CA067262/CA/NCI NIH HHS/United States
- R01 CA065725/CA/NCI NIH HHS/United States
- R01 MH067257/MH/NIMH NIH HHS/United States
- R01 MH060870/MH/NIMH NIH HHS/United States
- R01 MH081800/MH/NIMH NIH HHS/United States
- R01 MH059571/MH/NIMH NIH HHS/United States
- R01 MH059565/MH/NIMH NIH HHS/United States
- R01 DA026141/DA/NIDA NIH HHS/United States
- U01 MH079470/MH/NIMH NIH HHS/United States
- R01 DA013423/DA/NIDA NIH HHS/United States
- U01 CA098233/CA/NCI NIH HHS/United States
- R01 MH059586/MH/NIMH NIH HHS/United States
- R01 CA049449/CA/NCI NIH HHS/United States
- U10 AA008401/AA/NIAAA NIH HHS/United States
- R01 MH059588/MH/NIMH NIH HHS/United States
- U01 MH046318/MH/NIMH NIH HHS/United States