Nature. 2010 Sep 2;467(7311):52-8.
doi: 10.1038/nature09298.

Integrating common and rare genetic variation in diverse human populations

International HapMap 3 Consortium, David M Altshuler, Richard A Gibbs, Leena Peltonen, Emmanouil Dermitzakis, Stephen F Schaffner, Fuli Yu, Penelope E Bonnen, Paul I W de Bakker, Panos Deloukas, Stacey B Gabriel, Rhian Gwilliam, Sarah Hunt, Michael Inouye, Xiaoming Jia, Aarno Palotie, Melissa Parkin, Pamela Whittaker, Kyle Chang, Alicia Hawes, Lora R Lewis, Yanru Ren, David Wheeler, Donna Marie Muzny, Chris Barnes, Katayoon Darvishi, Matthew Hurles, Joshua M Korn, Kati Kristiansson, Charles Lee, Steven A McCarrol, James Nemesh, Alon Keinan, Stephen B Montgomery, Samuela Pollack, Alkes L Price, Nicole Soranzo, Claudia Gonzaga-Jauregui, Verneri Anttila, Wendy Brodeur, Mark J Daly, Stephen Leslie, Gil McVean, Loukas Moutsianas, Huy Nguyen, Qingrun Zhang, Mohammed J R Ghori, Ralph McGinnis, William McLaren, Fumihiko Takeuchi, Sharon R Grossman, Ilya Shlyakhter, Elizabeth B Hostetter, Pardis C Sabeti, Clement A Adebamowo, Morris W Foster, Deborah R Gordon, Julio Licinio, Maria Cristina Manca, Patricia A Marshall, Ichiro Matsuda, Duncan Ngare, Vivian Ota Wang, Deepa Reddy, Charles N Rotimi, Charmaine D Royal, Richard R Sharp, Changqing Zeng, Lisa D Brooks, Jean E McEwen

Abstract

Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.


Figures

Figure 1. Size and frequency spectra of common and rare CNPs
a, Estimated size distribution of common CNPs calculated from the physical span of the genomic probes supporting each CNP event. b, Allele frequency spectrum for biallelic CNPs calculated from integer CNP genotypes for the samples analysed in this work.
Figure 2. SNP discovery informativeness across populations
a, b, For each of 7 populations for which at least 60 individuals were resequenced, we considered a sample of 30 individuals, another non-overlapping sample of 30 individuals from the same population, and a sample of 30 individuals from each of the 6 other populations (results are averaged over 1,000 random samplings). Out of all SNPs that are either polymorphic (a) or polymorphic with a minor allele with at most two copies in the sample of 30 individuals (b), here we present the fraction that are also polymorphic in a different sample, starting with the other sample from the same population (black bars). The black bars serve as a baseline that accounts for the effect of sampling stochasticity and sequencing errors on SNP discovery. The different y-axis scales used reflect the lower likelihood of a low-frequency variant being seen in a different sample.
Figure 3. Effect of sample size on SNP ascertainment
The number of SNPs discovered as a function of sample size by averaging over 1,000 random samplings. For each population, we randomly sampled without replacement a subset of the individuals of any possible size and considered which SNPs were polymorphic in the resequencing data for that sample. For any given sample size, many more variants are discovered in populations with genetic proximity to Africa (LWK, ASW and YRI), compared to populations of non-African ancestry.
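The resampling procedure described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the consortium's pipeline: the function name `mean_snps_discovered`, the genotype encoding (0, 1 or 2 copies of the non-reference allele per site), and the toy data are all assumptions made for the example. A site is counted as polymorphic when both alleles are present in the sampled individuals.

```python
import random


def mean_snps_discovered(genotypes, sample_size, n_draws=1000, seed=0):
    """Average number of polymorphic sites over random subsamples.

    genotypes: list of individuals, each a list of allele counts
    (0, 1 or 2) per site. This encoding is an assumption of the sketch.
    """
    rng = random.Random(seed)
    n_sites = len(genotypes[0])
    total = 0
    for _ in range(n_draws):
        # Draw individuals without replacement, as in the caption.
        sample = rng.sample(genotypes, sample_size)
        for site in range(n_sites):
            copies = sum(individual[site] for individual in sample)
            # Polymorphic: neither allele is fixed in this sample.
            if 0 < copies < 2 * sample_size:
                total += 1
    return total / n_draws


# Toy data (hypothetical): 4 individuals, 3 sites, the last site monomorphic.
toy = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]

# Discovery curve: mean polymorphic sites for every possible sample size.
curve = {n: mean_snps_discovered(toy, n, n_draws=200) for n in range(1, 5)}
```

Plotting `curve` for each population would give a figure of the kind described, with sample size on the x-axis and the mean number of discovered SNPs on the y-axis.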
Figure 4. Haplotype sharing around SNPs and CNPs
a, b, Extent of haplotype homozygosity around variant alleles of various frequencies. Shown are SNPs from the ENCODE sequence, CNPs of comparable frequency, SNPs from the arrays and on randomly grouped chromosomes, and (for YRI) the maximum possible sharing for a genotyping error rate of 0.2%. a, CEU. b, YRI.
Figure 5. Imputation accuracy and reference panel size
a, b, Mean r² between true and imputed genotype dosage for SNPs imputed from a HapMap-II-sized panel of 120 CEU chromosomes (HMII-CEU) or a HapMap 3 panel of 410 European-ancestry chromosomes (CEU+TSI). Scatter plots show Affymetrix 500K SNPs on chromosome 20 imputed for 1,393 subjects of the 1958 British birth cohort. a, Rare SNPs (MAF <0.5%). b, Low-frequency SNPs (MAF = 0.5–5%).
Figure 6. Imputation: new populations, new variants
a, b, Mean r² between true and imputed genotype dosage as a function of copies of minor allele in the reference panel. a, The loss in imputation accuracy when the reference population differs slightly from the target population (CEU imputed into CEU compared to CEU into TSI; and YRI into YRI compared to YRI into LWK). b, Imputation accuracy for newly discovered variants (CNPs and ENCODE SNPs).
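The accuracy measure used in the Figure 5 and Figure 6 captions, r² between true and imputed genotype dosage, can be illustrated with a short sketch. The captions do not spell out the computation, so this assumes r² is the squared Pearson correlation across individuals for one SNP, with true genotypes coded 0, 1 or 2 and imputed dosages as expected allele counts between 0 and 2. The function names `dosage_r2` and `mean_r2` are hypothetical.

```python
import math


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Assumed definition for this sketch; returns NaN when either vector
    has no variance (for example, a SNP monomorphic in the target sample).
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_t = math.fsum(true_genotypes) / n
    mean_d = math.fsum(imputed_dosages) / n
    stt = math.fsum((t - mean_t) ** 2 for t in true_genotypes)
    sdd = math.fsum((d - mean_d) ** 2 for d in imputed_dosages)
    std = math.fsum(
        (t - mean_t) * (d - mean_d)
        for t, d in zip(true_genotypes, imputed_dosages)
    )
    if stt == 0 or sdd == 0:
        return math.nan
    return (std * std) / (stt * sdd)


def mean_r2(snps):
    """Mean r² over (true_genotypes, imputed_dosages) pairs, skipping NaN."""
    values = [dosage_r2(t, d) for t, d in snps]
    values = [v for v in values if not math.isnan(v)]
    return math.fsum(values) / len(values) if values else math.nan


# Hypothetical example: one SNP, five individuals.
true_g = [0, 0, 1, 2, 1]
dosage = [0.1, 0.0, 0.9, 1.8, 1.2]
accuracy = dosage_r2(true_g, dosage)
```

Grouping SNPs by minor allele frequency, or by the number of minor-allele copies in the reference panel, and calling `mean_r2` on each group would give per-bin values of the kind plotted in the figures.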

Comment in

  • Expanding HapMap.
    Rusk N. Nat Methods. 2010 Oct;7(10):780-1. doi: 10.1038/nmeth1010-780b. PMID: 20936772.
