Comparative Study
J Am Med Inform Assoc. 2012 Mar-Apr;19(2):289-94.
doi: 10.1136/amiajnl-2011-000652.

A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data

Carrie C Buchanan et al. J Am Med Inform Assoc. 2012 Mar-Apr.

Abstract

Background: Since publication of the human genome in 2003, geneticists have been interested in risk variant associations to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome (variants with a minor allele frequency (MAF) of at least 5% in one or more ethnic groups). HapMap along with advances in genotyping technology led to genome-wide association studies which have identified common variants associated with many traits and diseases. In 2008 the 1000 Genomes Project aimed to sequence 2500 individuals and identify rare variants and 99% of variants with a MAF of <1%.

Methods: To determine whether the 1000 Genomes Project includes all the variants in HapMap, we examined the overlap between single nucleotide polymorphisms (SNPs) genotyped in the two resources using merged phase II/III HapMap data and low coverage pilot data from 1000 Genomes.

Results: Comparison of the two data sets showed that approximately 72% of HapMap SNPs were also found in 1000 Genomes Project pilot data. After filtering out HapMap variants with a MAF of <5% (separately for each population), 99% of HapMap SNPs were found in 1000 Genomes data.
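The overlap calculation described above can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: it assumes SNPs are matched by rsID (the study may have matched on position or other keys), that each HapMap SNP is biallelic with a known reference allele frequency in one population, and the names `minor_allele_frequency`, `overlap_fraction`, `hapmap_ceu`, and `kg_pilot` as well as the toy data are hypothetical.

```python
from typing import Dict, Set


def minor_allele_frequency(ref_freq: float) -> float:
    """MAF of a biallelic SNP, given its reference allele frequency."""
    return min(ref_freq, 1.0 - ref_freq)


def overlap_fraction(hapmap_ref_freq: Dict[str, float],
                     kg_ids: Set[str],
                     min_maf: float = 0.0) -> float:
    """Fraction of HapMap SNPs with MAF >= min_maf that also appear in kg_ids.

    hapmap_ref_freq maps rsID -> reference allele frequency in ONE population;
    the filter would be repeated separately for each population.
    """
    kept = [rsid for rsid, freq in hapmap_ref_freq.items()
            if minor_allele_frequency(freq) >= min_maf]
    if not kept:
        return 0.0
    shared = sum(1 for rsid in kept if rsid in kg_ids)
    return shared / len(kept)


# Toy data invented for illustration; these are not values from the study.
hapmap_ceu = {"rsA": 0.50, "rsB": 0.98, "rsC": 0.30, "rsD": 1.00}
kg_pilot = {"rsA", "rsC", "rsX"}

print(overlap_fraction(hapmap_ceu, kg_pilot))        # no frequency filter
print(overlap_fraction(hapmap_ceu, kg_pilot, 0.05))  # keep only MAF >= 5%
```

In the toy data, the two HapMap-only SNPs are rare (`rsB`) or fixed (`rsD`) in this population, so removing variants with a MAF below 5% raises the overlap. This is the same qualitative pattern as the reported change from roughly 72% to 99%.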

Conclusions: Not all variants cataloged in HapMap are also cataloged in 1000 Genomes. This could affect decisions about which resource to use for SNP queries, rare variant validation, or imputation. Both the HapMap and 1000 Genomes Project databases are useful resources for human genetics, but it is important to understand the assumptions made and filtering strategies employed by these projects.

Conflict of interest statement

Competing interests: None.

Figures

Figure 1
Variants in HapMap and 1000 Genomes Project data. The left box shows an enhanced screenshot from the NCBI browser. rs2072413 shows that variants in HapMap are not always found in 1000 Genomes Project data. For reference, the validation status descriptions are shown in the box on the right. SNP, single nucleotide polymorphism.
Figure 2
Reference allele frequency in the HapMap exclusive data, by chromosome in CEU (A) and YRI (B).
Figure 3
Distribution of reference allele frequencies (most often the major allele frequency) of HapMap (HM) exclusive variants after filtering out any fixed alleles in CEU (A) and YRI (B) populations.
Figure 4
Total number of HapMap variants before and after filtering using CEU samples on chromosome 1. The y-axis shows the total number of variants (in hundreds of thousands). The tan bars indicate the number of HapMap variants remaining after an allele frequency filter is applied (if one is applied). The green bars indicate how many of those variants are present in 1000 Genomes Project pilot data. The numbers above each bar give the bar height, that is, the number of variants. For reference, the light gray line shows the total number of variants on chromosome 1 in 1000 Genomes Project pilot data (approximately 605 000).
