Nat Methods. 2012 Apr 27;9(5):459-62.
doi: 10.1038/nmeth.1974.

The 1000 Genomes Project: data management and community access

Laura Clarke et al. Nat Methods.

Abstract

The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalog of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available. Members of the project data coordination center have developed and deployed several tools to enable widespread data access.

Figures

Figure 1
Data Flow in the 1000 Genomes Project. The sequencing centers submit their raw data to one of the two SRA databases (arrow 1), which exchange data. The DCC retrieves FASTQ files from the SRA (arrow 2) and performs QC steps on the data. The analysis group accesses data from the DCC (arrow 3), aligns the sequence data to the genome and uses the alignments to call variants. Both the alignment files and the variant files are provided back to the DCC (arrow 4). All the data are publicly released as soon as possible. Sequencing center names are provided in Supplementary Table 1.
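The four arrows in the caption can be written down as a small data structure. This is only an illustrative sketch of the flow as the caption describes it; the `Transfer` class, the `DATA_FLOW` list and the payload wording are hypothetical names chosen for this example, not part of any project software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One numbered arrow in the Figure 1 data flow (hypothetical model)."""
    arrow: int
    source: str
    destination: str
    payload: str


# The four transfers, in the order given by the caption.
DATA_FLOW = [
    Transfer(1, "sequencing centers", "SRA", "raw sequence data"),
    Transfer(2, "SRA", "DCC", "FASTQ files"),
    Transfer(3, "DCC", "analysis group", "sequence data after QC"),
    Transfer(4, "analysis group", "DCC", "alignment and variant files"),
]


def describe(flow):
    """Render each transfer as a one-line summary."""
    return [
        f"arrow {t.arrow}: {t.source} -> {t.destination} ({t.payload})"
        for t in flow
    ]


for line in describe(DATA_FLOW):
    print(line)
```

Written this way, the round trip is easy to see: the DCC is the destination of arrows 2 and 4 and the source of arrow 3.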
Figure 2
Remote File Viewing. The 1000 Genomes Browser allows remote files to be attached so that accessible BAM and VCF files can be displayed within Location view. The tracks in the image, from our October 2011 browser based on Ensembl version 63, are: (A) the NA12878 BAM file from the EBI FTP site, with the consensus sequence noted by the upper arrow and the sequence reads by the lower arrow; (B) variants from the 20110521 release VCF file, shown as a track with two variants in yellow; (C) variants from the 20101123 release database, shown as a track with one variant in yellow; (D) gene annotation from Ensembl, showing the genomic context. The ability for users to view data from files allows rapid access to new data before the database can be updated.
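To display a variant track for a Location view, a viewer needs the VCF records that fall inside the displayed region. The sketch below shows that selection step on plain VCF text. It is a minimal illustration and not the browser's implementation: the function names are hypothetical, the example records are invented for demonstration, and a real viewer would query an indexed remote file rather than scan every line.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping headers and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # The first five VCF columns are CHROM, POS, ID, REF, ALT.
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        yield Variant(chrom, int(pos), ref, alt)


def variants_in_region(
    lines: Iterable[str], chrom: str, start: int, end: int
) -> List[Variant]:
    """Return variants on `chrom` with start <= POS <= end (1-based)."""
    return [
        v for v in parse_vcf(lines)
        if v.chrom == chrom and start <= v.pos <= end
    ]


# Invented records for illustration only; not taken from any project release.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t.\tPASS\t.",
    "1\t2500\t.\tC\tT\t.\tPASS\t.",
    "2\t1500\t.\tG\tA\t.\tPASS\t.",
]

hits = variants_in_region(EXAMPLE_VCF, "1", 900, 2000)
print(hits)
```

Because the selection runs on the file itself, a newly released VCF can be viewed without first being loaded into a database, which is the advantage the caption points to.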
