The 1000 Genomes Project: data management and community access

doi:10.1038/nmeth.1974

Nat Methods. 2012 Apr 27;9(5):459-62.

Laura Clarke, Xiangqun Zheng-Bradley, Richard Smith, Eugene Kulesha, Chunlin Xiao, Iliana Toneva, Brendan Vaughan, Don Preuss, Rasko Leinonen, Martin Shumway, Stephen Sherry, Paul Flicek; 1000 Genomes Project Consortium

PMID: 22543379
PMCID: PMC3340611
DOI: 10.1038/nmeth.1974

Laura Clarke et al. Nat Methods. 2012.

Abstract

The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalog of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available. Members of the project data coordination center have developed and deployed several tools to enable widespread data access.

Figures

**Figure 1**
Data flow in the 1000 Genomes Project. The sequencing centers submit their raw data to one of the two SRA databases (arrow 1), which exchange data. The DCC retrieves FASTQ files from the SRA (arrow 2) and performs QC steps on the data. The analysis group accesses data from the DCC (arrow 3), aligns the sequence data to the genome and uses the alignments to call variants. Both the alignment files and variant files are provided back to the DCC (arrow 4). All the data is publicly released as soon as possible. Sequencing center names are provided in Supplementary Table 1.

**Figure 2**
Remote file viewing. The 1000 Genomes Browser allows remote files to be attached so that accessible BAM and VCF files can be displayed within Location view. The tracks in the image, from our October 2011 browser based on Ensembl version 63, are: (A) NA12878 BAM file from the EBI FTP site, with the consensus sequence noted by the upper arrow and sequence reads by the lower arrow; (B) variants from the 20110521 release VCF file, shown as a track with two variants in yellow; (C) variants from the 20101123 release database, shown as a track with one variant in yellow; (D) gene annotation from Ensembl showing the genomic context. The ability to view data from files gives users rapid access to new data before the database can be updated.
