Science. 2010 Oct 29;330(6004):641-6.
doi: 10.1126/science.1197005.

Diversity of human copy number variation and multicopy genes

Peter H Sudmant et al. Science.

Abstract

Copy number variants affect both disease and normal phenotypic variation, but those lying within heavily duplicated, highly identical sequence have been difficult to assay. By analyzing short-read mapping depth for 159 human genomes, we demonstrated accurate estimation of absolute copy number for duplications as small as 1.9 kilobase pairs, ranging from 0 to 48 copies. We identified 4.1 million "singly unique nucleotide" positions informative in distinguishing specific copies and used them to genotype the copy and content of specific paralogs within highly duplicated gene families. These data identify human-specific expansions in genes associated with brain development, reveal extensive population genetic diversity, and detect signatures consistent with gene conversion in the human species. Our approach makes ~1000 genes accessible to genetic studies of disease association.
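The read-depth principle described in the abstract can be illustrated with a minimal sketch. It assumes that mapped depth scales linearly with copy number and that the expected depth of a two-copy (diploid) window is known from control regions. The function name `window_copy_number` and all depth values are hypothetical. A real pipeline must also correct for biases such as GC content and handle reads that map to several paralogs; this sketch omits those steps.

```python
from statistics import median


def window_copy_number(window_depths, diploid_depth):
    """Convert per-window read depths into absolute copy number estimates.

    Assumes (hypothetically) that depth is proportional to copy number, so a
    window at the expected diploid depth has copy number 2.
    """
    if diploid_depth <= 0:
        raise ValueError("diploid_depth must be positive")
    return [2.0 * depth / diploid_depth for depth in window_depths]


# Hypothetical depths of control windows assumed to be single copy (diploid).
control_depths = [29.0, 31.0, 30.0, 30.0]
diploid = median(control_depths)  # 30.0

# Hypothetical depths for four consecutive windows in a duplicated region.
print(window_copy_number([30.0, 45.0, 90.0, 0.0], diploid))  # [2.0, 3.0, 6.0, 0.0]
```

Using the median of control windows, rather than the mean, is one simple way to keep a few anomalous windows from skewing the diploid baseline.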

Figures

Fig. 1
Landscape of human copy number variation. (A) CNV heatmap of a 734-kbp duplicated region flanking the 17q21.31 MAPT locus in 13 individuals (11 sequenced to high coverage). Read depth–based copy number (CN) estimates (3-kbp windows) are indicated by color (scale provided to the right). FISH at two separate loci validates these absolute CN predictions across five individuals (9). (B) Copy number landscape of the 17q21.31 locus across three populations, showing marked population stratification (159 genomes analyzed). A European-enriched duplication overlaps the gene KIAA1267 and is present on two haplotypes: a long form (205 kbp) and a short form (155 kbp). A 210-kbp duplication of the NSF gene ranges from two to six copies, with increased copy number in Asians. For validation with array CGH, see fig. S31. (C) Copy number frequency histograms of the KIAA1267 and NSF duplications, based on median read depth, predict discrete copies. Duplications of the KIAA1267 locus are specific to Europeans, at a frequency of 72%. Six copies of NSF are found in 25% of Asians.
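Panel C summarizes each individual's locus by the median over its windows and then tabulates discrete copy numbers. The sketch below assumes per-window estimates, such as those from a read-depth model, are already available. The functions `genotype_region` and `copy_number_frequencies` and the sample IDs are hypothetical, and rounding the median to the nearest integer is a simplification, not the authors' calling procedure.

```python
import math
from collections import Counter
from statistics import median


def genotype_region(window_cns):
    """Call one integer copy number from an individual's per-window estimates."""
    # Round half up explicitly; Python's built-in round() uses banker's rounding.
    return math.floor(median(window_cns) + 0.5)


def copy_number_frequencies(samples):
    """Return {copy_number: fraction_of_individuals}, ordered by copy number.

    `samples` maps a (hypothetical) sample ID to per-window copy number estimates.
    """
    calls = Counter(genotype_region(cns) for cns in samples.values())
    total = sum(calls.values())
    return {cn: count / total for cn, count in sorted(calls.items())}


# Hypothetical per-window estimates for four individuals at one locus.
samples = {
    "sample_A": [2.1, 1.9, 2.0],
    "sample_B": [4.2, 3.8, 4.1],
    "sample_C": [6.1, 5.7, 6.3],
    "sample_D": [2.2, 2.0, 1.8],
}
print(copy_number_frequencies(samples))  # {2: 0.5, 4: 0.25, 6: 0.25}
```

Splitting the samples by population before calling `copy_number_frequencies` would give one histogram per population, in the spirit of the panel.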
Fig. 2
Validation and application. (A) Single-channel array CGH data are highly correlated (r = 0.95) with read depth–based genotypes for the highly duplicated TBC1D3 gene (copy number range 5 to 53). Note the reduced copy number of this gene family among Europeans (color coding as in Fig. 1C). (B) Heatmap of a 340-kbp region proximal to the facioscapulohumeral muscular dystrophy (FSHD) region on chromosome 4 identifies a polymorphic segmental duplication ranging from 5 to 8 copies. In the human reference genome (build 36), this segment is annotated as single copy (i.e., unique), but all humans carry duplications mapping to chromosomes 4, 13, 14, and 21.
Fig. 3
Human gene family copy number diversity and evolution. (A) The genes most stratified by copy number in the human genome on the basis of Vst analysis of European, African, and Asian populations. (B) Human-specific gene family expansions.
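Panel A ranks genes by Vst. A minimal sketch follows. It assumes the definition commonly used for copy number stratification, Vst = (V_T - V_S) / V_T, where V_T is the copy number variance across all individuals and V_S is the within-population variance averaged with weights proportional to population size. The function name `vst` and the toy populations are hypothetical.

```python
from statistics import pvariance


def vst(copy_numbers_by_pop):
    """Compute Vst = (V_T - V_S) / V_T for one gene.

    `copy_numbers_by_pop` maps a population label to its individuals' copy numbers.
    V_S is the size-weighted mean of within-population variances.
    """
    all_values = [cn for values in copy_numbers_by_pop.values() for cn in values]
    v_total = pvariance(all_values)
    if v_total == 0:
        return 0.0  # no copy number variation at all, so no stratification
    n = len(all_values)
    v_within = sum(len(v) * pvariance(v) for v in copy_numbers_by_pop.values()) / n
    return (v_total - v_within) / v_total


# Hypothetical toy data: two populations with different typical copy numbers.
print(vst({"pop1": [2.0, 4.0], "pop2": [6.0, 8.0]}))  # 0.8
# Identical distributions in both populations give no stratification.
print(vst({"pop1": [2.0, 4.0], "pop2": [2.0, 4.0]}))  # 0.0
```

Values near 1 mean most of the copy number variance lies between populations; values near 0 mean it lies within them.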
Fig. 4
Paralog-specific copy number resolution and genotyping. (A) Schematic showing SUN identifiers among four high-identity duplications. SUNs (orange) uniquely distinguish one duplicated copy from all others, in contrast to paralogous sequence variants (PSVs, blue), which may be shared among copies. (B) Resolving duplication mirror effects with paralog-specific genotyping. Total read depth and array CGH fail to distinguish the origin of copy number variation between two high-identity (98.5%) segmental duplications mapping to chromosome 1p13.1 and 7q11.23. SUN read-depth mapping, however, predicts that copy number variation is restricted to 7q11.23 and not 1p13.1. FISH on these samples confirms copy number gains and losses on 7q11.23 (fig. S51).
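The SUN/PSV distinction in panel A can be made concrete with a simplified, single-alignment sketch. It assumes the paralogs are already aligned to equal length without gaps. It treats a column as a PSV when the copies do not all agree, and as a SUN for one copy when that copy's base appears in no other copy at that column. The function `classify_positions` and the sequences are hypothetical. A genome-wide method would also need to check uniqueness against the whole genome, which is not attempted here.

```python
def classify_positions(paralogs):
    """Return (psv_positions, suns) for gapless, equal-length aligned paralogs.

    `suns` maps each paralog name to the 0-based positions where its base
    differs from the base of every other paralog.
    """
    names = list(paralogs)
    seqs = [paralogs[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("paralogs must be aligned to equal length")

    psvs = []
    suns = {name: [] for name in names}
    for pos in range(length):
        bases = [seq[pos] for seq in seqs]
        if len(set(bases)) > 1:            # copies disagree: a PSV
            psvs.append(pos)
            for name, base in zip(names, bases):
                if bases.count(base) == 1:  # base seen in this copy only: a SUN
                    suns[name].append(pos)
    return psvs, suns


# Four hypothetical high-identity copies, loosely mirroring panel A.
paralogs = {
    "copy1": "ACGTACGT",
    "copy2": "ACGTACGA",
    "copy3": "ACCTACGA",
    "copy4": "ACCTTCGA",
}
psvs, suns = classify_positions(paralogs)
print(psvs)  # [2, 4, 7]
print(suns)  # {'copy1': [7], 'copy2': [], 'copy3': [], 'copy4': [4]}
```

Position 2 is a PSV shared by two copies each, so it distinguishes no single copy. Positions 4 and 7 are SUNs that could anchor paralog-specific read-depth counts for copy4 and copy1.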
Fig. 5
Paralog-specific gene family copy number variation. (A) Paralog-specific copy number estimates of 990 duplicated genes show that most, on average, are diploid within the human species (median psCN = 2 ± 0.5), and nearly half show little variation in copy. Among 49.2% of duplicated genes, deviation from the median copy occurs rarely (≤5% of individuals). By contrast, genes outside of segmental duplications and other known regions of copy number variation are nearly devoid of common CNVs (blue), even when genotyping with randomly subsampled positions (gray) to mimic the restricted density of SUN markers within duplicated genes. (B) Population stratification and paralog-specific copy variability of a human expanded-gene family of unknown function, NBPF (neuroblastoma breakpoint gene family). Certain paralogs (e.g., NBPF1) are highly amplified, extremely variable, and stratified by population, whereas others are nearly fixed and diploid (e.g., NBPF7).
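The panel A criterion, that deviation from the median copy occurs in at most 5% of individuals, can be expressed as a small check. This is a minimal sketch that assumes integer paralog-specific copy number (psCN) calls for one gene across individuals. It uses the lower median so that the reference value is always an observed copy number. The function names and gene data are hypothetical.

```python
from statistics import median_low


def deviation_rate(ps_copy_numbers):
    """Return (median psCN, fraction of individuals whose psCN differs from it)."""
    # median_low always returns an observed value, so the median is an integer copy number.
    m = median_low(ps_copy_numbers)
    deviating = sum(1 for cn in ps_copy_numbers if cn != m)
    return m, deviating / len(ps_copy_numbers)


def is_rarely_variable(ps_copy_numbers, threshold=0.05):
    """True if at most `threshold` of individuals deviate from the median psCN."""
    _, rate = deviation_rate(ps_copy_numbers)
    return rate <= threshold


gene_x = [2] * 19 + [3]       # hypothetical: 1 of 20 individuals deviates
gene_y = [2] * 10 + [3] * 10  # hypothetical: half of the individuals deviate
print(deviation_rate(gene_x), is_rarely_variable(gene_x))  # (2, 0.05) True
print(deviation_rate(gene_y), is_rarely_variable(gene_y))  # (2, 0.5) False
```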

Cited by

  • Genomic Multicopy Loci Targeted by Current Forensic Quantitative PCR Assays.
    Jäger R. Genes (Basel). 2024 Oct 5;15(10):1299. doi: 10.3390/genes15101299. PMID: 39457423. Review.
  • Gene expansions contributing to human brain evolution.
    Soto DC, Uribe-Salazar JM, Kaya G, Valdarrago R, Sekar A, Haghani NK, Hino K, La GN, Mariano NAF, Ingamells C, Baraban AE, Turner TN, Green ED, Simó S, Quon G, Andrés AM, Dennis MY. bioRxiv [Preprint]. 2024 Sep 26:2024.09.26.615256. doi: 10.1101/2024.09.26.615256. PMID: 39386494. Preprint.
  • Complex genetic variation in nearly complete human genomes.
    Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo D, Paisie CA, Harvey WT, Zhao X, Martino GV, Henglin M, Munson KM, Rabbani K, Chin CS, Gu B, Ashraf H, Austine-Orimoloye O, Balachandran P, Bonder MJ, Cheng H, Chong Z, Crabtree J, Gerstein M, Guethlein LA, Hasenfeld P, Hickey G, Hoekzema K, Hunt SE, Jensen M, Jiang Y, Koren S, Kwon Y, Li C, Li H, Li J, Norman PJ, Oshima KK, Paten B, Phillippy AM, Pollock NR, Rausch T, Rautiainen M, Scholz S, Song Y, Söylev A, Sulovari A, Surapaneni L, Tsapalou V, Zhou W, Zhou Y, Zhu Q, Zody MC, Mills RE, Devine SE, Shi X, Talkowski ME, Chaisson MJP, Dilthey AT, Konkel MK, Korbel JO, Lee C, Beck CR, Eichler EE, Marschall T. bioRxiv [Preprint]. 2024 Sep 25:2024.09.24.614721. doi: 10.1101/2024.09.24.614721. PMID: 39372794. Preprint.
  • CNVoyant a machine learning framework for accurate and explainable copy number variant classification.
    Schuetz RJ, Ceyhan D, Antoniou AA, Chaudhari BP, White P. Sci Rep. 2024 Sep 28;14(1):22411. doi: 10.1038/s41598-024-72470-4. PMID: 39333267.
  • Zebrafish models of human-duplicated SRGAP2 reveal novel functions in microglia and visual system development.
    Uribe-Salazar JM, Kaya G, Weyenberg K, Radke B, Hino K, Soto DC, Shiu JL, Zhang W, Ingamells C, Haghani NK, Xu E, Rosas J, Simó S, Miesfeld J, Glaser T, Baraban SC, Jao LE, Dennis MY. bioRxiv [Preprint]. 2024 Sep 27:2024.09.11.612570. doi: 10.1101/2024.09.11.612570. PMID: 39314374. Preprint.

References

    1. Bailey JA, et al. Science. 2002;297:1003.
    2. Conrad DF, et al. Nature. 2010;464:704.
    3. Iafrate AJ, et al. Nat. Genet. 2004;36:949.
    4. Kidd JM, et al. Nature. 2008;453:56.
    5. Cooper GM, Zerr T, Kidd JM, Eichler EE, Nickerson DA. Nat. Genet. 2008;40:1199.