Comparative Study
Genome Res. 2006 Dec;16(12):1575-84.
doi: 10.1101/gr.5629106. Epub 2006 Nov 22.

Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays

Daisuke Komura et al. Genome Res. 2006 Dec.

Abstract

Recent reports indicate that copy number variations (CNVs) within the human genome contribute to nucleotide diversity to a larger extent than single nucleotide polymorphisms (SNPs). In addition, the contribution of CNVs to human disease susceptibility may be greater than previously expected, although the phenotypic consequences of CNVs are not yet fully understood. We have recently reported a comprehensive view of CNVs among 270 HapMap samples using high-density SNP genotyping arrays and BAC array CGH. In this report, we describe a novel algorithm using Affymetrix GeneChip Human Mapping 500K Early Access (500K EA) arrays that identified 1203 CNVs ranging in size from 960 bp to 3.4 Mb. The algorithm consists of three steps: (1) intensity pre-processing to improve the resolution between pairwise comparisons by directly estimating the allele-specific affinity as well as to reduce signal noise by incorporating probe and target sequence characteristics via an improved version of the Genomic Imbalance Map (GIM) algorithm; (2) CNV extraction using an adapted SW-ARRAY procedure to automatically and robustly detect candidate CNV regions; and (3) copy number inference in which all pairwise comparisons are summarized to more precisely define CNV boundaries and accurately estimate CNV copy number. Independent testing of a subset of CNVs by quantitative PCR and mass spectrometry demonstrated a >90% verification rate. The use of high-resolution oligonucleotide arrays relative to other methods may allow more precise boundary information to be extracted, thereby enabling a more accurate analysis of the relationship between CNVs and other genomic features.
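
Step (2) can be illustrated with a simplified stand-in. The abstract does not give the details of the adapted SW-ARRAY procedure, so the sketch below only shows the general Smith-Waterman-style idea of finding the highest-scoring contiguous run of probes after subtracting a threshold from each log2 ratio. The function name `best_segment`, the threshold value, and the example ratios are all hypothetical, and the significance testing that a real pipeline would need is omitted.

```python
from typing import List, Optional, Tuple


def best_segment(
    log_ratios: List[float], threshold: float
) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, score) of the highest-scoring contiguous run.

    Each probe scores (log ratio - threshold); indices are inclusive.
    Returns None when no run has a positive score.
    This is a simplified illustration, not the authors' SW-ARRAY adaptation.
    """
    best: Optional[Tuple[int, int, float]] = None
    running = 0.0
    start = 0
    for i, value in enumerate(log_ratios):
        if running <= 0.0:
            # A non-positive prefix can never help; restart the run here.
            running = 0.0
            start = i
        running += value - threshold
        if running > 0.0 and (best is None or running > best[2]):
            best = (start, i, running)
    return best


# Hypothetical log2 ratios along one chromosome for one sample pair.
ratios = [0.0, 0.1, 0.6, 0.7, 0.5, 0.0, -0.1]
print(best_segment(ratios, threshold=0.2))  # run covering indices 2..4
```

This version detects gains only; negating the ratios before the call would find candidate losses in the same way.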

Figures

Figure 1.
Flowchart overview of the algorithm. Red, blue, and yellow boxes indicate that the process was carried out for each array, each sample pair, and each CNV region, respectively. GIM is used for intensity pre-processing, SW-ARRAY is used for pairwise CNV detection, and the maximum clique algorithm is used for CNV extraction.
Figure 2.
Overview of copy number inference. (A) Pairwise comparisons of five different DNA samples (a–e) in a given candidate CNV region. The x-axis represents the SNP positions, and the blue lines are log2 signal intensity ratios for any given pair. The red line indicates the significant CNVs detected by SW-ARRAY. (B) Summary of the comparisons of any given sample to the remaining four samples. Based on the physical location of copy number changes, the frequencies are calculated for each sample, and consecutive CNV regions are extracted. Each row represents a single sample, and each column represents the frequency of a given SNP. The frequency of a particular SNP is the number of times that it is called a CNV in all four pairwise comparisons. (C) Graph theory (the maximum clique algorithm) is applied to the frequency summarization results presented in B. In this example, samples c, d, and e, which have the lowest frequency and represent the maximum clique, are defined as the diploid group. (D) Density (the proportion of comparisons where a CNV is called) is calculated based on the diploid samples found by the maximum clique algorithm, and the boundary of the CNV region in each nondiploid sample is determined. (E) Copy number is determined based on the median ratio of each CNV region.
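
Panels B through D can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: pairwise calls are represented as a hypothetical dictionary from a sample pair to the set of SNP indices called as CNV, two samples are treated as connected when their comparison has no call at all, and the maximum clique is found by brute force, which is only practical for a handful of samples. All names and the example data are made up to mirror the five-sample example in the caption.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation: sample pair -> SNP indices called as CNV.
Calls = Dict[FrozenSet[str], Set[int]]


def snp_frequencies(samples: List[str], n_snps: int, calls: Calls) -> Dict[str, List[int]]:
    """Panel B: per sample, how many pairwise comparisons call each SNP."""
    freq = {s: [0] * n_snps for s in samples}
    for pair, snps in calls.items():
        for s in pair:
            for i in snps:
                freq[s][i] += 1
    return freq


def maximum_clique(samples: List[str], calls: Calls) -> List[str]:
    """Panel C: largest group with no CNV call between any two members.

    Brute force over subsets, largest first (assumed edge rule: no call).
    """
    for size in range(len(samples), 0, -1):
        for group in combinations(samples, size):
            if all(not calls.get(frozenset(p)) for p in combinations(group, 2)):
                return list(group)
    return []


def density(sample: str, diploid: List[str], n_snps: int, calls: Calls) -> List[float]:
    """Panel D: fraction of comparisons against the diploid group calling each SNP."""
    counts = [0] * n_snps
    for ref in diploid:
        for i in calls.get(frozenset((sample, ref)), set()):
            counts[i] += 1
    return [c / len(diploid) for c in counts]


# Made-up calls: a and b differ from c, d, e; c, d, e agree with each other.
samples = ["a", "b", "c", "d", "e"]
calls: Calls = {frozenset(("a", "b")): {1, 4}}
for ref in "cde":
    calls[frozenset(("a", ref))] = {2, 3}
    calls[frozenset(("b", ref))] = {1, 2, 3, 4}

diploid = maximum_clique(samples, calls)
print(diploid)                           # ['c', 'd', 'e']
print(density("a", diploid, 6, calls))   # [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
```

The copy number step in panel E would then take the median intensity ratio of each nondiploid sample against the diploid group over the called region; how that median is mapped to an integer copy number is not specified in the caption and is left out here.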
Figure 3.
Parameter tuning. Several parameters were optimized for SW-ARRAY and CNV extraction, including (A) intensity ratio threshold, (B) statistical significance, (C) number of SNPs and restriction fragments required for calling a CNV, and (D) density cutoff (the fraction of positive pairwise comparisons necessary for calling a CNV). For each parameter, CNVs were called from pairwise comparisons between NA15510 and NA10851 (A,B,C) or from population-wide comparisons (D) for each sample. In A and B, the percentage of CNVs called more than half of the time (red line) is compared to the percentage that have been positively validated (blue line) or negatively validated, i.e., false positives (green line in B, with the y-axis on the right-hand side). The final values chosen were 1.12 for the intensity ratio threshold and 0.01 for the P-value (indicated by the vertical black lines). In C, the size distribution of CNVs detected using the 3 SNPs:2 fragments criterion (with a mean length of 300 kb) is compared to that of the 4 SNPs:3 fragments criterion (mean length 326 kb). In D, the number above each bar is the percentage of validated CNVs and validated diploid regions within certain density bins. A 10% density cutoff was chosen for further analyses. For any given cutoff, the false-positive rate is defined as the percentage of all validated diploid regions that are incorrectly called as a CNV with a density greater than the cutoff; the false-negative rate is defined as the percentage of validated CNVs that are missed because their density is lower than the cutoff.
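
The false-positive and false-negative definitions used for the density cutoff in panel D translate directly into code. The sketch below follows the wording of the caption; the function name and the density values are invented for illustration and are not data from the study.

```python
from typing import List, Tuple


def error_rates(
    cnv_densities: List[float],
    diploid_densities: List[float],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (false-positive %, false-negative %) for one density cutoff.

    False positive: a validated diploid region whose density exceeds the cutoff.
    False negative: a validated CNV whose density falls below the cutoff.
    """
    fp = sum(d > cutoff for d in diploid_densities)
    fn = sum(d < cutoff for d in cnv_densities)
    return (
        100.0 * fp / len(diploid_densities),
        100.0 * fn / len(cnv_densities),
    )


# Invented densities for validated CNVs and validated diploid regions.
validated_cnvs = [0.05, 0.40, 0.80, 0.95]
validated_diploid = [0.00, 0.02, 0.05, 0.15]
print(error_rates(validated_cnvs, validated_diploid, cutoff=0.10))  # (25.0, 25.0)
```

Sweeping `cutoff` over a grid and plotting both rates is one way to reproduce the kind of trade-off that panel D summarizes.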
Figure 4.
CNV boundary determination. In A,B,C, the x-axis is the sequential order of the SNPs both within and outside the CNV region; the y-axis represents individual samples. In D, the x-axis represents the intensity ratio, and the y-axis is the sample frequency. (A) Diploid density distribution for HapMap CNVID 1166 at chr22:23932716–24371067. In the left-most column, CNV samples detected by the algorithm are shown in red, the diploid samples selected by the maximum clique algorithm are shown in black, and samples that display the intensity trend but do not meet the CNV extraction criteria are shown in white. (B) Median ratio distribution in the same region as shown in A. The median ratios were calculated based on the diploid samples with the same genotype. The similarity of this pattern to that in A indicates that the CNV regions were successfully detected by the algorithm. (C) The diploid density (blue solid line) and median ratio (green dotted line), smoothed with a 10-probe window, for sample 1 (top graph) and sample 2 (bottom graph), indicated by the purple arrows in A and B. The 10% and 90% boundaries are shown as dashed red lines. (D) The intensity ratio histogram of all 270 samples in the same CNV region depicted in A, B, and C shows clear clusters that correspond to one, two, three, and four copies of the region. The histogram is compressed in the middle range of the y-axis, as represented by the wavy double line.
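
The smoothing in panel C can be sketched as a moving average over probes. The caption does not say whether the 10-probe window is centered or how edges are handled, so the version below assumes a roughly centered window truncated at the ends. The helper `span_above` and the levels 0.1 and 0.9 simply mirror the 10% and 90% lines in the caption; how the final boundaries are derived from them is not stated here, and all names are hypothetical.

```python
from typing import List, Optional, Tuple


def moving_average(values: List[float], window: int = 10) -> List[float]:
    """Smooth a per-probe track with a window truncated at both ends."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i - half + window)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def span_above(track: List[float], level: float) -> Optional[Tuple[int, int]]:
    """First and last probe index whose value is at least `level`, or None."""
    hits = [i for i, v in enumerate(track) if v >= level]
    return (hits[0], hits[-1]) if hits else None


# Toy density track: ten diploid probes, ten CNV probes, ten diploid probes.
density_track = [0.0] * 10 + [1.0] * 10 + [0.0] * 10
smooth = moving_average(density_track, window=2)
print(span_above(smooth, 0.1))  # (10, 20): outer extent
print(span_above(smooth, 0.9))  # (11, 19): inner extent
```

The gap between the outer and inner extents gives a rough sense of boundary uncertainty for a sample; wider windows smooth more noise but blur the edges further.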
Figure 5.
Mendelian and non-Mendelian CNV inheritance. (A) A single-copy gain transmitted from a YRI mother (NA18870) to the child (NA18872) and absent in the father (NA18871). These results were also confirmed by qPCR. (B) A single-copy loss identified in a CEU child (NA10831) that is not present in either parent (NA12156 or NA12155). Each plot shows the smoothed (50-kb window) signal intensity ratio on the y-axis and the physical position of the probes on the x-axis.
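
The two cases in this figure can be expressed as a simple trio check on inferred copy numbers. This is a deliberately coarse sketch: it only asks whether a parent carries a change in the same direction as the child, assumes a baseline of two copies, and ignores allele-level detail and calling errors. The function name and return strings are hypothetical, and the copy numbers in the example are chosen to match the caption's description (single-copy gain, single-copy loss).

```python
def classify_trio(child: int, mother: int, father: int, baseline: int = 2) -> str:
    """Classify a child's CNV by whether a parent shows the same kind of change."""
    if child == baseline:
        return "no CNV in child"
    is_gain = child > baseline

    def same_direction(copy_number: int) -> bool:
        return copy_number > baseline if is_gain else copy_number < baseline

    carriers = [
        name
        for name, cn in (("mother", mother), ("father", father))
        if same_direction(cn)
    ]
    if carriers:
        return "consistent with transmission from " + " or ".join(carriers)
    return "not observed in either parent"


# Panel A pattern: gain in mother and child, father diploid.
print(classify_trio(child=3, mother=3, father=2))
# Panel B pattern: loss in child, both parents diploid.
print(classify_trio(child=1, mother=2, father=2))
```

A call of "not observed in either parent" is only a candidate non-Mendelian event; it could also reflect a missed call in a parent, which is why independent confirmation such as the qPCR mentioned for panel A matters.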
