PLoS One. 2013 Jul 3;8(7):e68822.
doi: 10.1371/journal.pone.0068822. Print 2013.

GStream: improving SNP and CNV coverage on genome-wide association studies

Arnald Alonso et al. PLoS One. 2013.

Abstract

We present GStream, a method that combines genome-wide SNP and CNV genotyping on the Illumina microarray platform with unprecedented accuracy. This new method outperforms previous well-established SNP genotyping software. More importantly, the CNV calling algorithm of GStream dramatically improves the results obtained by previous state-of-the-art methods and yields an accuracy that is close to that obtained by purely CNV-oriented technologies like Comparative Genomic Hybridization (CGH). We demonstrate the superior performance of GStream using microarray data generated from HapMap samples. Using the reference CNV calls generated by the 1000 Genomes Project (1KGP) and well-known studies on whole genome CNV characterization based either on CGH or genotyping microarray technologies, we show that GStream can increase the number of reliably detected variants by up to 25% compared to previously developed methods. Furthermore, the increased genome coverage provided by GStream allows the discovery of CNVs in close linkage disequilibrium with SNPs previously associated with disease risk in published Genome-Wide Association Studies (GWAS). These results could provide important insights into the biological mechanism underlying the detected disease risk association. With GStream, large-scale GWAS will not only benefit from the combined genotyping of SNPs and CNVs at an unprecedented accuracy, but will also take advantage of the computational efficiency of the method.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. GStream method for SNP genotyping.
This figure shows how the GStream genotyping method works on two example markers: the first is a typical marker capturing a SNP (A and B), and the second captures both a SNP and a CNV (C and D). The leftmost graphs show the effects of the normalization procedure for the two markers; the dotted blue lines enclose the ranges where candidate homozygotes and heterozygotes are identified in order to compute the scaling factors for each channel (black points on the axes). The rightmost graphs give an overview of the genotyping procedure. The upper subfigures show the probability density function of the scaled B allele frequency (BAF), with solid vertical lines marking the identified genotype centres, dotted vertical lines marking the genotype limits, and horizontal lines representing the sequential search for genotype cluster peaks. The middle and lower subfigures show the genotype calls and the call quality scores, respectively.
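The normalization and calling steps in this caption can be illustrated with a minimal sketch. It is not GStream's implementation: the BAF is approximated here as B / (A + B), the candidate-homozygote ranges (`lo`, `hi`) and the fixed genotype limits are hypothetical values, and the function names are invented for illustration. GStream itself locates genotype centres and limits from peaks in the scaled BAF density rather than using fixed thresholds.

```python
from statistics import median

def scale_channels(a, b, lo=0.2, hi=0.8):
    """Rescale each channel so that candidate homozygotes have median intensity 1.

    a, b: positive raw intensities of the A and B channels, one value per sample.
    lo, hi: hypothetical raw-BAF ranges used to pick candidate homozygotes.
    """
    raw_baf = [y / (x + y) for x, y in zip(a, b)]
    cand_aa = [x for x, f in zip(a, raw_baf) if f < lo]  # A channel is informative
    cand_bb = [y for y, f in zip(b, raw_baf) if f > hi]  # B channel is informative
    scale_a = median(cand_aa) if cand_aa else 1.0
    scale_b = median(cand_bb) if cand_bb else 1.0
    return [x / scale_a for x in a], [y / scale_b for y in b]

def call_genotypes(a, b, limits=(0.25, 0.75)):
    """Assign AA / AB / BB from the scaled BAF using fixed, assumed limits."""
    a_scaled, b_scaled = scale_channels(a, b)
    calls = []
    for x, y in zip(a_scaled, b_scaled):
        baf = y / (x + y)
        if baf < limits[0]:
            calls.append("AA")
        elif baf > limits[1]:
            calls.append("BB")
        else:
            calls.append("AB")
    return calls

# Toy intensities: two AA samples, one AB sample, two BB samples.
a_channel = [1000, 980, 520, 40, 60]
b_channel = [50, 30, 260, 500, 480]
print(call_genotypes(a_channel, b_channel))
```

In the toy data the B channel is about half as bright as the A channel, so the heterozygote has a raw BAF of about 0.33; after per-channel scaling it moves close to 0.5, which is the effect the normalization is meant to achieve.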
Figure 2. GStream method for CNV genotyping.
(A) Each CNV analysis is divided into four independent sets in which the number of allele copies per channel intensity is estimated. Here, the homozygote intensities over their respective informative channels (upper rightmost and leftmost graphs) are fitted with a two-component model (in this case, capturing a deletion), while heterozygote intensities over each channel are better fitted with a one-component model (upper centre graphs). The lower graphs show the intensity distributions (solid black lines) together with the copy number score (red points) assigned to each sample. AA homozygotes are mostly classified as deletions (scores near 1), BB homozygotes are divided into diploids (scores ∼2) and deletions (scores ∼1), and heterozygotes are classified as diploids (i.e. one allele detected in each channel). (B) Final representation of the analyzed probe, where points represent samples and colour represents their relative copy number scores. SNP and CNV genotypes are assigned along the BAF and intensity axes, respectively.
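The choice between a one-component and a two-component model can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the components are taken to be Gaussian, the model is selected by BIC, the two components are assumed to correspond to one and two copies (the deletion case described in the caption), and all function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def fit_one(xs):
    """Maximum-likelihood fit of one Gaussian: (log-likelihood, mean, sd)."""
    n = len(xs)
    mu = sum(xs) / n
    sd = max(math.sqrt(sum((x - mu) ** 2 for x in xs) / n), 1e-3)
    ll = sum(math.log(max(normal_pdf(x, mu, sd), 1e-300)) for x in xs)
    return ll, mu, sd

def fit_two(xs, iters=100):
    """EM fit of a two-component Gaussian mixture: (log-likelihood, means, responsibilities)."""
    n = len(xs)
    s = sorted(xs)
    mu = [s[n // 4], s[(3 * n) // 4]]         # start from the quartiles
    sd = [max(fit_one(xs)[2] / 2, 1e-3)] * 2
    w = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: probability that each sample belongs to each component
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = max(sum(p), 1e-300)
            resp.append([pk / total for pk in p])
        # M-step: update weights, means and standard deviations
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-300)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
    ll = sum(
        math.log(max(sum(w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)), 1e-300))
        for x in xs
    )
    return ll, mu, resp

def copy_number_scores(xs):
    """Pick the model with the lower BIC and return (n_components, scores).

    Assumption: one component means every sample is diploid (score 2);
    two components mean a deletion (score near 1) plus diploids (score near 2).
    """
    n = len(xs)
    ll1, _, _ = fit_one(xs)
    ll2, mu, resp = fit_two(xs)
    bic1 = 2 * math.log(n) - 2 * ll1   # parameters: mean, sd
    bic2 = 5 * math.log(n) - 2 * ll2   # parameters: 2 means, 2 sds, 1 weight
    if bic1 <= bic2:
        return 1, [2.0] * n
    high = 0 if mu[0] > mu[1] else 1   # brighter component = two copies
    return 2, [1.0 + r[high] for r in resp]

# Toy normalized intensities: five samples near 0.5 and five near 1.0.
intensities = [0.48, 0.50, 0.52, 0.49, 0.51, 0.98, 1.00, 1.02, 0.99, 1.01]
n_components, scores = copy_number_scores(intensities)
print(n_components, [round(s, 2) for s in scores])
```

The score is continuous because it is built from the posterior probability of the higher-intensity component, which mirrors the caption's description of scores near 1 and near 2 rather than hard integer calls.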
Figure 3. Evaluating SNP genotyping performance.
Plots comparing the SNP genotyping algorithms tested on each microarray platform. The vertical axis represents the percentage of SNPs that are excluded from the accuracy calculation by the lowest quality score criteria. GStream performed better at all drop rate levels on all platforms. A sharp decrease in performance is observed for GenCall when the drop rate is lower than its uncall rate (i.e. ∼2% on Human610Quad).
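The drop-rate evaluation described here can be sketched as follows, assuming that accuracy means concordance with reference genotypes and that the lowest-scoring items are excluded first. The function name and toy data are hypothetical and are not taken from the paper.

```python
def accuracy_at_drop_rate(calls, truth, scores, drop_rate):
    """Concordance with reference genotypes after excluding the lowest-scoring fraction.

    calls, truth: genotype labels from the tested method and from the reference.
    scores: quality score per call (higher means more confident).
    drop_rate: fraction in [0, 1) of calls to exclude, lowest scores first.
    """
    n = len(calls)
    order = sorted(range(n), key=lambda i: scores[i])  # ascending quality
    kept = order[int(drop_rate * n):]
    if not kept:
        raise ValueError("drop_rate excludes every call")
    return sum(calls[i] == truth[i] for i in kept) / len(kept)

# Toy example: the only discordant call also has the lowest quality score.
calls = ["AA", "AB", "BB", "AA", "AB"]
truth = ["AA", "AB", "BB", "AB", "AB"]
scores = [0.90, 0.80, 0.95, 0.10, 0.70]
for rate in (0.0, 0.2):
    print(rate, accuracy_at_drop_rate(calls, truth, scores, rate))
```

Sweeping `drop_rate` over a grid and plotting it against the resulting accuracy gives a curve of the kind the caption describes. A method that leaves some genotypes uncalled cannot be evaluated fairly below its own uncall rate, which is consistent with the GenCall behaviour noted above.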
Figure 4. 1KGP structural variants captured by GStream.
(A) Percentage of 1KGP structural variants captured by GStream within different ranges of r² between the 1KGP calls and the GStream calls, using the best marker within each structural variant locus. (B) Distribution of the r² values when more than one marker is found within the structural variant locus. Structural variants are stratified according to the best r² obtained among all the markers covering the locus. (C) r² distribution stratified by the frequency of the structural variant.
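The best-marker comparison can be sketched as below, assuming r2 is the squared Pearson correlation between reference copy numbers and the calls at each marker. The marker names, data, and function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no information
    return sxy * sxy / (sxx * syy)

def best_marker(reference, calls_by_marker):
    """Return (marker, r2) for the marker that best tags the reference variant."""
    name = max(calls_by_marker, key=lambda m: r_squared(reference, calls_by_marker[m]))
    return name, r_squared(reference, calls_by_marker[name])

# Toy data: reference copy numbers for six samples and two markers in the locus.
reference = [2, 2, 1, 2, 1, 0]
calls_by_marker = {
    "marker_1": [2, 2, 1, 2, 1, 0],  # matches the reference exactly
    "marker_2": [2, 2, 2, 2, 1, 2],  # misses most of the variation
}
print(best_marker(reference, calls_by_marker))
```

Taking the maximum over markers corresponds to panel (A); keeping the full list of per-marker values for loci with more than one marker corresponds to the distribution in panel (B).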
Figure 5. Evaluation of the power to capture genome-wide CNP association.
Plots comparing the Chi-square test P-values obtained with the gold standard calls (i.e. the McCarroll, Campbell and Conrad datasets) with those obtained with the four tested methods on the HumanOmni1-Quad (A) and Human1M-Duo (B) platforms. The comparison is made by observing the distribution of the P-value association ratios (i.e. tested method versus gold standard). A large performance difference was observed between the two platforms tested (due to their large difference in coverage density) and between GStream and the other algorithms tested.
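One way to read this comparison is sketched below. The caption does not define the ratio, so the sketch assumes it is the ratio of -log10 P-values (tested method over reference calls), and it restricts the test to a 2x2 table (two groups by two copy-number classes) so that the 1-degree-of-freedom P-value can be computed with the standard library. The tables and function names are hypothetical.

```python
import math

def chi2_2x2(table):
    """Pearson chi-square statistic and P-value (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("degenerate table: a row or column sums to zero")
    stat = n * (a * d - b * c) ** 2 / (margins[0] * margins[1] * margins[2] * margins[3])
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 df
    return stat, p_value

def log_p_ratio(method_table, reference_table):
    """Assumed ratio: -log10 P of the tested method over -log10 P of the reference calls."""
    _, p_method = chi2_2x2(method_table)
    _, p_reference = chi2_2x2(reference_table)
    if p_reference >= 1.0 or p_method <= 0.0 or p_reference <= 0.0:
        raise ValueError("ratio undefined for these P-values")
    return math.log10(p_method) / math.log10(p_reference)

# Toy counts: rows are two sample groups, columns are deletion carriers / non-carriers.
reference_table = [[30, 20], [10, 40]]  # counts from the reference calls
method_table = [[25, 25], [15, 35]]     # counts from a method that misses some carriers
print(chi2_2x2(reference_table))
print(log_p_ratio(method_table, reference_table))
```

Under this reading, a ratio near 1 means the tested method recovers the association signal seen with the reference calls, and values below 1 indicate lost power.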

Grants and funding

This work was supported by the Spanish Ministry of Economy and Competitiveness Strategic Project grants [PSE-010000-2006-6, IPT-010000-2010-36]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
