PLoS Genet. 2007 Jul;3(7):e114.
doi: 10.1371/journal.pgen.0030114. Epub 2007 May 30.

Imputation-based analysis of association studies: candidate regions and quantitative traits

Bertrand Servin et al. PLoS Genet. 2007 Jul.

Abstract

We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge of patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with a quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole-genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website (http://stephenslab.uchicago.edu/software.html).
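The impute-then-test idea in the abstract can be illustrated with a minimal sketch. This is not the Bim-Bam algorithm and does not compute the Bayes factors the paper uses: it assumes some imputation step has already produced posterior genotype probabilities at an untyped SNP, and it substitutes a simple least-squares test on the expected allele count. The function names, the toy imputation output, and the simulated effect size are all hypothetical.

```python
import numpy as np

def expected_dosage(genotype_probs):
    """Posterior mean allele count from an (n, 3) array of P(g=0), P(g=1), P(g=2)."""
    probs = np.asarray(genotype_probs, dtype=float)
    return probs @ np.array([0.0, 1.0, 2.0])

def dosage_association(phenotype, dosage):
    """Least-squares fit of phenotype = mu + a * dosage; returns (a_hat, t_stat)."""
    y = np.asarray(phenotype, dtype=float)
    x = np.asarray(dosage, dtype=float)
    n = y.size
    xc = x - x.mean()                      # centered dosage
    yc = y - y.mean()                      # centered phenotype
    sxx = xc @ xc
    a_hat = (xc @ yc) / sxx                # slope estimate
    resid = yc - a_hat * xc
    sigma2 = (resid @ resid) / (n - 2)     # residual variance estimate
    t_stat = a_hat / np.sqrt(sigma2 / sxx)
    return a_hat, t_stat

# Toy data (all values are assumptions made for illustration).
rng = np.random.default_rng(0)
n = 500
true_g = rng.binomial(2, 0.3, size=n)      # unobserved genotypes at the untyped SNP
# Hypothetical imputation output: 0.8 on the true genotype, 0.1 on each alternative.
probs = np.full((n, 3), 0.1)
probs[np.arange(n), true_g] = 0.8
y = 0.5 * true_g + rng.normal(size=n)      # quantitative phenotype with an additive effect

a_hat, t_stat = dosage_association(y, expected_dosage(probs))
```

Using the expected allele count keeps the uncertainty of the imputation in the predictor instead of committing to a single best-guess genotype. A Bayesian analysis of the kind the abstract describes would instead average over the imputed genotypes when computing the evidence for association.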

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Power Comparisons
(A) Single common variant, modest dominance; (B) single common variant, strong dominance for minor allele; (C) single rare variant, no dominance; (D) multiple common variants. Each colored line shows power of test varying with significance threshold (type I error). Black: BF from our method (prior D2); Green: p_min (allelic test); Red: p_min (genotype test); Blue: p_reg, multiple regression; Grey: BF_max. Each column of figures shows results for data analyzed under the “resequencing design” (left) and the “tag SNP design” (right). Each row shows results for one of the four simulation scenarios.
Figure 2. Comparison of Results for Resequencing Design (x-axis) and Tag SNP Design (y-axis)
Panels show: (a) errors in the estimates (posterior means) of the heterozygote effect (a + d); (b) errors in the estimates (posterior means) of the main effect (a); and (c) posterior probability of being a QTN (P((a, d) ≠ (0, 0))) assigned to the causal variant.
Figure 3. Examination of Potential Effect of Different Tag SNP Strategies on Power, When the Causal Variant is Rare (0.01 < MAF < 0.05)
Solid line: Resequencing design; dashed line: tag SNP design, with tags selected using method from [19]; and dotted line: tag SNP design, with all SNPs except the causal SNP as tags.
Figure 4. Power of the Multipoint Approach in the Rare Variant Scenario for Two Different Imputation Algorithms
Figure 5. Scatter Plot of Samples from Prior Distribution of a (x-axis) and a + d (y-axis), for Priors D1 (Black) and D2 (Blue)
The solid yellow line corresponds to d = 0 (additivity). The dashed red lines are the limits above and below which a SNP exhibits over-dominance.
Figure 6. Comparison of Inferences using Priors D1 and D2 for the BF (Left) and the Posterior Probability Assigned to the Causal Locus Being a QTN (Right)
Results shown are for all datasets for the common-variant Scenarios (A) and (B) and for both the resequencing design and the tag SNP design. The discrepancy between the larger estimated BFs arises because we used too few MCMC iterations to accurately estimate very large BFs (>10^6) under prior D1.
Figure 7. Illustration of How a Multi-QTN Model Can Provide Fuller Explanations Than a One-QTN Model for Observed Associations
The figure shows, for each SNP in a dataset simulated under Scenario (D), the estimated posterior probability that it is a QTN, conditional on an association being observed. Left: Results from one-QTN model. Right: Results from multi-QTN model allowing up to four QTNs. The four actual QTNs are indicated with a star. Colors of the vertical lines indicate tag SNP “bins” (i.e., groups of SNPs tagged by the same variant).
Figure 8. Results for the SCN1A Dataset
Left panel shows the posterior probability assigned to each SNP being a QTN, with filled triangles denoting tag SNPs and open circles denoting non-tag SNPs. The right panel shows (in gray) estimated posterior densities of the additive effect for each of the seven SNPs assigned the highest posterior probabilities of non-zero effect (representing 90% of the posterior mass). The average of these curves is shown in black.
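
The captions for Figures 2 and 5 refer to a main effect a, a heterozygote effect a + d, additivity at d = 0, and limits beyond which a SNP is over-dominant. The sketch below shows one reading of that parametrization. It assumes, as an inference from the captions and not a statement quoted from the paper, that the mean phenotypes for 0, 1, and 2 copies of an allele are mu, mu + a + d, and mu + 2a, and that over-dominance means the heterozygote mean lies outside the range of the two homozygote means. The function names are hypothetical.

```python
def genotype_means(mu, a, d):
    """Mean phenotype for 0, 1, 2 allele copies under the assumed (a, d) coding."""
    return (mu, mu + a + d, mu + 2 * a)

def dominance_class(a, d, tol=1e-12):
    """Classify an effect pair (a, d) relative to the homozygote range [0, 2a]."""
    het = a + d                            # heterozygote effect relative to baseline
    lo, hi = min(0.0, 2 * a), max(0.0, 2 * a)
    if abs(d) <= tol:
        return "additive"                  # heterozygote exactly midway between homozygotes
    if het > hi or het < lo:
        return "over-dominant"             # heterozygote outside the homozygote range
    return "partial or complete dominance"

# Example: a = 1, d = 1.5 places the heterozygote above both homozygotes.
example = dominance_class(1.0, 1.5)
```

Under this reading, the additivity line in Figure 5 corresponds to points where the heterozygote effect equals a, and the over-dominance limits correspond to heterozygote effects of 0 and 2a.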

References

    1. The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–1320.
    2. SeattleSNPs. Seattle (Washington): NHLBI Program for Genomic Applications; Available: http://pga.gs.washington.edu. Accessed 12 June 2007.
    3. Kraft P, Pharoah P, Chanock SJ, Albanes D, Kolonel LN, et al. Genetic variation in the HSD17B1 gene and risk of prostate cancer. PLoS Genet. 2005;1:e68. doi: 10.1371/journal.pgen.0010068.
    4. Li N, Stephens M. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics. 2003;165:2213–2233.
    5. Scheet P, Stephens M. A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006;78:629–644.