PLoS Genet. 2012 Jan;8(1):e1002482.
doi: 10.1371/journal.pgen.1002482. Epub 2012 Jan 26.

A flexible Bayesian model for studying gene-environment interaction

Kai Yu et al. PLoS Genet. 2012 Jan.

Abstract

An important follow-up step after genetic markers are found to be associated with a disease outcome is a more detailed analysis investigating how the implicated gene or chromosomal region and an established environment risk factor interact to influence the disease risk. The standard approach to this study of gene-environment interaction considers one genetic marker at a time and therefore could misrepresent and underestimate the genetic contribution to the joint effect when one or more functional loci, some of which might not be genotyped, exist in the region and interact with the environment risk factor in a complex way. We develop a more global approach based on a Bayesian model that uses a latent genetic profile variable to capture all of the genetic variation in the entire targeted region and allows the environment effect to vary across different genetic profile categories. We also propose a resampling-based test derived from the developed Bayesian model for the detection of gene-environment interaction. Using data collected in the Environment and Genetics in Lung Cancer Etiology (EAGLE) study, we apply the Bayesian model to evaluate the joint effect of smoking intensity and genetic variants in the 15q25.1 region, which contains a cluster of nicotinic acetylcholine receptor genes and has been shown to be associated with both lung cancer and smoking behavior. We find evidence for gene-environment interaction (P-value = 0.016), with the smoking effect appearing to be stronger in subjects with a genetic profile associated with a higher lung cancer risk; the conventional test of gene-environment interaction based on the single-marker approach is far from significant.
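
As a rough illustration of the idea described in the abstract, and not the authors' exact specification, a risk model in which the environment effect varies across latent genetic profile categories could be written as below. All symbols are assumptions introduced here: $D_i$ is the disease status of subject $i$, $E_i$ the environmental exposure, $Z_i$ the latent genetic profile category, $K$ the number of categories, and $\alpha_k$, $\beta_k$ the category-specific intercept and log odds ratio.

```latex
\[
\operatorname{logit} \Pr(D_i = 1 \mid Z_i = k,\, E_i) = \alpha_k + \beta_k E_i,
\qquad k = 1, \dots, K .
\]
```

Under this sketch, the absence of gene-environment interaction on the log-odds scale would correspond to $\beta_1 = \dots = \beta_K$.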

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. The partition of the genotype space in the simulation study.
We conducted a principal component analysis on all subjects from the EAGLE study, using genotypes at the 15 chosen tagging SNPs as coordinates, and plotted subjects by their first and second principal components. Subjects with the same multilocus genotype are represented by a single point in the plot. The green, blue, and red points are the subjects (genotypes) belonging to region I (genotypes having no more than 1 risk allele among the three considered functional SNPs), region II (genotypes having 2 risk alleles), and region III (genotypes having more than 2 risk alleles), respectively.
Figure 2. Boxplots of the posterior medians of the intercept (formula image) for subjects within each true cluster from each of 50 datasets simulated under the model formula image.
(a). Boxplots of posterior medians of formula image for subjects in cluster 1, with the true value given by the horizontal line in green; (b). Boxplots of posterior medians of formula image for subjects in cluster 2, with the true value given by the horizontal line in blue. The posterior median of formula image for each subject under a given simulated dataset was shifted by a constant value selected so that the median value of the shifted estimates for subjects in cluster 1 was zero.
Figure 3. Boxplots of the posterior medians of the log odds ratio (formula image) for subjects within each true cluster from each of 50 datasets simulated under the model formula image.
(a). Boxplots of posterior medians of formula image for subjects in cluster 1, with the true value given by the horizontal line in green; (b). Boxplots of posterior medians of formula image for subjects in cluster 2, with the true value given by the horizontal line in blue.
Figure 4. Boxplots of the posterior medians of the intercept (formula image) for subjects within each true cluster from each of 50 datasets simulated under the model formula image.
(a). Boxplots of posterior medians of formula image for subjects in cluster 1, with the true value given by the horizontal line in green; (b). Boxplots of posterior medians of formula image for subjects in cluster 2, with the true value given by the horizontal line in blue; (c). Boxplots of posterior medians of formula image for subjects in cluster 3, with the true value given by the horizontal line in red. The posterior median of formula image for each subject under a given simulated dataset was shifted by a constant value selected so that the median value of the shifted estimates for subjects in cluster 1 was zero.
Figure 5. Boxplots of the posterior medians of the log odds ratio (formula image) for subjects within each true cluster from each of 50 datasets simulated under the model formula image.
(a). Boxplots of posterior medians of formula image for subjects in cluster 1, with the true value given by the horizontal line in green; (b). Boxplots of posterior medians of formula image for subjects in cluster 2, with the true value given by the horizontal line in blue; (c). Boxplots of posterior medians of formula image for subjects in cluster 3, with the true value given by the horizontal line in red.
Figure 6. Cluster assignment for the EAGLE study.
The cluster assignment estimated under the model with the number of clusters K = 2. Every subject was represented by his or her first 2 principal components. Subjects with the same multilocus genotype were represented by one point in the plot.
Figure 7. DIC plots for the Bayesian risk model allowing for gene–environment interaction.
For any given number of clusters, 20 DIC values were obtained by applying the proposed method to the data from the EAGLE study 20 times with different random seeds.
Figure 8. Smoothed surface plots of the posterior medians of the odds ratios for the genetic and smoking effects on the space of the first two principal components.
(a). Posterior median of the OR for the genetic effect under the model with the number of clusters K = 2; (b). Posterior mean of the OR for the smoking effect under the model with the number of clusters K = 2.
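
The three-region partition of the genotype space described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the 0/1/2 risk-allele coding, the function name `assign_region`, and the positions in `FUNCTIONAL_SNPS` are all hypothetical choices made here, since the caption does not say which of the 15 tagging SNPs are the three functional ones.

```python
from typing import Sequence

# Hypothetical positions of the three functional SNPs among the 15 tagging SNPs.
FUNCTIONAL_SNPS = (2, 7, 11)


def assign_region(
    genotype: Sequence[int],
    functional_idx: Sequence[int] = FUNCTIONAL_SNPS,
) -> str:
    """Return "I", "II", or "III" from the risk-allele count at the functional SNPs.

    Each genotype entry is assumed to be the number of risk alleles
    carried at that SNP (0, 1, or 2).
    """
    for i in functional_idx:
        if genotype[i] not in (0, 1, 2):
            raise ValueError(f"genotype at index {i} must be 0, 1, or 2")

    # Total number of risk alleles across the functional SNPs.
    n_risk = sum(genotype[i] for i in functional_idx)

    if n_risk <= 1:
        return "I"    # no more than 1 risk allele
    if n_risk == 2:
        return "II"   # exactly 2 risk alleles
    return "III"      # more than 2 risk alleles


if __name__ == "__main__":
    example = [0] * 15
    example[2], example[7] = 2, 1  # 3 risk alleles at the functional SNPs
    print(assign_region(example))  # III
```

Risk alleles at the remaining 12 tagging SNPs do not affect the assignment, which matches the caption's definition of the regions in terms of the three functional SNPs only.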
