Bioinformatics. 2014 Mar 1;30(5):698-705.
doi: 10.1093/bioinformatics/btt572. Epub 2013 Oct 21.

ATHENA: the analysis tool for heritable and environmental network associations

Emily R Holzinger et al. Bioinformatics.

Abstract

Motivation: Advancements in high-throughput technology have allowed researchers to examine the genetic etiology of complex human traits in a robust fashion. Although genome-wide association studies have identified many novel variants associated with hundreds of traits, a large proportion of the estimated trait heritability remains unexplained. One hypothesis is that the commonly used statistical techniques and study designs are not robust to the complex etiology that may underlie these human traits. This etiology could include non-linear gene × gene or gene × environment interactions. Additionally, other levels of biological regulation may play a large role in trait variability.

Results: To address the need for computational tools that can explore enormous datasets to detect complex susceptibility models, we have developed a software package called the Analysis Tool for Heritable and Environmental Network Associations (ATHENA). ATHENA combines various variable filtering methods with machine learning techniques to analyze high-throughput categorical (e.g. single nucleotide polymorphisms) and quantitative (e.g. gene expression levels) predictor variables and generate multivariable models that predict either a categorical (e.g. disease status) or a quantitative (e.g. cholesterol levels) outcome. The goal of this article is to demonstrate the utility of ATHENA for identifying complex prediction models, using simulated and biological datasets that consist of both single nucleotide polymorphisms and gene expression variables. Importantly, this method is flexible and can be expanded to include other types of high-throughput data (e.g. RNA-seq data and biomarker measurements).

Availability: ATHENA is freely available for download. The software, user manual and tutorial can be downloaded from http://ritchielab.psu.edu/ritchielab/software.

Figures

Fig. 1.
ATHENA filtering and modeling components
Fig. 2.
Schematic of the three meta-dimensional genetic effect models. From left to right: main effect model, SNP × SNP + EV (S × S + E), SNP × EV (S × E). SNP.IND had an indirect effect on the phenotype via its correlation with the EV
Fig. 3.
Results from the SNP-only simulation analyses. Description of the 14 genetic effect models is shown in the first three columns. Detection power is defined as the number of times out of 100 datasets the indicated variable(s) is identified. Avg. Fitness is defined as balanced accuracy. Avg. Model Size is defined as the average number of variables in the best model
Fig. 4.
Results from the meta-dimensional simulation analyses. Description of the 14 genetic effect models is shown in the first three columns. Detection power is defined as the number of times out of 100 datasets the indicated variable(s) is identified. Avg. Fitness is defined as R². Avg. Model Size is defined as the average number of variables in the best model
Fig. 5.
Schematic of the filtering-modeling pipeline for the biological dataset analysis
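
The summary metrics named in the captions of Fig. 3 and Fig. 4 (detection power, balanced accuracy and R²) can be illustrated with a minimal Python sketch. This is not ATHENA code. The function names and the toy inputs are hypothetical, and the formulas are the conventional textbook definitions (balanced accuracy as the mean of sensitivity and specificity, R² as the coefficient of determination), which are assumed here because the captions do not spell them out.

```python
from typing import List, Sequence, Set


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for a binary (0/1) outcome.

    Assumed definition; requires at least one case and one control.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total.

    Assumed definition; requires a non-constant y_true.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def detection_power(best_models: List[Set[str]], target: Set[str]) -> int:
    """Number of datasets whose best model contains every target variable."""
    return sum(1 for model in best_models if target <= model)


# Toy usage with made-up values (hypothetical variable names).
acc = balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0])
fit = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
power = detection_power(
    [{"SNP1", "SNP2"}, {"SNP1"}, {"SNP1", "SNP2", "EV1"}],
    {"SNP1", "SNP2"},
)
```

With 100 simulated datasets per genetic effect model, as in the captions, `detection_power` would return a count between 0 and 100 for each variable or variable set of interest.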
