Biomed Res Int. 2014:2014:346074.
doi: 10.1155/2014/346074. Epub 2014 Jul 3.

MAVTgsa: an R package for gene set (enrichment) analysis

Chih-Yi Chien et al. Biomed Res Int. 2014.

Abstract

Gene set analysis methods aim to determine whether an a priori defined set of genes shows a statistically significant difference in expression with respect to either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identifying different types of gene set significance modules has not previously been developed. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of the genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both directions, up- and downregulation, when studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or that are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.
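The abstract does not state which FDR procedure the package applies. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up adjustment, q-values for a list of gene set P values could be computed as follows. The function name `bh_qvalues` and the toy P values are hypothetical and are not part of MAVTgsa or taken from the paper.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P values (q-values), in input order.

    Assumption: this is a generic BH adjustment for illustration only;
    it is not claimed to be the procedure implemented in MAVTgsa.
    """
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    # Toy P values for four hypothetical gene sets.
    for p, q in zip([0.01, 0.04, 0.03, 0.5], bh_qvalues([0.01, 0.04, 0.03, 0.5])):
        print(f"P = {p:.3f}  q = {q:.3f}")
```

A gene set would then be reported as significant when its q-value falls below the chosen FDR level, for example 0.05.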

Figures

Figure 1
A schematic flowchart of GSEA using the MAVTgsa package.
Figure 2
The GSA-plot for the OLS test and Hotelling's T² test in the P53 study. The P values of both the OLS test (black line) and the T² test (red line) lie away from the dashed diagonal line, which indicates that both tests identified several gene sets as truly significant under the tested hypotheses.
Figure 3
GST-plot for the gene set rasPathway in the P53 dataset. The solid line is the empirical cumulative distribution function of the rank t-statistics for the 10,100 genes in the array. The two-tailed shaded regions mark the t-statistics with P values less than 0.01. The 22 tick marks above the plot show the locations of the P values of the genes in the gene set. The gene set appears underexpressed.
Figure 4
GST-plot for the gene set badPathway in the P53 dataset. The solid line is the empirical cumulative distribution function of the rank t-statistics for the 10,100 genes in the array. The two-tailed shaded regions mark the t-statistics with P values less than 0.01. The 21 tick marks above the plot show the locations of the P values of the genes in the gene set. The gene set contains both under- and overexpressed genes.
Figure 5
GST-plot for the gene set cell_cycle_control in the breast cancer dataset. The solid line is the empirical cumulative distribution function of the rank F-statistics for the 1,113 genes in the array. The shaded regions mark the F-statistics with P values less than 0.01. The 5 tick marks above the plot show the locations of the P values of the genes in the gene set.
Figure 6
Power comparisons of two GSA methods for a linear association between gene sets and continuous phenotypes: random forests and LCT.
Figure 7
Power comparisons of two GSA methods for a nonlinear association between gene sets and continuous phenotypes: random forests and LCT.
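
The GST-plots in Figures 3 to 5 place tick marks for a gene set's members along the empirical cumulative distribution function (ECDF) of the gene-level statistics for the whole array. The sketch below shows one way such tick positions could be derived, assuming each gene has a single test statistic. The helper name `ecdf_positions` and the toy statistics are hypothetical and do not reproduce the package's plotting code or the paper's data.

```python
from bisect import bisect_right


def ecdf_positions(all_stats, set_stats):
    """Return the ECDF value of each gene set statistic.

    all_stats: statistics for every gene on the array.
    set_stats: statistics for the genes in one gene set.
    Assumption: illustrative helper only, not a MAVTgsa function.
    """
    if not all_stats:
        raise ValueError("all_stats must not be empty")
    ordered = sorted(all_stats)
    n = len(ordered)
    # ECDF(s) = fraction of array-wide statistics that are <= s.
    return [bisect_right(ordered, s) / n for s in set_stats]


if __name__ == "__main__":
    array_stats = [-2.0, -1.0, 0.0, 1.0, 2.0]  # toy array-wide statistics
    gene_set_stats = [-2.0, 0.5]               # toy gene set members
    print(ecdf_positions(array_stats, gene_set_stats))  # prints [0.2, 0.6]
```

Positions clustered near 0 would suggest a gene set whose members sit in the lower tail (underexpression), while positions spread across both tails would correspond to the mixed pattern described for Figure 4.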
