BMC Bioinformatics. 2023 Oct 30;24(1):408.
doi: 10.1186/s12859-023-05510-x.

Roastgsa: a comparison of rotation-based scores for gene set enrichment analysis


Adrià Caballé-Mestres et al. BMC Bioinformatics.

Abstract

Background: Gene-wise differential expression is usually the first major step in the statistical analysis of high-throughput data obtained from techniques such as microarrays or RNA-sequencing. The gene-level analysis is often complemented by interrogating the data in a broader biological context, where the unit of analysis is a group of genes that may share a common function or biological trait. Among the many published approaches to gene set analysis (GSA), the rotation test for gene set analysis, also referred to as roast, is a general sample randomization approach that preserves the intra-gene set correlation structure when defining the null distribution of the test.
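The idea of a rotation test can be illustrated with a minimal sketch. It assumes each gene's data have already been reduced to n orthogonalised effects, with index 0 holding the contrast of interest, and it uses ordinary rather than moderated t-statistics. The function names, the two-sided mean score and the toy numbers are illustrative assumptions, not the roast or roastgsa implementation. The key point is that one random rotation is shared by all genes in the set, which is what keeps the correlation between genes intact under the null.

```python
import math
import random


def unit_vector(n, rng):
    """Draw a random direction on the unit sphere in n dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def t_like(first, total_ss, n):
    """Ordinary t-like statistic: first effect against the other n - 1."""
    resid_ss = max(total_ss - first * first, 1e-12)  # guard against zero
    return first / math.sqrt(resid_ss / (n - 1))


def rotation_pvalue(effects, n_rot=999, seed=0):
    """Two-sided rotation p-value for the mean t-like score of a gene set.

    effects: one list per gene holding n orthogonalised effects
    (hypothetical input format; index 0 is the contrast of interest).
    """
    rng = random.Random(seed)
    n = len(effects[0])
    # A rotation preserves each gene's total sum of squares.
    total_ss = [sum(x * x for x in u) for u in effects]

    def set_score(firsts):
        ts = [t_like(f, s, n) for f, s in zip(firsts, total_ss)]
        return sum(ts) / len(ts)

    observed = set_score([u[0] for u in effects])
    hits = 0
    for _ in range(n_rot):
        r = unit_vector(n, rng)  # the SAME rotation is applied to every gene
        rotated = [sum(ri * ui for ri, ui in zip(r, u)) for u in effects]
        if abs(set_score(rotated)) >= abs(observed):
            hits += 1
    return (hits + 1) / (n_rot + 1)


# Toy gene set with a strong shared effect in the contrast of interest.
toy_effects = [
    [5.0, 0.3, -0.2, 0.1],
    [4.0, -0.1, 0.2, 0.3],
    [6.0, 0.2, 0.1, -0.3],
]
p_value = rotation_pvalue(toy_effects)
```

Only the first row of the rotation matrix is needed here, because the rotated residual sum of squares follows from the preserved total; this keeps the sketch short but is a simplification of a full implementation.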

Results: We present roastgsa, an R package that contains several enrichment score functions that feed the roast algorithm for hypothesis testing. These implemented methods are evaluated using both simulated and benchmarking data in microarray and RNA-seq datasets. We find that computationally intensive measures based on Kolmogorov-Smirnov (KS) statistics fail to improve the rates of simpler measures of GSA like mean and maxmean scores. We also show the importance of accounting for the gene linear dependence structure of the testing set, which is linked to the loss of effective signature size. Complete graphical representation of the results, including an approximation for the effective signature size, can be obtained as part of the roastgsa output.

Conclusions: We encourage the use of the absmean (non-directional), mean (directional) and maxmean (directional) scores for roast GSA, as these simple measures of enrichment outperformed the more complex KS measures in all provided analyses.
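To make the three recommended scores concrete, here is a minimal sketch computed on a list of gene-level statistics for one gene set. The exact definitions used by roastgsa are given in the paper's Tables 1 and 2, which are not reproduced here, so the code assumes the common forms: the plain mean, the mean of absolute values, and the maxmean of Efron and Tibshirani with the sign of the dominant side attached. The signed convention and the function names are assumptions for illustration.

```python
from statistics import mean


def mean_score(t_stats):
    """Directional: average of the gene-level statistics."""
    return mean(t_stats)


def absmean_score(t_stats):
    """Non-directional: average of the absolute gene-level statistics."""
    return mean([abs(t) for t in t_stats])


def maxmean_score(t_stats):
    """Directional: larger of the averaged positive and negative parts.

    Both parts are averaged over ALL genes in the set. Returning the
    negative part with a minus sign is an assumed convention.
    """
    n = len(t_stats)
    pos = sum(max(t, 0.0) for t in t_stats) / n
    neg = sum(max(-t, 0.0) for t in t_stats) / n
    return pos if pos >= neg else -neg


# Hypothetical statistics for a four-gene set.
t_stats = [2.0, 1.0, -0.5, 0.5]
scores = {
    "mean": mean_score(t_stats),        # 0.75
    "absmean": absmean_score(t_stats),  # 1.0
    "maxmean": maxmean_score(t_stats),  # 0.875
}
```

In this toy set the single negative gene pulls the mean down to 0.75, while maxmean reports 0.875 because it scores the up-regulated side on its own.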

Keywords: Competitive test; Correlation; Gene set analysis; Rotation test.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Scope of rotational gene set analysis: from gene set of interest to statistical significance. The enrichment scores mean, maxmean, median and absmean are proposed for both self-contained and competitive approaches. The meanrank, ksmax and ksmean are exclusive scores for competitive testing. All test statistics are defined in Tables 1 and 2
Fig. 2
Characteristics of all presented scores: performance in simulated data is rated from 1 (poor) to 10 (great) based on the obtained recovery rates (the average recovery rate relative to the best rate); performance in benchmarking data is rated from 1 to 10 based on the M1 ranking; computational time is measured relative to the fastest method; scores that were implemented in limma are specified for both the romer (competitive scores) and roast (self-contained scores) functions
Fig. 3
Roastgsa output figures: (a) the ordered moderated t-statistics in various formats: area under the curve for all genes ordered by moderated t-statistic, barcode plot for these ordered values, and density; (b) classic GSEA plot; (c) effective signature size p-value curve, which determines the number of randomly selected genes needed to obtain levels of variability in the rotation GSA scores as extreme as the variance of the rotation GSA scores in the testing gene set; (d) normalized expression values and gene set statistics representing the variation across samples for the gene set of interest


