Analyzing gene expression data in terms of gene sets: methodological issues
- PMID: 17303618
- DOI: 10.1093/bioinformatics/btm051
Abstract
Motivation: Many statistical tests have been proposed in recent years for analyzing gene expression data in terms of gene sets, usually from Gene Ontology. These methods are based on widely different methodological assumptions. Some approaches test differential expression of each gene set against differential expression of the rest of the genes, whereas others test each gene set on its own. Also, some methods are based on a model in which the genes are the sampling units, whereas others treat the subjects as the sampling units. This article aims to clarify the assumptions behind different approaches and to indicate a preferential methodology of gene set testing.
Results: We identify some crucial assumptions which are needed by the majority of methods. P-values derived from methods that use a model which takes the genes as the sampling unit are easily misinterpreted, as they are based on a statistical model that does not resemble the biological experiment actually performed. Furthermore, because these models are based on a crucial and unrealistic independence assumption between genes, the P-values derived from such methods can be wildly anti-conservative, as a simulation experiment shows. We also argue that methods that competitively test each gene set against the rest of the genes create an unnecessary rift between single gene testing and gene set testing.
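The anti-conservativeness described above can be illustrated with a small simulation. The following is a minimal sketch, not the simulation reported in the article: the sample sizes, the correlation `rho`, the set statistic (mean squared t-statistic), and the function names `t_statistics` and `simulate_rejection_rates` are all assumptions chosen for illustration. No gene is associated with the group labels, so every rejection is a false positive. Genes in the set share a common factor and are therefore correlated. The gene-sampling P-value compares the set against random gene sets of the same size; the subject-sampling P-value permutes the group labels.

```python
import numpy as np


def t_statistics(x, labels):
    """Pooled-variance two-sample t-statistic for every column (gene) of x."""
    a, b = x[labels == 0], x[labels == 1]
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(axis=0, ddof=1)
                  + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (b.mean(axis=0) - a.mean(axis=0)) / np.sqrt(pooled_var * (1 / na + 1 / nb))


def simulate_rejection_rates(n_sims=200, n_subjects=20, n_genes=200, set_size=20,
                             rho=0.7, n_perm=199, alpha=0.05, seed=0):
    """Return (gene-sampling, subject-sampling) false positive rates under the null.

    All defaults are illustrative assumptions, not values from the article.
    n_subjects is assumed to be even (two groups of equal size).
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_subjects // 2)
    reject_gene = 0
    reject_subject = 0
    for _ in range(n_sims):
        # Null data: no gene depends on the labels.
        x = rng.standard_normal((n_subjects, n_genes))
        # Genes in the set (first set_size columns) share a factor,
        # giving pairwise correlation rho within the set.
        shared = rng.standard_normal((n_subjects, 1))
        x[:, :set_size] = np.sqrt(rho) * shared + np.sqrt(1 - rho) * x[:, :set_size]

        t2 = t_statistics(x, labels) ** 2
        observed = t2[:set_size].mean()

        # Gene sampling: random gene sets of the same size form the null
        # distribution, which implicitly treats genes as independent units.
        null_gene = np.array([
            rng.choice(t2, size=set_size, replace=False).mean()
            for _ in range(n_perm)
        ])
        p_gene = (1 + np.sum(null_gene >= observed)) / (1 + n_perm)

        # Subject sampling: permuting the labels keeps the gene-gene
        # correlation structure intact.
        null_subject = np.array([
            (t_statistics(x[:, :set_size], rng.permutation(labels)) ** 2).mean()
            for _ in range(n_perm)
        ])
        p_subject = (1 + np.sum(null_subject >= observed)) / (1 + n_perm)

        reject_gene += int(p_gene <= alpha)
        reject_subject += int(p_subject <= alpha)
    return reject_gene / n_sims, reject_subject / n_sims


if __name__ == "__main__":
    gene_rate, subject_rate = simulate_rejection_rates()
    print(f"gene-sampling false positive rate:    {gene_rate:.3f}")
    print(f"subject-sampling false positive rate: {subject_rate:.3f}")
```

Under these assumptions, the subject-sampling rate should stay near the nominal level `alpha`, while the gene-sampling rate should exceed it, because random gene sets consist mostly of uncorrelated genes and so understate the variability of a correlated set. Note that the two tests in this sketch also differ in their null hypothesis (competitive versus self-contained); with no associated genes at all, both null hypotheses hold, so the comparison of false positive rates remains meaningful.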