Statistical development and evaluation of microarray gene expression data filters
- PMID: 15882143
- DOI: 10.1089/cmb.2005.12.482
Abstract
Filtering is a common practice used to simplify the analysis of microarray data by removing from subsequent consideration probe sets believed to be unexpressed. The m/n filter, which is widely used in the analysis of Affymetrix data, removes all probe sets having fewer than m present calls among a set of n chips. The m/n filter has been widely used without considering its statistical properties. The level and power of the m/n filter are derived. Two alternative filters, the pooled p-value filter and the error-minimizing pooled p-value filter, are proposed. The pooled p-value filter combines information from the present-absent p-values into a single summary p-value, which is subsequently compared to a selected significance threshold. We show that the pooled p-value filter is the uniformly most powerful statistical test under a reasonable beta model and that it exhibits greater power than the m/n filter in all scenarios considered in a simulation study. The error-minimizing pooled p-value filter compares the summary p-value with a threshold determined to minimize a total-error criterion based on a partition of the distribution of all probes' summary p-values. The pooled p-value and error-minimizing pooled p-value filters clearly perform better than the m/n filter in a case-study analysis. The case-study analysis also demonstrates a proposed method for estimating the number of differentially expressed probe sets excluded by filtering and the subsequent impact on the final analysis. The filter impact analysis shows that the use of even the best filter may hinder, rather than enhance, the ability to discover interesting probe sets or genes. S-plus and R routines to implement the pooled p-value and error-minimizing pooled p-value filters have been developed and are available from www.stjuderesearch.org/depts/biostats/index.html.
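To make the contrast between the two kinds of filter concrete, the following is a minimal Python sketch, not the authors' S-plus or R routines. It rests on several assumptions that the abstract does not state: a chip gives a "present" call when its detection p-value falls below a cutoff (0.04 is used here as an illustrative default), detection p-values for an unexpressed probe set are independent and Uniform(0, 1), and the summary p-value is obtained with Fisher's combination method, which stands in for the paper's beta-model pooling. All function names are hypothetical.

```python
import math


def mn_filter_keep(p_values, m, call_threshold=0.04):
    """Keep a probe set if at least m of its n chips give a present call.

    Assumption: a present call means detection p-value < call_threshold.
    """
    present = sum(1 for p in p_values if p < call_threshold)
    return present >= m


def mn_filter_level(m, n, call_threshold=0.04):
    """Probability that an unexpressed probe set passes the m/n filter.

    Assumption: under the null, the n detection p-values are independent
    Uniform(0, 1), so the number of present calls is
    Binomial(n, call_threshold).
    """
    return sum(
        math.comb(n, k) * call_threshold**k * (1 - call_threshold) ** (n - k)
        for k in range(m, n + 1)
    )


def fisher_pooled_p(p_values):
    """Combine n p-values into one summary p-value with Fisher's method.

    This is a stand-in for the paper's pooled p-value, not its definition.
    The statistic X = -2 * sum(log p) is chi-square with 2n degrees of
    freedom under the null; for even degrees of freedom the upper tail is
    exp(-h) * sum_{k=0}^{n-1} h**k / k!, with h = X / 2.
    """
    n = len(p_values)
    # Clamp at a tiny positive value so log(0) cannot occur.
    h = -sum(math.log(max(p, 1e-300)) for p in p_values)
    term = 1.0
    total = 1.0
    for k in range(1, n):
        term *= h / k
        total += term
    return math.exp(-h) * total


def pooled_filter_keep(p_values, alpha=0.05):
    """Keep a probe set if its summary p-value is at most alpha."""
    return fisher_pooled_p(p_values) <= alpha


# Four chips, each just above the present-call cutoff.
example = [0.05, 0.06, 0.05, 0.07]
kept_by_mn = mn_filter_keep(example, m=2)
kept_by_pooled = pooled_filter_keep(example, alpha=0.05)
```

In this example no single chip yields a present call, so the 2/4 filter discards the probe set, while the combined evidence across the four chips is strong enough for the pooled filter to retain it. This illustrates how dichotomizing each p-value into a call throws away information that a pooled summary keeps.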