Scoring clustering solutions by their biological relevance
- PMID: 14668221
- DOI: 10.1093/bioinformatics/btg330
Abstract
Motivation: A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups has been shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works have addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed-upon guideline for choosing among them.
Results: We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data.
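The projection-then-test idea described above can be illustrated with a minimal sketch. This is not the authors' software: it assumes a Fisher-style linear discriminant direction as the variance-ratio-maximizing projection and the Kruskal-Wallis H statistic as the non-parametric analysis of variance, neither of which the abstract names explicitly. The function names (`fisher_direction`, `kruskal_h`, `score_clustering`), the small ridge term, and the synthetic Gaussian attribute vectors are all hypothetical choices made for illustration.

```python
import numpy as np


def fisher_direction(X, labels):
    """Unit vector w maximizing between-group / within-group scatter of X @ w."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))  # within-group scatter
    S_b = np.zeros((d, d))  # between-group scatter
    for g in np.unique(labels):
        Xg = X[labels == g]
        mu = Xg.mean(axis=0)
        centered = Xg - mu
        S_w += centered.T @ centered
        diff = (mu - mean_all)[:, None]
        S_b += len(Xg) * (diff @ diff.T)
    # Assumption: a tiny ridge keeps S_w invertible for degenerate inputs.
    S_w += 1e-8 * np.eye(d)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return w / np.linalg.norm(w)


def kruskal_h(values, labels):
    """Kruskal-Wallis H statistic, without tie correction."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    n = len(values)
    ranks = np.empty(n)
    ranks[np.argsort(values)] = np.arange(1, n + 1)
    total = 0.0
    for g in np.unique(labels):
        r = ranks[labels == g]
        total += r.sum() ** 2 / len(r)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def score_clustering(X, labels):
    """Project attribute vectors onto the real line, then score the groups."""
    w = fisher_direction(X, labels)
    return kruskal_h(np.asarray(X, dtype=float) @ w, labels)


# Synthetic attribute vectors: 3 groups of 40 elements, 5 attributes each.
rng = np.random.default_rng(0)
group_means = np.array([[0, 0, 0, 0, 0], [2, 2, 0, 0, 0], [4, 0, 2, 0, 0]], dtype=float)
true_labels = np.repeat(np.arange(3), 40)
attrs = group_means[true_labels] + rng.normal(size=(120, 5))

true_score = score_clustering(attrs, true_labels)
shuffled_score = score_clustering(attrs, rng.permutation(true_labels))
print(true_score, shuffled_score)
```

Because the projection is fitted to the same labels it is scored on, even a random partition receives a score above zero. A raw score is therefore only meaningful relative to a reference, such as scores for shuffled labels, which is one plausible reading of the confidence evaluation step mentioned in the abstract.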
Availability: The software is available from the authors upon request.