Automated Discovery of Functional Generality of Human Gene Expression Programs
Figure 5
GeneProgram Outperformed Two Popular Biclustering Methods, an NMF Implementation and Samba, in Terms of Gene Set Consistency between Two Large Compendia of Mammalian Tissue Gene Expression Data
Because the two data compendia used different microarray platforms and different tissue sources, gene sets discovered in both compendia were likely to be biologically relevant rather than artifacts of a particular platform or sample source. For each algorithm, we used the gene sets discovered from one compendium to compute the significance (p-value) of their overlap with the sets produced from the second compendium. We then repeated the analysis with the roles of the two compendia reversed and averaged the results to produce the correspondence plots shown. Each plot shows log p-values on the horizontal axis and, on the vertical axis, the fraction of gene sets with p-values below a given value (see the Methods section for details). GeneProgram has a larger fraction of gene sets below the threshold at most p-values, which suggests that it generally produces the most consistent results between the two compendia.
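The caption defers the exact statistics to the Methods section, so the following is only a minimal sketch of how such a correspondence curve could be computed. It assumes (none of this is stated in the caption) a one-sided hypergeometric test for the overlap between two gene sets drawn from a shared gene universe of known size, that each gene set is scored by its best-matching (lowest p-value) set in the other compendium, and that the two directions are averaged pointwise. The function names `overlap_pvalue`, `best_match_pvalues`, and `correspondence_curve` are hypothetical.

```python
from math import comb


def overlap_pvalue(a: set, b: set, universe_size: int) -> float:
    """One-sided hypergeometric p-value P(X >= |a & b|) (assumed test).

    Models drawing |a| genes without replacement from a universe of
    `universe_size` genes, of which |b| count as "successes".
    """
    k = len(a & b)
    n, K, N = len(a), len(b), universe_size
    # Upper tail of the hypergeometric distribution; math.comb returns 0
    # when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def best_match_pvalues(sets_a, sets_b, universe_size):
    """For each set in sets_a, the smallest overlap p-value against sets_b."""
    return [min(overlap_pvalue(a, b, universe_size) for b in sets_b) for a in sets_a]


def correspondence_curve(sets_a, sets_b, universe_size, log10_thresholds):
    """Fraction of gene sets with p <= 10**t, averaged over both directions."""
    p_ab = best_match_pvalues(sets_a, sets_b, universe_size)
    p_ba = best_match_pvalues(sets_b, sets_a, universe_size)
    curve = []
    for t in log10_thresholds:
        cutoff = 10.0 ** t
        frac_ab = sum(p <= cutoff for p in p_ab) / len(p_ab)
        frac_ba = sum(p <= cutoff for p in p_ba) / len(p_ba)
        curve.append((frac_ab + frac_ba) / 2)
    return curve


if __name__ == "__main__":
    # Toy example: two "compendia" over a 10-gene universe.
    compendium_1 = [{1, 2, 3}, {4, 5}]
    compendium_2 = [{1, 2, 3}, {8, 9}]
    print(correspondence_curve(compendium_1, compendium_2, 10, [-3, -2, -1, 0]))
```

Averaging the two directions makes the curve symmetric in the two compendia, mirroring the inverted-and-averaged analysis described above. An algorithm whose curve lies higher at a given threshold has more gene sets that are reproduced across compendia at that significance level.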