PLoS One. 2013 Apr 17;8(4):e61505.
doi: 10.1371/journal.pone.0061505. Print 2013.

When is hub gene selection better than standard meta-analysis?

Peter Langfelder et al. PLoS One. 2013.

Abstract

Since hub nodes have been found to play important roles in many networks, highly connected hub genes are expected to play an important role in biology as well. However, the empirical evidence remains ambiguous. An open question is whether (or when) hub gene selection leads to more meaningful gene lists than a standard statistical analysis based on significance testing when analyzing genomic data sets (e.g., gene expression or DNA methylation data). Here we address this question for the special case when multiple genomic data sets are available. This is of great practical importance since for many research questions multiple data sets are publicly available. In this case, the data analyst can decide between a standard statistical approach (e.g., based on meta-analysis) and a co-expression network analysis approach that selects intramodular hubs in consensus modules. We assess the performance of these two types of approaches according to two criteria. The first criterion evaluates the biological insights gained and is relevant in basic research. The second criterion evaluates the validation success (reproducibility) in independent data sets and often applies in clinical diagnostic or prognostic applications. We compare meta-analysis with consensus network analysis based on weighted correlation network analysis (WGCNA) in three comprehensive and unbiased empirical studies: (1) finding genes predictive of lung cancer survival, (2) finding methylation markers related to age, and (3) finding mouse genes related to total cholesterol. The results demonstrate that intramodular hub gene status with respect to consensus modules is more useful than a meta-analysis p-value when identifying biologically meaningful gene lists (reflecting criterion 1). However, standard meta-analysis methods perform as well as (if not better than) a consensus network approach in terms of validation success (criterion 2).
The article also reports a comparison of meta-analysis techniques applied to gene expression data and presents novel R functions for carrying out consensus network analysis, network-based screening, and meta-analysis.

Conflict of interest statement

Competing Interests: The corresponding author (SH) is an Associate Editor of PLOS ONE. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.

Figures

Figure 1. Meta-analysis of module membership leads to gene lists with stronger functional enrichment.
The 3 barplots show enrichment values, defined as the negative logarithm of the enrichment p-value, -log(p), in our 3 applications. Each bar summarizes the best enrichment values obtained by the corresponding meta-analysis method. Specifically, for each method we computed the enrichment in the corresponding “gold standard” list of genes. The enrichment was calculated in the top 20, 40, 60, …, 1000 genes in the adenocarcinoma and mouse TC applications; and in 100, 200, …, 5000 genes in the aging application. The best 20% of enrichment values were retained. Each bar represents the mean of these best enrichment values, and error bars give the corresponding standard deviations. The standard deviations are not corrected for auto-correlation of enrichment values. The Kruskal-Wallis test p-value is indicated in the title. The figure shows that meta-analysis of membership in consensus modules leads to gene lists with higher enrichment and hence better biological interpretability.
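The enrichment summary described in this caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' R implementation: the one-sided hypergeometric test, the base-10 logarithm, and the names `neg_log10_hypergeom_p` and `enrichment_summary` are all assumptions introduced here.

```python
import math
import statistics

def neg_log10_hypergeom_p(overlap, universe, n_gold, n_top):
    """-log10 of P(X >= overlap) for a hypergeometric X (assumed enrichment test).

    Computed from exact integers so that tiny p-values do not underflow to 0.
    """
    total = math.comb(universe, n_top)
    upper = min(n_gold, n_top)
    tail = sum(
        math.comb(n_gold, i) * math.comb(universe - n_gold, n_top - i)
        for i in range(overlap, upper + 1)
    )
    return math.log10(total) - math.log10(tail)

def enrichment_summary(ranked_genes, gold_standard, list_sizes, keep_fraction=0.2):
    """Mean and SD of the best enrichment values over a grid of list sizes."""
    universe = len(ranked_genes)
    gold = set(gold_standard) & set(ranked_genes)
    values = []
    for size in list_sizes:
        top = ranked_genes[:size]            # top-ranked genes for this list size
        overlap = len(gold.intersection(top))
        values.append(neg_log10_hypergeom_p(overlap, universe, len(gold), len(top)))
    values.sort(reverse=True)
    n_keep = max(1, round(keep_fraction * len(values)))  # best 20% by default
    best = values[:n_keep]
    sd = statistics.stdev(best) if len(best) > 1 else 0.0
    return statistics.mean(best), sd
```

For the adenocarcinoma and mouse TC applications the grid would be `range(20, 1001, 20)`; for the aging application, `range(100, 5001, 100)`. As the caption notes, the resulting standard deviation ignores the auto-correlation between enrichment values at neighboring list sizes.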
Figure 2. Marginal meta-analysis tends to lead to gene lists with better validation in independent data.
The 3 barplots show validation success in our 3 applications. Each bar summarizes the gene screening success of the corresponding meta-analysis method. Specifically, we rank the genes using each meta-analysis method and retain the top 100 genes. We define gene screening success as the average correlation of these top 100 genes with the trait of interest in an independent validation data set, averaged over the validation sets in each application. Each bar represents the gene screening success; error bars give the corresponding standard deviation of the observed gene–trait correlations in the top 100 genes. This figure shows that, overall, marginal meta-analysis leads to gene lists with better validation success (i.e., higher correlation with the trait of interest in validation data). Adenocarcinoma expression data (panel A) present an exception in that meta-analysis of module membership results in gene lists with somewhat better validation.
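The definition of gene screening success given in this caption can be illustrated with a short Python sketch. It rests on several assumptions that the caption does not settle: Pearson correlation is used, correlations are taken with their raw sign, each validation set is a hypothetical `(expression, trait)` pair with `expression` mapping gene names to per-sample values, and the error bar is the standard deviation over all pooled gene–trait correlations.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def screening_success(ranked_genes, validation_sets, n_top=100):
    """Average gene-trait correlation of the top genes across validation sets.

    validation_sets: list of (expression, trait) pairs (hypothetical layout),
    where expression[gene] and trait are aligned per-sample sequences.
    """
    top = ranked_genes[:n_top]
    per_set_means = []
    all_cors = []
    for expression, trait in validation_sets:
        cors = [pearson(expression[g], trait) for g in top]
        per_set_means.append(statistics.fmean(cors))  # mean within one validation set
        all_cors.extend(cors)
    success = statistics.fmean(per_set_means)         # average over validation sets
    spread = statistics.stdev(all_cors) if len(all_cors) > 1 else 0.0
    return success, spread
```

The same function covers the simulation setting of Figure 3 by passing `n_top=50`.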
Figure 3. Simulation studies of gene screening success of meta-analysis methods.
The barplots show validation success of the various meta-analysis methods in simulated data with 2 different traits. Continuous clinical trait 1 is weakly related to a module eigengene that may, in real data, represent the state of a pathway. In this case meta-analysis of module membership outperforms marginal meta-analysis in identifying validated genes. In contrast, clinical trait 2 is simulated to be strongly correlated with the eigengene of a small submodule of one of the identified modules. Here marginal meta-analysis outperforms meta-analysis of module membership. Analogously to Figure 2, each bar summarizes the gene screening success of the corresponding meta-analysis method for each of the simulated traits. For each meta-analysis method we rank the genes based on the method and retain the top 50 genes. We define gene screening success as the average correlation of these top 50 genes with the trait of interest in an independent validation data set, averaged over the validation sets in each application. Each bar represents the gene screening success; error bars give the corresponding standard deviation of the observed gene–trait correlations in the top 50 genes.
