Meta-analysis approach as a gene selection method in class prediction: does it improve model performance? A case study in acute myeloid leukemia

doi:10.1186/s12859-017-1619-7

BMC Bioinformatics. 2017 Apr 11;18(1):210.

Putri W Novianti^{1,2,3}, Victor L Jong^{4,5}, Kit C B Roes⁴, Marinus J C Eijkemans⁴

Affiliations

¹ Biostatistics & Research Support, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3508 GA Utrecht, The Netherlands. p.novianti@vumc.nl.
² Department of Epidemiology and Biostatistics, VU University medical center, Amsterdam, The Netherlands. p.novianti@vumc.nl.
³ Department of Pathology, VU University medical center, Amsterdam, The Netherlands. p.novianti@vumc.nl.
⁴ Biostatistics & Research Support, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3508 GA Utrecht, The Netherlands.
⁵ Viroscience Laboratory, Erasmus Medical Center Rotterdam, 3015 CE Rotterdam, The Netherlands.

PMID: 28399794
PMCID: PMC5387259
DOI: 10.1186/s12859-017-1619-7

Putri W Novianti et al. BMC Bioinformatics. 2017.


Abstract

Background: Aggregating gene expression data across experiments via meta-analysis is expected to increase the precision of the effect estimates and to increase the statistical power to detect a certain fold change. This study evaluates the potential benefit of using a meta-analysis approach as a gene selection method prior to predictive modeling in gene expression data.

Results: Six raw datasets from different gene expression experiments in acute myeloid leukemia (AML) and 11 classification methods were used to build models that classify samples as either AML or healthy control. First, the classification models were trained on gene expression data from a single experiment using conventional supervised variable selection and externally validated on the other five datasets (the individual-classification approach). Next, gene selection was performed through meta-analysis on four datasets, and predictive models were trained with the selected genes on the fifth dataset and validated on the sixth (the MA-classification approach). For some datasets, gene selection through meta-analysis helped the classification models achieve higher performance than predictive modeling based on a single dataset; for others, there was no major improvement. Synthetic datasets were then generated under nine simulation scenarios to evaluate the effect of sample size, fold change and pairwise correlation between differentially expressed (DE) genes on the difference in performance between the MA- and individual-classification models. Fold change and pairwise correlation contributed significantly to that difference. Gene selection via meta-analysis was more effective when applied to data with low fold change and high pairwise correlation among the DE genes.

Conclusion: Gene selection through meta-analysis of previously published studies can potentially improve the performance of a predictive model on a given gene expression dataset.

Keywords: Acute myeloid leukemia; Gene expression; Meta-analysis; Predictive modeling.
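The workflow described in the Results (select genes by meta-analysis on four datasets, train on a fifth, validate on a sixth) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which meta-analysis model or effect size was used, so the sketch assumes fixed-effect inverse-variance pooling of the per-gene mean difference on the log₂ scale, and it substitutes a simple nearest-centroid classifier for the 11 classification methods of the study. The data layout, the function names and the toy data are all hypothetical.

```python
import math
import random
from statistics import mean, variance

CLASSES = ("aml", "control")

def effect_and_variance(cases, controls):
    """Per-study mean difference between classes and its sampling variance."""
    d = mean(cases) - mean(controls)
    v = variance(cases) / len(cases) + variance(controls) / len(controls)
    return d, v

def pooled_z(studies, gene):
    """Assumed model: fixed-effect inverse-variance pooling across studies."""
    num = den = 0.0
    for study in studies:
        d, v = effect_and_variance(study["aml"][gene], study["control"][gene])
        num += d / v      # effect weighted by 1/variance
        den += 1.0 / v    # sum of weights
    pooled = num / den
    pooled_se = math.sqrt(1.0 / den)
    return pooled / pooled_se

def select_genes(studies, genes, top_k):
    """Rank genes by absolute pooled z-score and keep the top_k."""
    return sorted(genes, key=lambda g: abs(pooled_z(studies, g)), reverse=True)[:top_k]

def train_centroids(study, genes):
    """Stand-in classifier: one mean expression profile per class."""
    return {c: [mean(study[c][g]) for g in genes] for c in CLASSES}

def predict(centroids, sample):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(CLASSES, key=lambda c: sum((x - m) ** 2 for x, m in zip(sample, centroids[c])))

def accuracy(centroids, study, genes):
    hits = total = 0
    for c in CLASSES:
        for i in range(len(study[c][genes[0]])):
            sample = [study[c][g][i] for g in genes]
            hits += predict(centroids, sample) == c
            total += 1
    return hits / total

def make_study(rng, n_per_class, genes, de_genes, shift):
    """Hypothetical toy data: unit-variance Gaussian noise, DE genes shifted in AML."""
    study = {c: {} for c in CLASSES}
    for g in genes:
        mu = shift if g in de_genes else 0.0
        study["aml"][g] = [rng.gauss(mu, 1.0) for _ in range(n_per_class)]
        study["control"][g] = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    return study

rng = random.Random(0)
genes = [f"g{i}" for i in range(10)]
de_genes = {"g0", "g1"}
datasets = [make_study(rng, 20, genes, de_genes, shift=2.0) for _ in range(6)]

selected = select_genes(datasets[:4], genes, top_k=2)   # meta-analysis on four datasets
centroids = train_centroids(datasets[4], selected)      # train on the fifth
acc = accuracy(centroids, datasets[5], selected)        # validate on the sixth
print(selected, round(acc, 2))
```

Keeping the four selection datasets, the training dataset and the validation dataset disjoint mirrors the design in the abstract: the genes are chosen without looking at the data used to fit or to evaluate the classifier.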

Figures

**Fig. 1**
Division of the data for building cross-platform classification models, and the characteristics of the datasets (#: number)

**Fig. 2**
Distribution of expression values after the pre-processing step for the first three samples in the six experiments. The expression values are on the log₂ scale

**Fig. 3**
Difference in classification model accuracy between the MA- and individual-classification approaches when Data1 was used as the training data

**Fig. 4**
Difference in classification model accuracy between the MA- and individual-classification approaches in the simulated datasets when Δ = 0.1, γ = 0.75 and (a) n = 50 (Simulation 1), (b) n = 100 (Simulation 4), (c) n = 150 (Simulation 7). These simulation parameters produced the less informative datasets
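The simulation parameters named in the Fig. 4 caption can be illustrated with a small data generator. This is only a sketch under assumptions: the abstract does not give the generating model, so here Δ is read as a mean shift of the DE genes between classes on the log₂ scale, γ as the pairwise correlation among DE genes, and n as the total sample size split evenly between the two classes. The shared-factor construction, the numbers of DE and null genes, and all function names are hypothetical.

```python
import math
import random

def simulate(n, delta, gamma, n_de=5, n_null=20, seed=0):
    """Return (rows, labels); each row holds n_de DE genes followed by n_null null genes.

    DE genes share one latent factor, which gives every pair of DE genes
    correlation gamma and unit variance. Class 1 samples are shifted by delta.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        label = i % 2  # alternate classes so the split is balanced
        shared = rng.gauss(0.0, 1.0)
        de = [
            math.sqrt(gamma) * shared
            + math.sqrt(1.0 - gamma) * rng.gauss(0.0, 1.0)
            + delta * label
            for _ in range(n_de)
        ]
        null = [rng.gauss(0.0, 1.0) for _ in range(n_null)]
        rows.append(de + null)
        labels.append(label)
    return rows, labels

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Parameters from the Fig. 4 caption, panel (a): n = 50, delta = 0.1, gamma = 0.75
rows, labels = simulate(n=50, delta=0.1, gamma=0.75)
controls = [r for r, lab in zip(rows, labels) if lab == 0]
r01 = pearson([r[0] for r in controls], [r[1] for r in controls])
print(len(rows), len(rows[0]), round(r01, 2))
```

With a shift as small as Δ = 0.1 relative to unit noise, the two classes overlap heavily on any single gene, which is consistent with the caption calling these settings the less informative datasets.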

Cited by

Robustification of Naïve Bayes Classifier and Its Application for Microarray Gene Expression Data Analysis.
Ahmed MS, Shahjaman M, Rana MM, Mollah MNH. Biomed Res Int. 2017;2017:3020627. doi: 10.1155/2017/3020627. Epub 2017 Aug 7. PMID: 28848763.
The importance of genomic predictors for clinical outcome of hematological malignancies.
Chen C, Zeng C, Li Y. Blood Sci. 2021 Jul 7;3(3):93-95. doi: 10.1097/BS9.0000000000000075. eCollection 2021 Jul. PMID: 35402837.
High-Throughput Omics and Statistical Learning Integration for the Discovery and Validation of Novel Diagnostic Signatures in Colorectal Cancer.
Long NP, Park S, Anh NH, Nghi TD, Yoon SJ, Park JH, Lim J, Kwon SW. Int J Mol Sci. 2019 Jan 12;20(2):296. doi: 10.3390/ijms20020296. PMID: 30642095.
