A comparative study of different machine learning methods on microarray gene expression data

doi:10.1186/1471-2164-9-S1-S13

Comparative Study

BMC Genomics. 2008;9 Suppl 1(Suppl 1):S13.

Mehdi Pirooznia¹, Jack Y Yang, Mary Qu Yang, Youping Deng

PMID: 18366602
PMCID: PMC2386055
DOI: 10.1186/1471-2164-9-S1-S13

Mehdi Pirooznia et al. BMC Genomics. 2008.

Affiliation

¹ Department of Biological Sciences, University of Southern Mississippi, Hattiesburg 39406, USA. mehdi.pirooznia@usm.edu

Abstract

Background: Several classification and feature selection methods have been studied for the identification of differentially expressed genes in microarray data. Classification methods such as SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forest methods have been used in recent studies. The accuracy of these methods has been calculated with validation methods such as v-fold validation. However, there is a lack of comparison between these methods to find a better framework for classification, clustering and analysis of microarray gene expression results.

Results: In this study, we compared the efficiency of classification methods including SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forest methods. The accuracy of the classifiers was calculated with v-fold cross validation. Some common clustering methods, including K-means, DBC, and EM clustering, were applied to the datasets and their efficiency was analysed. Further, the efficiency of feature selection methods including support vector machine recursive feature elimination (SVM-RFE), Chi Squared, and correlation-based feature selection (CFS) was compared. In each case these methods were applied to eight different binary (two-class) microarray datasets. We evaluated the class prediction efficiency of each gene list in training and test cross-validation using supervised classifiers.
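
As an illustration of how v-fold cross validation estimates classifier accuracy, the following is a minimal Python sketch, not the authors' implementation. The nearest-centroid classifier, the synthetic expression matrix, and every function name (`v_fold_indices`, `fit_centroids`, `predict_centroid`, `cross_val_accuracy`, `make_toy_data`) are assumptions introduced for this example; none of them come from the study.

```python
import random

def v_fold_indices(n_samples, v, seed=0):
    """Shuffle sample indices and deal them into v disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::v] for i in range(v)]

def fit_centroids(X, y):
    """Stand-in classifier (assumption): mean expression profile per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    """Assign x to the class with the closest centroid (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def cross_val_accuracy(X, y, v=10, seed=0):
    """Hold out each fold once; pool accuracy over all held-out samples."""
    folds = v_fold_indices(len(X), v, seed)
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_centroid(model, X[i]) == y[i] for i in held_out)
    return correct / len(X)

def make_toy_data(n_per_class=20, n_genes=20, shift=4.0, seed=1):
    """Synthetic two-class data: the first three genes are shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(3):
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_data()
accuracy = cross_val_accuracy(X, y, v=10)
print(f"10-fold CV accuracy: {accuracy:.2f}")
```

Every sample is used for testing exactly once, so the pooled accuracy is computed over the whole dataset while each prediction comes from a model that never saw that sample.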

Conclusions: We presented a study in which we compared some of the commonly used classification, clustering, and feature selection methods. We applied these methods to eight publicly available datasets and compared how they performed in class prediction of test datasets. We reported that the choice of feature selection method, the number of genes in the gene list, and the number of cases (samples) substantially influence classification success. Based on the features chosen by these methods, error rates and accuracy of several classification algorithms were obtained. The results revealed the importance of feature selection in accurately classifying new samples, and showed how an integrated feature selection and classification algorithm performs and that it is capable of identifying significant genes.
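
To make the idea of ranking genes with a chi-squared criterion concrete, here is a small Python sketch under stated assumptions. It binarizes each gene at its median and scores the resulting 2x2 table against the class label; this simplification, the toy values, and the names `chi2_score` and `rank_genes` are hypothetical and are not the discretization procedure or code used in the study.

```python
from statistics import median

def chi2_score(values, labels):
    """Chi-squared statistic of the 2x2 table (above median) x (class 0/1)."""
    cut = median(values)
    a = b = c = d = 0
    for v, lab in zip(values, labels):
        high = v > cut
        if high and lab == 1:
            a += 1          # high expression, class 1
        elif high:
            b += 1          # high expression, class 0
        elif lab == 1:
            c += 1          # low expression, class 1
        else:
            d += 1          # low expression, class 0
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0          # a margin is empty, so the gene carries no signal
    return n * (a * d - b * c) ** 2 / denom

def rank_genes(X, y, top_k):
    """Return indices of the top_k genes (columns of X) by chi-squared score."""
    scores = [(chi2_score(col, y), g) for g, col in enumerate(zip(*X))]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scores[:top_k]]

# Toy data (assumption): 8 samples, 3 genes, binary class labels.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
gene0 = [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4]   # separates the classes
gene1 = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]   # unrelated to the class
gene2 = [0.1, 0.2, 0.3, 2.0, 0.4, 2.1, 2.2, 2.3]   # partially informative
X = [list(row) for row in zip(gene0, gene1, gene2)]

selected = rank_genes(X, labels, top_k=2)
print("Selected gene indices:", selected)
```

In a cross-validated comparison, a ranking like this would typically be recomputed on each training fold, so that the held-out samples do not influence which genes are selected.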

Figures

**Figure 1**
**Percentage accuracy of 10-fold cross validation of classification methods for all genes.** Results of 10-fold cross validation of the classification methods applied to all datasets without performing any feature selection.

**Figure 2**
**Percentage accuracy of 10-fold cross validation of clustering methods for all genes.** Results of 10-fold cross validation of the two-class clustering methods applied to all datasets,

**Figure 3**
**Accuracy of 10-fold cross validation of feature selection and classification methods.** Accuracy of 10-fold cross validation of the pairwise combinations of the feature selection and classification methods.

**Figure 4**
**Overview of the analysis pipeline.** The pipeline illustrates the procedure of the pairwise combinations of the feature selection and classification methods.

Cited by

Personal Health Information Inference Using Machine Learning on RNA Expression Data from Patients With Cancer: Algorithm Validation Study.
Kweon S, Lee JH, Lee Y, Park YR. J Med Internet Res. 2020 Aug 10;22(8):e18387. doi: 10.2196/18387. PMID: 32773372. Free PMC article.
Machine Learning Classifiers for Endometriosis Using Transcriptomics and Methylomics Data.
Akter S, Xu D, Nagel SC, Bromfield JJ, Pelch K, Wilshire GB, Joshi T. Front Genet. 2019 Sep 4;10:766. doi: 10.3389/fgene.2019.00766. eCollection 2019. PMID: 31552087. Free PMC article.
MSC-CSMC: A multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints for gene expression data.
Wang Z, Gu H, Zhao M, Li D, Wang J. Front Genet. 2023 Feb 27;14:1135260. doi: 10.3389/fgene.2023.1135260. eCollection 2023. PMID: 36923794. Free PMC article.
A comparison of machine learning methods for classification using simulation with multiple real data examples from mental health studies.
Khondoker M, Dobson R, Skirrow C, Simmons A, Stahl D. Stat Methods Med Res. 2016 Oct;25(5):1804-1823. doi: 10.1177/0962280213502437. Epub 2013 Sep 18. PMID: 24047600. Free PMC article.
Automated machine learning for endemic active tuberculosis prediction from multiplex serological data.
Rashidi HH, Dang LT, Albahra S, Ravindran R, Khan IH. Sci Rep. 2021 Sep 9;11(1):17900. doi: 10.1038/s41598-021-97453-7. PMID: 34504228. Free PMC article.
