Input feature selection for classification problems
- PMID: 18244416
- DOI: 10.1109/72.977291
Abstract
Feature selection plays an important role in classifying systems such as neural networks (NNs). The attributes available to a classifier may be relevant, irrelevant, or redundant, and from the viewpoint of managing datasets that can be huge, reducing the number of attributes by selecting only the relevant ones is desirable. In doing so, higher performance with lower computational effort is expected. In this paper, we propose two feature selection algorithms. The limitation of the mutual information feature selector (MIFS) is analyzed and a method to overcome this limitation is studied. One of the proposed algorithms makes more considered use of the mutual information between input attributes and output classes than MIFS does. We show that the proposed method can match the performance of the ideal greedy selection algorithm when information is distributed uniformly, while its computational load is nearly the same as that of MIFS. In addition, another feature selection algorithm, based on the Taguchi method, is proposed; it addresses the question of how to identify good features with as few experiments as possible. The proposed algorithms are applied to several classification problems and compared with MIFS. The two algorithms can be combined to complement each other's limitations. The combined algorithm performed well in several experiments and should prove to be a useful method for selecting features in classification problems.
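
As a rough illustration of the greedy, mutual-information-based selection that MIFS performs (and that the proposed algorithm refines), the Python sketch below scores each candidate feature by its mutual information with the class labels, penalized by its mutual information with already-selected features. This is not the authors' exact procedure: the discretization scheme, the penalty weight `beta`, and the use of scikit-learn's `mutual_info_score` are illustrative assumptions.

```python
# Minimal sketch of MIFS-style greedy feature selection (illustrative, not the
# paper's exact algorithm). Assumes features are discretized before estimating
# mutual information with sklearn's mutual_info_score.
import numpy as np
from sklearn.metrics import mutual_info_score

def mifs_select(X, y, k, beta=0.5, n_bins=10):
    """Greedily pick k feature indices by I(f; C) - beta * sum over selected s of I(f; s)."""
    # Discretize each column so mutual_info_score (which expects discrete
    # labels) can be applied to continuous inputs.
    Xd = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, j], bins=n_bins)
        Xd[:, j] = np.digitize(X[:, j], edges[1:-1])

    # Relevance term: mutual information between each feature and the class.
    relevance = np.array([mutual_info_score(Xd[:, j], y) for j in range(X.shape[1])])

    selected = []
    remaining = list(range(X.shape[1]))
    redundancy = np.zeros(X.shape[1])  # running sum of I(f; s) over selected s

    while len(selected) < k and remaining:
        scores = relevance[remaining] - beta * redundancy[remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
        # Update redundancy of the still-unselected features w.r.t. the new pick.
        for j in remaining:
            redundancy[j] += mutual_info_score(Xd[:, j], Xd[:, best])

    return selected
```

With beta = 0 the criterion reduces to ranking features by relevance alone; larger beta values penalize redundancy among the selected set more strongly, which is the trade-off the paper's analysis of MIFS addresses.
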
Similar articles
- Neural-network feature selector. IEEE Trans Neural Netw. 1997;8(3):654-62. doi: 10.1109/72.572104. PMID: 18255668
- Conditional mutual information-based feature selection for congestive heart failure recognition using heart rate variability. Comput Methods Programs Biomed. 2012 Oct;108(1):299-309. doi: 10.1016/j.cmpb.2011.12.015. Epub 2012 Jan 18. PMID: 22261219
- Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans Cybern. 2013 Dec;43(6):1656-71. doi: 10.1109/TSMCB.2012.2227469. PMID: 24273143
- A scalable memetic algorithm for simultaneous instance and feature selection. Evol Comput. 2014 Spring;22(1):1-45. doi: 10.1162/EVCO_a_00102. Epub 2013 Aug 8. PMID: 23544367
- R-Ensembler: A greedy rough set based ensemble attribute selection algorithm with kNN imputation for classification of medical data. Comput Methods Programs Biomed. 2020 Feb;184:105122. doi: 10.1016/j.cmpb.2019.105122. Epub 2019 Oct 8. PMID: 31622857
Cited by
- D2BOF-COVIDNet: A Framework of Deep Bayesian Optimization and Fusion-Assisted Optimal Deep Features for COVID-19 Classification Using Chest X-ray and MRI Scans. Diagnostics (Basel). 2022 Dec 29;13(1):101. doi: 10.3390/diagnostics13010101. PMID: 36611393. Free PMC article.
- Classifying Drosophila Olfactory Projection Neuron Subtypes by Single-Cell RNA Sequencing. Cell. 2017 Nov 16;171(5):1206-1220.e22. doi: 10.1016/j.cell.2017.10.019. PMID: 29149607. Free PMC article.
- A novel feature selection method and its application. J Intell Inf Syst. 2013 Oct 1;41(2):235-268. doi: 10.1007/s10844-013-0243-x. PMID: 25530672. Free PMC article.
- Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique. BMC Bioinformatics. 2020 May 26;21(1):216. doi: 10.1186/s12859-020-3471-4. PMID: 32456608. Free PMC article.
- Ensemble Fuzzy Feature Selection Based on Relevancy, Redundancy, and Dependency Criteria. Entropy (Basel). 2020 Jul 9;22(7):757. doi: 10.3390/e22070757. PMID: 33286530. Free PMC article.