Sci Rep. 2016 Oct 5;6:34759.
doi: 10.1038/srep34759.

Feature Subset Selection for Cancer Classification Using Weight Local Modularity

Guodong Zhao et al. Sci Rep. 2016.

Abstract

Microarrays have recently become an important tool for profiling the global gene expression patterns of tissues. Gene selection is a widely used technique in cancer classification: from thousands of genes that may contribute to the occurrence of cancer, it aims to identify a small number of informative genes that yield high predictive accuracy. This technique has been extensively studied in recent years. This study develops a novel feature selection (FS) method for gene subset selection, called WLMGS, which uses the Weight Local Modularity (WLM) of a complex network. In the proposed method, the discriminative power of a gene subset is evaluated by the weight local modularity of a weighted sample graph built on that subset, in which the intra-class distance is small and the inter-class distance is large. A higher local modularity corresponds to greater discriminative power of the gene subset. Using a forward search strategy, a more informative gene subset can be selected as a group for the classification process. Computational experiments show that the proposed algorithm can select a small subset of predictive genes as a group while preserving classification accuracy.
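
The forward search described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the WLM formula, so the score used here is a stand-in and not the authors' definition: it builds a k-nearest-neighbour sample graph on the candidate gene subset, weights each edge by inverse distance, and returns the fraction of edge weight that joins samples of the same class. The function names `within_class_weight_fraction` and `forward_select`, the inverse-distance weighting, and the default `k` are all assumptions made for illustration.

```python
import numpy as np


def within_class_weight_fraction(X, y, k=3):
    """Stand-in score (NOT the paper's WLM): share of k-NN edge weight
    that connects samples carrying the same class label."""
    n = X.shape[0]
    # Pairwise Euclidean distances between samples on the given genes.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    # Indices of the k nearest neighbours of every sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = neighbours.ravel()
    # Assumed edge weight: closer samples get heavier edges.
    weights = 1.0 / (1.0 + dist[rows, cols])
    same_class = y[rows] == y[cols]
    return weights[same_class].sum() / weights.sum()


def forward_select(X, y, n_genes, k=3):
    """Greedy forward search: repeatedly add the gene whose inclusion
    gives the highest score for the subset as a whole."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_genes and remaining:
        best_gene, best_score = None, -np.inf
        for gene in remaining:
            score = within_class_weight_fraction(X[:, selected + [gene]], y, k)
            if score > best_score:
                best_gene, best_score = gene, score
        selected.append(best_gene)
        remaining.remove(best_gene)
    return selected
```

Here `X` is a samples-by-genes expression matrix and `y` holds the class labels. Because every candidate is scored together with the genes already chosen, the search evaluates the subset as a group rather than ranking genes one at a time, which is the property the abstract emphasises.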

Figures

Figure 1. The average classification accuracy of the 1NN classifier with respect to the subset of s features selected by different filter methods.
Panels show the datasets (a) MLL, (b) Lymphoma, (c) ALL-AML-3c, (d) DLBCL-A, (e) SRBCT, (f) CNS, (g) Lung, (h) Colon.
Figure 2. The average classification accuracy of the SVM classifier with respect to the subset of s features selected by different filter methods.
Panels show the datasets (a) ALL-AML-3c, (b) MLL, (c) Lymphoma, (d) Lung, (e) DLBCL-A, (f) Colon, (g) CNS, (h) SRBCT.
Figure 3. The average classification accuracy of the 1NN classifier with respect to the subset of s features selected by different wrapper methods.
Panels show the datasets (a) ALL-AML-3c, (b) CNS, (c) Colon, (d) DLBCL-A, (e) Lung, (f) Lymphoma, (g) MLL, (h) SRBCT.
Figure 4. The average classification accuracy of the SVM classifier with respect to the subset of s features selected by different wrapper methods.
Panels show the datasets (a) CNS, (b) Colon, (c) DLBCL-A, (d) Lung, (e) Lymphoma, (f) MLL, (g) ALL-AML-3c, (h) SRBCT.
Figure 5. The average time cost of selecting the top 20 genes with our method and with wrapper methods.
Figure 6. The average 1NN accuracy of our method for different values of k on all datasets.
Panels show the datasets (a) ALL-AML-3c, (b) SRBCT, (c) Lymphoma, (d) DLBCL-A, (e) CNS, (f) Colon, (g) MLL, (h) Lung.
Figure 7. A simple graph with three local communities, enclosed by the dashed circles.
Reprinted figure with permission from ref. .

References

    1. José E. A., García N., Jourdan L. & Talbi E. G. Gene selection in cancer classification using PSO/SVM and GA/SVM hybrid algorithms. IEEE C. Evol. Computat. 9, 284–290 (2007).
    2. Derrac J., Cornelis C., García S. & Herrera F. Enhancing evolutionary instance selection algorithms by means of fuzzy rough set based feature selection. Information Sciences 186, 73–92 (2012).
    3. Sun X., Liu Y. H., Wei D. & Xu M. T. Selection of interdependent genes via dynamic relevance analysis for cancer diagnosis. J. Biomed. Inform. 46, 252–258 (2013).
    4. Guyon I., Weston J., Barnhill S. & Vapnik V. Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002).
    5. Saeys Y., Inza I. & Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics 23, 2507–2517 (2007).
