Statistical approach for selection of biologically informative genes
- PMID: 29458166
- DOI: 10.1016/j.gene.2018.02.044
Abstract
Selection of informative genes from high-dimensional gene expression data has emerged as an important research area in genomics. Many of the gene selection techniques proposed so far are based on either a relevancy or a redundancy measure. Further, the performance of these techniques has been judged by post-selection classification accuracy, computed with a classifier using the selected genes. This performance metric may be statistically sound but may not be biologically relevant. A statistical approach, Boot-MRMR, was proposed based on a composite measure of maximum relevance and minimum redundancy, which is both statistically sound and biologically relevant for informative gene selection. For comparative evaluation of the proposed approach, we developed two biologically sufficient criteria: Gene Set Enrichment with QTL (GSEQ) and a biological similarity score based on Gene Ontology (GO). Further, a systematic and rigorous evaluation of the proposed technique against 12 existing gene selection techniques was carried out using five gene expression datasets. This evaluation was based on a broad spectrum of statistically sound (e.g. subject classification) and biologically relevant (based on QTL and GO) criteria under a multiple criteria decision-making framework. The performance analysis showed that the proposed technique selects informative genes that are more biologically relevant. The proposed technique was also found to be quite competitive with the existing techniques with respect to subject classification and computational time. Our results also showed that, under the multiple criteria decision-making setup, the proposed technique is the best for informative gene selection among the available alternatives. Based on the proposed approach, an R package, BootMRMR, has been developed and is available at https://cran.r-project.org/web/packages/BootMRMR. This study will provide a practical guide for choosing statistical techniques to select informative genes from high-dimensional expression data for breeding and systems biology studies.
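The general idea of combining subject bootstrapping with a maximum relevance, minimum redundancy score can be illustrated with a minimal Python sketch. This is not the BootMRMR package's API or the authors' exact statistic: the relevance measure (absolute Pearson correlation with a binary class label), the redundancy measure (mean absolute correlation with the other genes), the difference form of the composite score, the mean-rank aggregation, and the function names `mrmr_scores` and `bootstrap_mrmr_rank` are all assumptions made for illustration.

```python
import numpy as np


def mrmr_scores(X, y):
    """Composite score per gene: relevance minus redundancy (illustrative choice).

    X: array of shape (subjects, genes); y: binary class labels (0/1).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sx = np.sqrt((Xc ** 2).sum(axis=0))
    sy = np.sqrt((yc ** 2).sum())
    # Relevance: |Pearson correlation| between each gene and the class label.
    denom = sx * sy
    relevance = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    # Redundancy: mean |correlation| of each gene with all other genes.
    Z = Xc / np.where(sx > 0, sx, np.inf)
    C = np.abs(Z.T @ Z)
    np.fill_diagonal(C, 0.0)
    redundancy = C.sum(axis=1) / (X.shape[1] - 1)
    return relevance - redundancy


def bootstrap_mrmr_rank(X, y, n_boot=100, seed=0):
    """Mean rank of each gene over bootstrap resamples of subjects (1 = best)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rank_sum = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        scores = mrmr_scores(X[idx], y[idx])
        order = np.argsort(-scores)       # highest score first
        ranks = np.empty(p)
        ranks[order] = np.arange(1, p + 1)
        rank_sum += ranks
    return rank_sum / n_boot


# Synthetic example: gene 0 tracks the class label, the rest are noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 20))
X[:, 0] += 3.0 * y
mean_ranks = bootstrap_mrmr_rank(X, y, n_boot=50, seed=2)
print(np.argsort(mean_ranks)[:5])  # indices of the five best-ranked genes
```

Ranking within each resample and then averaging the ranks makes the result less sensitive to any single sample of subjects, which is the motivation for bootstrapping in this sketch.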
Keywords: Boot-MRMR; Bootstrap; Gene Set Enrichment with QTLs; Gene sampling; Informative genes; Subject sampling.
Published by Elsevier B.V.