Pathway analysis using random forests classification and regression
- PMID: 16809386
- DOI: 10.1093/bioinformatics/btl344
Abstract
Motivation: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single gene-based methods neglect interactions among genes and leave room for other novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and have not made use of pathway information. Pathway-based analysis in microarray studies may lead to more informative and relevant knowledge for biological researchers.
Results: We describe pathway-based classification and regression methods that use Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods on several datasets and found that the Random Forests classification error rates were either the lowest or the second-lowest among the methods compared. By combining pathway information with novel statistical methods, this procedure represents a promising computational strategy for dissecting pathways and can provide biological insight into microarray data.
Availability: Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm.
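As an illustration of the pathway-ranking idea described under Results, the following is a minimal sketch and not the authors' R implementation. It assumes Python with NumPy and scikit-learn, fits one Random Forest per pathway using only that pathway's genes, and ranks pathways by out-of-bag (OOB) error. The synthetic expression matrix, the pathway names, the gene indices and the helper `rank_pathways` are all hypothetical, and the impurity-based `feature_importances_` is used here as a stand-in for whatever gene importance measure the paper applies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_pathways(X, y, pathways, n_trees=200, seed=0):
    """Fit one forest per pathway and sort pathways by out-of-bag error.

    X        : samples x genes expression matrix
    y        : class label per sample
    pathways : dict mapping a pathway name to a list of gene column indices
    """
    results = []
    for name, gene_idx in pathways.items():
        forest = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, random_state=seed
        )
        # Only the genes belonging to this pathway are used as predictors.
        forest.fit(X[:, gene_idx], y)
        oob_error = 1.0 - forest.oob_score_
        # Gene with the largest impurity-based importance within the pathway.
        top_gene = gene_idx[int(np.argmax(forest.feature_importances_))]
        results.append((name, oob_error, top_gene))
    # Lower OOB error means the pathway separates the classes better.
    return sorted(results, key=lambda r: r[1])


# Synthetic data (an assumption for illustration): 60 samples, 30 genes,
# with genes 0-4 shifted upward in class 1 and all other genes pure noise.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 30
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :5] += 2.0

# Hypothetical pathway definitions; real ones would come from a database.
pathways = {
    "pathway_A": [0, 1, 2, 3, 4],       # contains the informative genes
    "pathway_B": [10, 11, 12, 13, 14],  # noise only
    "pathway_C": [20, 21, 22, 23, 24],  # noise only
}

ranking = rank_pathways(X, y, pathways)
for name, err, gene in ranking:
    print(f"{name}: OOB error = {err:.3f}, top gene index = {gene}")
```

For a continuous outcome, the same loop could plausibly use `RandomForestRegressor` with `oob_score=True`, in which case `oob_score_` is an out-of-bag R² rather than an accuracy, and pathways would be ranked by that score instead.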