Pathway analysis using random forests classification and regression

doi:10.1093/bioinformatics/btl344

Bioinformatics. 2006 Aug 15;22(16):2028-36.

doi: 10.1093/bioinformatics/btl344. Epub 2006 Jun 29.

Herbert Pang¹, Aiping Lin, Matthew Holford, Bradley E Enerson, Bin Lu, Michael P Lawton, Eugenia Floyd, Hongyu Zhao

PMID: 16809386
DOI: 10.1093/bioinformatics/btl344

Herbert Pang et al. Bioinformatics. 2006.

Affiliation

¹ Division of Biostatistics, Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA.

Abstract

Motivation: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single gene-based methods neglect interactions among genes and leave room for other novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and have not made use of pathway information. Pathway-based analysis in microarray studies may lead to more informative and relevant knowledge for biological researchers.

Results: In this paper, we describe a pathway-based classification and regression method using Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods using several datasets and found that Random Forests classification error rates were either the lowest or the second-lowest. By combining pathway information and novel statistical methods, this procedure represents a promising computational strategy in dissecting pathways and can provide biological insight into the study of microarray data.
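
The pathway-ranking idea in the Results can be illustrated with a minimal sketch. The abstract does not give the scoring details, so the sketch assumes that each pathway is scored by the out-of-bag (OOB) error of a forest grown only on that pathway's genes, and that pathways are ranked by ascending error. The toy forest is written from scratch with the Python standard library and is not the authors' R code; the function names (`grow_tree`, `oob_error`, `rank_pathways`), the synthetic expression data and the pathway names are all hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def grow_tree(X, y, idx, genes, rng, mtry, max_depth):
    """Grow one classification tree on the samples in idx (indices may repeat)."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority  # leaf: a plain class label (labels must not be tuples)
    best = None
    # Random-forest step: consider only mtry randomly chosen genes at each node.
    for g in rng.sample(genes, min(mtry, len(genes))):
        values = sorted({X[i][g] for i in idx})
        for t in values[:-1]:  # split "<= t" vs "> t", so both sides are non-empty
            left = [i for i in idx if X[i][g] <= t]
            right = [i for i in idx if X[i][g] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, g, t, left, right)
    if best is None:  # every sampled gene is constant on this node
        return majority
    _, g, t, left, right = best
    return (g, t,
            grow_tree(X, y, left, genes, rng, mtry, max_depth - 1),
            grow_tree(X, y, right, genes, rng, mtry, max_depth - 1))


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        g, t, left, right = tree
        tree = left if row[g] <= t else right
    return tree


def oob_error(X, y, genes, n_trees=100, max_depth=3, seed=0):
    """OOB misclassification rate of a forest restricted to one pathway's genes."""
    rng = random.Random(seed)
    n = len(y)
    mtry = max(1, int(len(genes) ** 0.5))
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = grow_tree(X, y, boot, genes, rng, mtry, max_depth)
        for i in set(range(n)) - set(boot):  # samples this tree never saw
            votes[i][predict(tree, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)


def rank_pathways(X, y, pathways, **kwargs):
    """Return (pathway name, OOB error) pairs, lowest error first."""
    errors = {name: oob_error(X, y, genes, **kwargs)
              for name, genes in pathways.items()}
    return sorted(errors.items(), key=lambda item: item[1])


# Synthetic data (an assumption for illustration): 40 samples, 6 genes.
# Genes 0-2 shift with the class label; genes 3-5 are pure noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[data_rng.gauss(3.0 * label if g < 3 else 0.0, 1.0) for g in range(6)]
     for label in y]
pathways = {"signal_pathway": [0, 1, 2], "noise_pathway": [3, 4, 5]}

ranking = rank_pathways(X, y, pathways, n_trees=50)
for name, err in ranking:
    print(f"{name}: OOB error = {err:.2f}")
```

A lower OOB error suggests that the pathway's genes separate the classes better. For real data an established implementation, such as the randomForest package in R, would replace the toy forest used here.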

Availability: Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm.
