A primer on gene expression and microarrays for machine learning researchers
- PMID: 15465482
- DOI: 10.1016/j.jbi.2004.07.002
Abstract
Data from biomedical experiments have given machine learning researchers an important source of motivation for developing and evaluating new algorithms. The publication of gene expression data derived from microarrays has initiated a new wave of algorithmic development. Microarray data analysis is particularly challenging because a large number of measurements (typically on the order of thousands) are reported for relatively few samples (typically on the order of dozens). Many data sets are now available on the web. It is important that machine learning researchers understand how the data are obtained and which assumptions the analysis requires. Microarray data have the potential to make a significant impact on machine learning research, not just as a rich and realistic source of cases for testing new algorithms, as the UCI machine learning repository has been in past decades, but also as a main motivation for developing them. In this article, we briefly review the biology underlying microarrays, the process of obtaining gene expression measurements, and the rationale behind the common types of analyses involved in a microarray experiment. We outline the main challenges and reiterate critical considerations for constructing supervised learning models that use this type of data. The goal of this article is to familiarize machine learning researchers with data originating from gene expression microarrays.
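To make the "thousands of measurements, dozens of samples" challenge concrete, here is a minimal sketch that is not taken from the article. It uses purely synthetic Gaussian noise in place of expression values, and the sizes (40 samples, 5000 genes), the function name `noise_fit_demo`, and the choice of a minimum-norm least-squares classifier are all illustrative assumptions. The point it illustrates is that when features far outnumber samples, a linear model can fit even random labels perfectly on the training set while performing at chance on new data.

```python
import numpy as np


def noise_fit_demo(n_samples=40, n_genes=5000, n_test=1000, seed=0):
    """Fit a linear model to random labels on pure-noise 'expression' data.

    All sizes are hypothetical; they only mimic the dozens-of-samples,
    thousands-of-genes shape described in the abstract.
    """
    rng = np.random.default_rng(seed)

    # Synthetic training set: noise features and labels unrelated to them.
    X_train = rng.standard_normal((n_samples, n_genes))
    y_train = rng.choice([-1.0, 1.0], size=n_samples)

    # With more genes than samples the system is underdetermined, so
    # least squares returns a minimum-norm weight vector that reproduces
    # the training labels (almost surely, since X_train has full row rank).
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_acc = np.mean(np.sign(X_train @ w) == y_train)

    # Fresh noise with fresh random labels: there is nothing to generalize.
    X_test = rng.standard_normal((n_test, n_genes))
    y_test = rng.choice([-1.0, 1.0], size=n_test)
    test_acc = np.mean(np.sign(X_test @ w) == y_test)

    return float(train_acc), float(test_acc)


train_acc, test_acc = noise_fit_demo()
print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

Training accuracy alone is therefore uninformative for data of this shape, which is one reason evaluation on samples held out from every modelling step matters when building supervised models on microarray data.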