Missing value estimation methods for DNA microarrays
- PMID: 11395428
- DOI: 10.1093/bioinformatics/17.6.520
Abstract
Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data.
Results: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1-20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
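To make the comparison concrete, the following is a minimal sketch of two of the strategies named above: row average and a weighted K-nearest-neighbors estimate. The abstract does not specify the distance metric or the weighting scheme, so the choices here (root-mean-square Euclidean distance over the columns both rows have observed, inverse-distance weights, `None` as the missing-value marker) are assumptions, and the function names `row_average_impute` and `knn_impute` are hypothetical rather than the authors' tools.

```python
import math


def row_average_impute(matrix):
    """Fill each missing entry (None) with the mean of the observed values in its row."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # Assumption: a row with no observed values falls back to 0.0.
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def knn_impute(matrix, k=2):
    """Estimate each missing entry from the k most similar rows (genes).

    Assumptions, not taken from the abstract: similarity is the root-mean-square
    Euclidean distance over columns observed in both rows, and neighbours are
    weighted by inverse distance.
    """
    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_index, other in enumerate(matrix):
                # A neighbour must be a different row with column j observed.
                if other_index == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if not nearest:
                # No usable neighbour: fall back to the row average.
                observed = [v for v in row if v is not None]
                filled[i][j] = sum(observed) / len(observed) if observed else 0.0
                continue
            # Small constant avoids division by zero for identical rows.
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            total = sum(w * v for w, (_, v) in zip(weights, nearest))
            filled[i][j] = total / sum(weights)
    return filled


# Toy expression matrix: rows are genes, columns are arrays (hypothetical data).
expr = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [9.0, 9.0, 9.0],
]
by_row_mean = row_average_impute(expr)
by_knn = knn_impute(expr, k=2)
```

In this toy matrix the first gene matches two other genes exactly on its observed columns, so the neighbor-based estimate borrows their value for the missing column, whereas the row average only uses the gene's own two observed values.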
Similar articles
- The influence of missing value imputation on detection of differentially expressed genes from microarray data. Bioinformatics. 2005 Dec 1;21(23):4272-9. doi: 10.1093/bioinformatics/bti708. Epub 2005 Oct 10. PMID: 16216830
- LSimpute: accurate estimation of missing values in microarray data with least squares methods. Nucleic Acids Res. 2004 Feb 20;32(3):e34. doi: 10.1093/nar/gnh026. PMID: 14978222 Free PMC article.
- Towards clustering of incomplete microarray data without the use of imputation. Bioinformatics. 2007 Jan 1;23(1):107-13. doi: 10.1093/bioinformatics/btl555. Epub 2006 Oct 31. PMID: 17077099
- Two-pass imputation algorithm for missing value estimation in gene expression time series. J Bioinform Comput Biol. 2007 Oct;5(5):1005-22. doi: 10.1142/s0219720007003053. PMID: 17933008
- Dealing with missing values in large-scale studies: microarray data imputation and beyond. Brief Bioinform. 2010 Mar;11(2):253-64. doi: 10.1093/bib/bbp059. Epub 2009 Dec 4. PMID: 19965979 Review.
Cited by
- Integrating multimodal data through interpretable heterogeneous ensembles. Bioinform Adv. 2022 Sep 12;2(1):vbac065. doi: 10.1093/bioadv/vbac065. eCollection 2022. PMID: 36158455 Free PMC article.
- Dynamic clustering of gene expression. ISRN Bioinform. 2012 Oct 16;2012:537217. doi: 10.5402/2012/537217. eCollection 2012. PMID: 25969750 Free PMC article.
- Faster permutation inference in brain imaging. Neuroimage. 2016 Nov 1;141:502-516. doi: 10.1016/j.neuroimage.2016.05.068. Epub 2016 Jun 7. PMID: 27288322 Free PMC article.
- Machine learning identifies a profile of inadequate responder to methotrexate in rheumatoid arthritis. Rheumatology (Oxford). 2023 Jul 5;62(7):2402-2409. doi: 10.1093/rheumatology/keac645. PMID: 36416134 Free PMC article.
- A High-Fat Diet Disrupts Nerve Lipids and Mitochondrial Function in Murine Models of Neuropathy. Front Physiol. 2022 Aug 22;13:921942. doi: 10.3389/fphys.2022.921942. eCollection 2022. PMID: 36072849 Free PMC article.