A flexible R package for nonnegative matrix factorization

Renaud Gaujoux et al. BMC Bioinformatics. 2010 Jul 2;11:367. doi: 10.1186/1471-2105-11-367.

Abstract

Background: Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, including signal processing, face recognition and text mining. Recent applications of NMF in bioinformatics have demonstrated its ability to extract meaningful information from high-dimensional data such as gene expression microarrays. Developments in NMF theory and applications have resulted in a variety of algorithms and methods. However, most NMF implementations have been on commercial platforms, while those that are freely available typically require programming skills. This limits their use by the wider research community.

Results: Our objective is to provide the bioinformatics community with an open-source, easy-to-use and unified interface to standard NMF algorithms, as well as with a simple framework to help implement and test new NMF methods. For that purpose, we have developed a package for the R/BioConductor platform. The package ports public code to R, and is structured to enable users to easily modify and/or add algorithms. It includes a number of published NMF algorithms and initialization methods and facilitates the combination of these to produce new NMF strategies. Commonly used benchmark data and visualization methods are provided to help in the comparison and interpretation of the results.
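The separation between seeding methods and update rules, which lets them be combined into new strategies, can be illustrated with a minimal Python/NumPy sketch. It is not the package's R interface: the names `seed_random`, `seed_svd_abs`, `step_kl`, `step_euclidean` and `run_strategy` are hypothetical. The two update rules are the standard Lee and Seung multiplicative updates for the generalized Kullback-Leibler divergence (the rule Brunet et al. [4] build on) and for the Euclidean distance. `seed_svd_abs` is a deliberately simplified SVD-based start, not a published initialization method.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the multiplicative updates

# --- Seeding methods: return nonnegative starting factors (W0, H0) ---
def seed_random(V, rank, rng):
    return rng.random((V.shape[0], rank)) + EPS, rng.random((rank, V.shape[1])) + EPS

def seed_svd_abs(V, rank, rng):
    # Simplified SVD-based start (absolute values of scaled singular vectors);
    # NOT the published NNDSVD method. rng is unused but kept for a uniform signature.
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    root = np.sqrt(s[:rank])
    return np.abs(U[:, :rank] * root) + EPS, np.abs(root[:, None] * Vt[:rank]) + EPS

# --- Update rules: one multiplicative step on (W, H) ---
def step_kl(V, W, H):
    # Lee and Seung updates for the generalized Kullback-Leibler divergence D(V || WH)
    H = H * (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
    W = W * ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H

def step_euclidean(V, W, H):
    # Lee and Seung updates for the squared Frobenius norm ||V - WH||^2
    H = H * (W.T @ V) / (W.T @ W @ H + EPS)
    W = W * (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H

SEEDS = {"random": seed_random, "svd_abs": seed_svd_abs}
STEPS = {"kl": step_kl, "euclidean": step_euclidean}

def run_strategy(V, rank, seed="random", algorithm="kl", n_iter=300, rng=0):
    """Combine any registered seeding method with any registered update rule."""
    W, H = SEEDS[seed](V, rank, np.random.default_rng(rng))
    for _ in range(n_iter):
        W, H = STEPS[algorithm](V, W, H)
    return W, H

V = np.random.default_rng(42).random((40, 12))  # toy nonnegative data: 40 genes x 12 samples
for s in SEEDS:
    for a in STEPS:
        W, H = run_strategy(V, rank=3, seed=s, algorithm=a)
        err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
        print(f"{s:8s} {a:10s} relative error = {err:.3f}")
```

Because every seed returns the same `(W, H)` pair and every step shares one signature, adding a new algorithm in this sketch amounts to registering one more function in `STEPS`.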

Conclusions: The NMF package helps realize the potential of Nonnegative Matrix Factorization, especially in bioinformatics, providing easy access to methods that have already yielded new insights in many applications. Documentation, source code and sample data are available from CRAN.


Figures

Figure 1
Cophenetic correlation coefficient. Each point on the graph was obtained from 50 runs of Brunet et al.'s algorithm [4]. This graph indicates the robustness of the clusters for different values of the factorization rank. Stability drops sharply at r = 5 compared with lower ranks.
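The cophenetic correlation coefficient can be sketched as follows, assuming the usual construction from Brunet et al. [4]. Each run assigns every sample to the metagene with the largest coefficient in its column of H, the resulting connectivity matrices are averaged into a consensus matrix C, and the coefficient is the Pearson correlation between the distances 1 - C and the cophenetic distances of an average-linkage tree built on them. The function names are hypothetical, the cluster labels are simulated rather than produced by real NMF runs, and average linkage is written directly in NumPy to keep the example dependency-free.

```python
import numpy as np

def consensus_matrix(label_runs):
    """Average connectivity matrix over runs (labels could come from H.argmax(axis=0))."""
    runs = np.asarray(label_runs)
    # connectivity[r, i, j] is True when samples i and j share a cluster in run r
    return (runs[:, :, None] == runs[:, None, :]).mean(axis=0)

def average_linkage_cophenetic(D):
    """Cophenetic distances of an average-linkage (UPGMA) tree built on distance matrix D."""
    n = D.shape[0]
    clusters = [[i] for i in range(n)]
    coph = np.zeros((n, n))
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()  # mean pairwise distance
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # every pair split across the two merged clusters first joins at height d
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i, j] = coph[j, i] = d
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so clusters[a] keeps its index
    return coph

def cophenetic_correlation(C):
    """Pearson correlation between 1 - C and the cophenetic distances of its tree."""
    D = 1.0 - C
    coph = average_linkage_cophenetic(D)
    iu = np.triu_indices_from(D, k=1)  # each sample pair once, diagonal excluded
    return float(np.corrcoef(D[iu], coph[iu])[0, 1])

# Toy labels: 18 samples in 3 groups, 50 noisy runs (mirroring the 50 runs per rank)
rng = np.random.default_rng(0)
truth = np.repeat([0, 1, 2], 6)
runs = []
for _ in range(50):
    labels = truth.copy()
    flip = rng.random(truth.size) < 0.1  # reassign roughly 10% of samples at random
    labels[flip] = rng.integers(0, 3, flip.sum())
    runs.append(labels)
rho = cophenetic_correlation(consensus_matrix(runs))
print(f"cophenetic correlation = {rho:.3f}")
```

Because connectivity only asks whether two samples share a label, the result is unaffected by cluster labels being permuted from one run to the next.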
Figure 2
Heatmap of the metagene expression profiles matrix. The metagene expression profile matrix was obtained from the factorization that achieved the lowest approximation error across 200 random runs of Brunet et al.'s algorithm on the Golub dataset. Each column corresponds to a sample. The top colored row shows the phenotypic class to which each sample belongs. Columns were scaled to sum to one and ordered by cluster; the second row highlights the clusters with colors that map each one to its associated metagene.
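The column scaling and ordering behind this heatmap reduce to a few array operations. This sketch assumes, following Brunet et al. [4], that each sample's cluster is the metagene with the largest coefficient in its column; `scale_and_order_columns` is a hypothetical helper and `H` is a toy matrix, not the Golub factorization.

```python
import numpy as np

def scale_and_order_columns(H):
    """Scale each column of H to sum to one, then group columns by dominant metagene."""
    H = np.asarray(H, dtype=float)
    H_scaled = H / H.sum(axis=0, keepdims=True)   # each sample's coefficients sum to one
    clusters = H_scaled.argmax(axis=0)            # metagene with the largest coefficient
    order = np.argsort(clusters, kind="stable")   # keep original order within a cluster
    return H_scaled[:, order], clusters[order], order

# Toy 3-metagene x 4-sample coefficient matrix (hypothetical values)
H = np.array([[2.0, 0.1, 1.0, 0.2],
              [0.5, 3.0, 0.2, 0.3],
              [0.1, 0.4, 0.3, 1.5]])
H_ordered, cluster_row, order = scale_and_order_columns(H)
print(order)        # [0 2 1 3]
print(cluster_row)  # [0 0 1 2]
```

Scaling a column by a positive constant does not change its argmax, so assigning clusters before or after scaling gives the same result.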
Figure 3
Heatmap of the metagene matrix. The metagene matrix was obtained from the same factorization used in Figure 2. Each row corresponds to a gene. The most metagene-specific genes were selected using Kim and Park's scoring and filtering method, which resulted in the selection of 635 genes. Rows were scaled to sum to one and ordered by hierarchical clustering based on the Euclidean distance and average linkage.
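A sketch of the gene scoring and filtering step, following our reading of Kim and Park's method: each gene's loadings are turned into proportions p(i,q) across the k metagenes, the score is 1 + (1/log2 k) Σ_q p(i,q) log2 p(i,q) (1 for a gene tied to a single metagene, 0 for a uniform one), and a gene is kept when its score exceeds the median score plus three median absolute deviations and its largest loading exceeds the median of all entries of W. The exact thresholds, in particular whether the MAD is rescaled as R's `mad()` does by default, may differ from the package's implementation. Function names and the toy `W` are hypothetical.

```python
import numpy as np

def feature_scores(W):
    """Specificity score per gene: 1 = loads on one metagene only, 0 = uniform loadings."""
    W = np.asarray(W, dtype=float)
    k = W.shape[1]                                    # number of metagenes (k >= 2)
    P = W / W.sum(axis=1, keepdims=True)              # each gene's share on each metagene
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log2(P), 0.0)  # treat 0 * log2(0) as 0
    return 1.0 + plogp.sum(axis=1) / np.log2(k)

def select_features(W, n_mad=3.0):
    """Indices of genes whose score exceeds median + n_mad * MAD and whose
    largest loading exceeds the median of all entries of W."""
    W = np.asarray(W, dtype=float)
    s = feature_scores(W)
    med = np.median(s)
    mad = np.median(np.abs(s - med))                  # unscaled MAD (R's mad() rescales by 1.4826)
    keep = (s > med + n_mad * mad) & (W.max(axis=1) > np.median(W))
    return np.flatnonzero(keep)

rng = np.random.default_rng(0)
W = rng.uniform(0.9, 1.1, size=(100, 3))            # genes loading evenly on all metagenes
W[:5] = [[5.0, 0.01, 0.01], [0.01, 5.0, 0.01], [0.01, 0.01, 5.0],
         [4.0, 0.02, 0.02], [0.02, 4.0, 0.02]]       # five clearly specific genes
print(select_features(W))
```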
Figure 4
Consensus matrix. The consensus matrix was obtained from 200 random runs of Brunet et al.'s algorithm on the Golub dataset. Values range from 0 to 1. Columns (and rows) were ordered by hierarchical clustering based on the Euclidean distance with average linkage.
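The ordering of the consensus matrix can be sketched by clustering its rows with average linkage on Euclidean distances and applying the resulting leaf order to both rows and columns. Function names and the simulated labels are hypothetical; the leaf order produced here is one valid dendrogram order and need not match the one R's `hclust` would draw.

```python
import numpy as np

def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of samples is co-clustered; values lie in [0, 1]."""
    runs = np.asarray(label_runs)
    return (runs[:, :, None] == runs[:, None, :]).mean(axis=0)

def average_linkage_order(X):
    """Leaf order of an average-linkage tree on Euclidean distances between rows of X."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merged leaves stay contiguous
        del clusters[b]                          # b > a, so clusters[a] keeps its index
    return clusters[0]

# Toy labels: 12 interleaved samples from 3 groups, 30 noisy runs
rng = np.random.default_rng(1)
truth = np.array([0, 1, 2] * 4)
runs = []
for _ in range(30):
    labels = truth.copy()
    flip = rng.random(truth.size) < 0.1  # reassign roughly 10% of samples at random
    labels[flip] = rng.integers(0, 3, flip.sum())
    runs.append(labels)
C = consensus_matrix(runs)
order = average_linkage_order(C)
C_ordered = C[np.ix_(order, order)]  # same permutation for rows and columns
print(truth[order])
```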
Figure 5
Plot of the residual approximation error. Each curve reports the trajectory of the approximation residuals, computed with the algorithm's own loss function. Each track is normalized separately by its maximum value and stops at the number of iterations required to satisfy the convergence criterion.
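Tracks of this kind can be produced by recording each algorithm's loss after every update and dividing each track by its own maximum. The stopping rule used here (relative decrease below `tol`) is an assumption standing in for the unspecified convergence criterion; the loss functions, the standard Lee and Seung multiplicative update rules and all helper names are hypothetical illustrations.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)

def euclidean_loss(V, W, H):
    return 0.5 * float(np.sum((V - W @ H) ** 2))

def kl_loss(V, W, H):
    WH = W @ H
    return float(np.sum(V * np.log((V + EPS) / (WH + EPS)) - V + WH))

def euclidean_step(V, W, H):
    H = H * (W.T @ V) / (W.T @ W @ H + EPS)
    W = W * (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H

def kl_step(V, W, H):
    H = H * (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
    W = W * ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H

def residual_track(V, W, H, step, loss, tol=1e-5, max_iter=2000):
    """Loss after each update; stops once the relative decrease falls below tol."""
    track = [loss(V, W, H)]
    for _ in range(max_iter):
        W, H = step(V, W, H)
        track.append(loss(V, W, H))
        if abs(track[-2] - track[-1]) <= tol * abs(track[-2]):
            break
    return np.array(track)

def normalize_track(track):
    """Divide a track by its own maximum so different losses share one scale."""
    return track / track.max()

rng = np.random.default_rng(0)
V = rng.random((30, 10))
W0, H0 = rng.random((30, 3)) + EPS, rng.random((3, 10)) + EPS  # shared random start
tracks = {
    "euclidean": normalize_track(residual_track(V, W0, H0, euclidean_step, euclidean_loss)),
    "kl": normalize_track(residual_track(V, W0, H0, kl_step, kl_loss)),
}
for name, t in tracks.items():
    print(name, len(t), round(float(t[-1]), 4))
```

Since multiplicative updates do not increase their loss (up to the small `EPS` guard), each track's maximum is normally its first value, so the normalized curves start at 1 and can be compared across algorithms whose losses live on different scales.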


References

1. Paatero P, Tapper U. Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics. 1994;5(2):111–126. doi: 10.1002/env.3170050203.
2. Lee D, Seung H. Learning the parts of objects by non-negative matrix factorization. Nature. 1999;401:788–791. doi: 10.1038/44565.
3. Devarajan K. Nonnegative matrix factorization: an analytical and interpretive tool in computational biology. PLoS Computational Biology. 2008;4:e1000029. doi: 10.1371/journal.pcbi.1000029.
4. Brunet JP, Tamayo P, Golub TR, Mesirov JP. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences of the United States of America. 2004;101:4164–4169. doi: 10.1073/pnas.0308531101.
5. Pehkonen P, Wong G, Toronen P. Theme discovery from gene lists for identification and viewing of multiple functional groups. BMC Bioinformatics. 2005;6:162. doi: 10.1186/1471-2105-6-162.
