---
output: github_document
always_allow_html: true
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```
[![GitHub Actions Build Status](https://github.com/DataSlingers/clustRviz/workflows/R-CMD-check%20and%20Deploy/badge.svg)](https://github.com/DataSlingers/clustRviz/actions?query=workflow%3A%22R-CMD-check+and+Deploy%22)
[![codecov Coverage Status](https://codecov.io/gh/DataSlingers/clustRviz/branch/develop/graph/badge.svg)](https://codecov.io/gh/DataSlingers/clustRviz/branch/develop)
[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/clustRviz)](https://cran.r-project.org/package=clustRviz)
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)

# clustRviz

`clustRviz` aims to enable fast computation and easy visualization of Convex Clustering
solution paths.

## Installation

You can install `clustRviz` from GitHub with:

```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("DataSlingers/clustRviz")
```

Note that `RcppEigen` (which `clustRviz` uses internally) triggers many compiler warnings,
which the package itself cannot suppress per
[CRAN policies](http://cran.r-project.org/web/packages/policies.html#Source-packages).
Many of these warnings can be suppressed locally by adding the line `CXX11FLAGS+=-Wno-ignored-attributes`
to your `~/.R/Makevars` file. To install an `R` package from source, you will need
suitable development tools installed, including a `C++` compiler and potentially
a Fortran runtime. Details about these toolchains are available on CRAN for
[Windows](https://cran.r-project.org/bin/windows/Rtools/) and [macOS](https://mac.r-project.org/tools/).
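
The `Makevars` change described above can also be scripted from within `R`.
The following is a minimal sketch using only base `R`; `add_makevars_flag` is a
hypothetical helper name introduced here for illustration and is not part of `clustRviz`:

```{r makevars_flag, eval = FALSE}
# Hypothetical helper (not part of clustRviz): add a line to a Makevars file
# unless an identical line is already present.
add_makevars_flag <- function(flag, path = "~/.R/Makevars") {
  path <- path.expand(path)
  # Create the containing directory (e.g. ~/.R) if it does not exist yet
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character()
  if (!(flag %in% existing)) {
    # Rewrite the file with the new line appended at the end
    writeLines(c(existing, flag), path)
  }
  invisible(path)
}
```

Calling `add_makevars_flag("CXX11FLAGS+=-Wno-ignored-attributes")` should then have
the same effect as editing the file by hand, and calling it twice leaves only one
copy of the line.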

## Quick-Start

There are two main entry points to the `clustRviz` package, the `CARP` and `CBASS`
functions, which perform convex clustering and convex biclustering respectively.
We demonstrate the use of these two functions on a text-mining data set,
`presidential_speech`, which measures how often the 44 U.S. presidents used certain
words in their public addresses.

```{r load_data}
library(clustRviz)
data(presidential_speech)
presidential_speech[1:6, 1:6]
```

### Clustering

We begin by clustering this data set, grouping the rows (presidents) into clusters:

```{r carp_example}
carp_fit <- CARP(presidential_speech)
print(carp_fit)
```

The algorithmic regularization technique employed by `CARP` makes computation of
the whole solution path almost immediate.

We can examine the result of `CARP` graphically. We begin with a standard dendrogram,
with three clusters highlighted:

```{r carp_dendro}
plot(carp_fit, type = "dendrogram", k = 3)
```

Examining the dendrogram, we see two clear clusters, consisting of pre-WWII and post-WWII
presidents, with Warren G. Harding as a possible outlier. Harding is generally considered
one of the worst US presidents of all time, so this is perhaps not too surprising.
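
The cluster assignments behind this dendrogram can also be inspected directly.
The sketch below assumes the `get_cluster_labels` accessor described in the package
documentation, and assumes that it returns one label per row in the same order as
the rows of the input; as in the plot above, it requests three clusters:

```{r carp_labels, eval = FALSE}
library(clustRviz)
data(presidential_speech)

carp_fit <- CARP(presidential_speech)

# Assumed accessor: one cluster label per row (president) at k = 3
cluster_labels <- get_cluster_labels(carp_fit, k = 3)

# Number of presidents in each cluster
table(cluster_labels)

# Names of the presidents, grouped by cluster
split(rownames(presidential_speech), cluster_labels)
```

If the assumptions hold, a cluster containing a single president would correspond
to the outlier visible in the dendrogram.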

A more interesting visualization is the dynamic path visualization, whereby we can
watch the clusters fuse as the regularization level is increased:

```{r carp_dynamic, eval = FALSE}
plot(carp_fit, type = "path", dynamic = TRUE)
```

### Biclustering

The use of `CBASS` for convex biclustering is similar, and we demonstrate it here
with a cluster heatmap, with the regularization set to give 3 observation clusters:

```{r cbass}
cbass_fit <- CBASS(presidential_speech)
plot(cbass_fit, k.row = 3)
```

By default, plotting the result of `CBASS` gives the traditional cluster heatmap,
but we can also get the row or column dendrograms:

```{r cbass_rowdendro}
plot(cbass_fit, type = "row.dendrogram", k.row = 3)
```

By default, if a regularization level is specified, all plotting functions in `clustRviz`
will plot the clustered data. If the regularization level is not specified, the
raw data will be plotted instead:

```{r cbass_heatmap2}
plot(cbass_fit, type = "heatmap")
```

More details about the use and mathematical formulation of `CARP` and `CBASS`
may be found in the package documentation.