% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/getClassAUC.R
\name{getClassAUC}
\alias{getClassAUC}
\title{getClassAUC}
\usage{
getClassAUC(gs, markers = NULL, plotCurves = TRUE, colors = NULL)
}
\arguments{
\item{gs}{A list containing a \code{$specScore} sparse matrix, typically the
output of \code{sortGenes()}.}
\item{markers}{A character vector of gene names to restrict this analysis to.
See Details.}
\item{plotCurves}{Should a plot be drawn? Default is \code{TRUE}.}
\item{colors}{Color palette for the plot.}
}
\value{
\code{getClassAUC} returns a numeric vector of length
\code{ncol(gs$specScore)} that contains one AUC value for each cell cluster.
}
\description{
\code{getClassAUC} implements one way to investigate clustering quality. It
processes the output of \code{sortGenes} to obtain, for each cell cluster, a
curve of the gene specificity scores plotted against their rank within the
cluster. The Area Under the Curve (AUC) can be used as a measure of clustering
quality, in the sense of how readily cell clusters can be identified using a
few marker genes. See Details.
}
\details{
Given the specificity scores for all genes in a given cell cluster, we can
assume that a well-separated, easily identified cell cluster will have a
relatively small number of genes with very high specificity scores. In
contrast, the top marker genes for a cluster that is poorly separated from
other cell clusters will have average or low specificity scores. Sorting the
genes for each cell cluster by their specificity scores and plotting the
scaled scores in order creates a curve that should be far from the diagonal
for well-separated clusters but close to the diagonal for poorly separated
clusters. The AUC of this curve quantifies this intuition and serves as an
estimate of clustering quality.
}
\examples{
# randomly generated expression matrix and cell clusters
set.seed(1234)
exp = matrix(sample(0:20,1000,replace=TRUE), ncol = 20)
rownames(exp) = sapply(1:50, function(x) paste0("g", x))
cellType = sample(c("cell type 1","cell type 2"),20,replace=TRUE)
sg = sortGenes(exp, cellType)
classAUC = getClassAUC(sg)
# "reasonably" separated clusters
data(sim)
sg = sortGenes(sim$exp, sim$cellType)
classAUC = getClassAUC(sg)
# real data with three well-separated clusters
data(kidneyTabulaMuris)
sg = sortGenes(kidneyTabulaMuris$exp, kidneyTabulaMuris$cellType)
classAUC = getClassAUC(sg)
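
# Illustrative sketch of the idea described in Details, using base R only.
# This is an assumption about how such a curve could be summarised, NOT the
# internal code of getClassAUC: the package may orient, scale or integrate
# the curve differently. 'sketchCurveAUC' is a hypothetical helper name.
# It assumes the scores are not all identical (otherwise scaling divides by 0).
sketchCurveAUC = function(scores) {
  y = sort(scores)                             # order the scores ascending
  y = (y - min(y)) / (max(y) - min(y))         # scale the scores to [0, 1]
  x = seq(0, 1, length.out = length(y))        # scale the ranks to [0, 1]
  sum(diff(x) * (y[-length(y)] + y[-1]) / 2)   # trapezoidal rule
}
# evenly spread scores: the curve lies on the diagonal, area is about 0.5
flatScores = seq(0, 1, length.out = 100)
sketchCurveAUC(flatScores)
# a few very high scores: the curve lies far below the diagonal, so the area
# under this ascending curve is much smaller than 0.5
peakedScores = c(rep(0.01, 95), rep(0.9, 5))
sketchCurveAUC(peakedScores)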
}
\seealso{
\code{getMarkers} returns a cell cluster Shannon index that tends to
correlate well with the AUC metric returned by \code{getClassAUC}.
}
\author{
Mahmoud M Ibrahim <mmibrahim@pm.me>
}