Add n_unique() = length(unique(x)) - useful while grouping #884
Closed
The SO question that triggered the thought:
Instead of having to do:
DT[, length(unique(.)), by=.]
we could write:
DT[, n_unique(.), by=.]
This should be especially faster for data.tables, because we don't have to subset the entire data.table just to count the unique values.
Here's a quick benchmark:
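require(data.table)
x = sample(1e2, 1e7, TRUE) # 1e7 draws from at most 100 distinct values
# base R: materialise the unique values, then count them
system.time(ans1 <- length(unique(x))) # 0.667 seconds
# internal forderv(): count the group starts of the ordering instead
system.time(ans2 <- length(attr(data.table:::forderv(x, retGrp=TRUE), 'starts'))) # 0.1 seconds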
In addition, we could internally optimise length(unique(.)) to n_unique(.).
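For illustration, the second expression in the benchmark could be wrapped up roughly as follows. This is only a sketch: n_unique() here is a hypothetical helper, not an existing data.table function, and it relies on the unexported data.table:::forderv(), whose behaviour may change between versions. The columns g and v are made up for the example.

```r
library(data.table)

# Hypothetical helper (not part of data.table's API): counts distinct values
# via the "starts" attribute that the internal forderv() attaches when
# retGrp = TRUE, i.e. one entry per group of equal values.
n_unique <- function(x) {
  length(attr(data.table:::forderv(x, retGrp = TRUE), "starts"))
}

# Made-up example data: group "a" holds one distinct value, "b" holds two.
DT <- data.table(g = c("a", "a", "b", "b", "b"),
                 v = c(1L, 1L, 2L, 3L, 3L))

# Grouped use, as proposed above
res <- DT[, .(n = n_unique(v)), by = g]
```

Calling the helper once per group like this only demonstrates the interface; the internal optimisation suggested above would have to happen inside data.table itself.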