Add n_unique() = length(unique(x)) - useful while grouping #884

Closed
@arunsrinivasan

Description

The SO question that triggered the thought:

Instead of having to write:

DT[, length(unique(.)), by=.]

we could write:

DT[, n_unique(.), by=.]

This would also be faster, especially for data.tables, because we don't have to subset the entire data.table just to count the unique values.

Here's a quick benchmark:

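library(data.table)
x = sample(1e2, 1e7, TRUE) # 1e7 draws from 1..100
# Base approach: materialise the unique values, then count them
system.time(ans1 <- length(unique(x))) # 0.667 seconds
# Count the group starts returned by data.table's unexported forderv (hence :::)
system.time(ans2 <- length(attr(data.table:::forderv(x, retGrp=TRUE), 'starts'))) # 0.1 seconds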

We could also internally optimise length(unique(.)) to n_unique(.).
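
To make the proposed spelling concrete, here is a minimal sketch of how the two forms would compare on a grouped query. The `n_unique` defined below is only a hypothetical stand-in with the plain `length(unique(x))` body, not the optimised implementation this issue asks for, and the table and its column names `g` and `v` are made up for illustration.

```r
library(data.table)

# Hypothetical helper: the name n_unique comes from this proposal; the body is
# just the plain definition, with none of the forderv-based speed-up.
n_unique <- function(x) length(unique(x))

# Toy data; the column names g and v are assumptions for this example.
DT <- data.table(g = c("a", "a", "b", "b", "b"),
                 v = c(1L, 1L, 2L, 3L, 3L))

# Current idiom
res_old <- DT[, length(unique(v)), by = g]

# Proposed spelling
res_new <- DT[, n_unique(v), by = g]

# Both forms should report the same per-group counts
stopifnot(identical(res_old$V1, res_new$V1))
```

With this stand-in the two queries do the same work; the benefit described above would only appear once `n_unique` is implemented internally.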
