
Implement Rank/Select #14

Open
daniel-j-h opened this issue Jul 21, 2021 · 8 comments

Comments

daniel-j-h (Member) commented Jul 21, 2021

We should look into implementing rank/select structures for the bitset / Elias-Fano work in #13.

For example, have a look at

"Space-efficient, high-performance rank and select structures on uncompressed bit sequences" by Zhou, Andersen, and Kaminsky

which seems simple and practical.

cc @ucyo

daniel-j-h (Member Author) commented:
I have added the building blocks for succinct rank/select data structures:

```c
/*
 * Efficient bits and broadword manipulation.
 *
 * We need this for our succinct rank/select
 * implementations, mostly.
 */
TINYGRAPH_WARN_UNUSED
uint32_t tinygraph_bits_count(uint64_t v);

TINYGRAPH_WARN_UNUSED
uint32_t tinygraph_bits_find(uint64_t v, uint32_t n);

TINYGRAPH_WARN_UNUSED
uint32_t tinygraph_bits_leading0(uint64_t v);

TINYGRAPH_WARN_UNUSED
uint32_t tinygraph_bits_trailing0(uint64_t v);
```

The implementation uses the BMI2 instruction set on modern processors for uint64 rank/select.

From here we can implement e.g. the poppy approach from the paper above, or a different one.

daniel-j-h (Member Author) commented Jul 29, 2021

See also the paper describing https://github.com/ot/succinct, and the thesis that goes along with it.

daniel-j-h (Member Author) commented:
The paper https://arxiv.org/abs/1706.00990 describes the machine word select we are using for succinct data structures:

```c
#ifdef TINYGRAPH_HAS_BMI2
uint32_t tinygraph_bits_select(uint64_t v, uint32_t n) {
  TINYGRAPH_STATIC_ASSERT(sizeof(uint64_t) == sizeof(unsigned long long));
  TINYGRAPH_ASSERT(n < tinygraph_bits_count(v));

  return tinygraph_bits_trailing0_u64(_pdep_u64(UINT64_C(1) << n, v));
}
#else // TINYGRAPH_HAS_BMI2
```
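Without BMI2 the same select can be computed portably by scanning the word; a minimal sketch of such a fallback (not necessarily what the project's `#else` branch actually does) could be:

```c
#include <stdint.h>

/* Portable fallback: return the position of the n-th (0-based) set
 * bit in v by scanning bit by bit. A sketch only; the project's
 * real non-BMI2 implementation may differ. */
static uint32_t bits_select_portable(uint64_t v, uint32_t n) {
  for (uint32_t i = 0; i < 64; ++i) {
    if (v & (UINT64_C(1) << i)) {
      if (n == 0) {
        return i;
      }
      n -= 1;
    }
  }
  return 64; /* n >= popcount(v): no such bit */
}
```

This is O(64) per query instead of a handful of instructions, but it has no instruction set requirements and is easy to verify against the `_pdep_u64` version in tests.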

daniel-j-h (Member Author) commented:
The paper http://www.cs.cmu.edu/~dga/papers/zhou-sea2013.pdf - termed "poppy" - from

"Space-efficient, high-performance rank and select structures on uncompressed bit sequences" by Zhou, Andersen, and Kaminsky

sounds very practical keeping in mind the realities of hardware architectures and cache hierarchies; they

  • get down to 3.2% space overhead for rank
  • get down to 0.39% space overhead for select

More importantly, they also provide a basic rank sketch that is simple to implement, has 12.5% space overhead, and allows us to iterate easily in the future. The first iteration builds up a single index; the second iteration improves on this to get down to 3.2% space overhead for rank.

Their main insights (for rank, since that's what I have read so far) are as follows:

  • cache misses matter far more than optimizing instruction counts
  • a processor cache line is 64 bytes nowadays; use 64-byte "basic blocks"
  • the popcount instruction is the fastest way to count bits in these basic blocks
  • build up an index storing the cumulative sums for each basic block

This means we create an index and store one 64-bit uint for every 64 bytes (= 512 bits) in the original bit vector.

At rank query time we combine the pre-computed index with a popcount over the remaining bits in the last basic block.
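The basic 12.5% sketch could be put into code roughly like this (hypothetical helper names, not tinygraph's API; one cumulative 64-bit count per 512-bit basic block):

```c
#include <stdint.h>
#include <stdlib.h>

/* Build cumulative popcounts: index[i] = number of set bits in basic
 * blocks 0..i-1. One 64-bit entry per 512-bit (64-byte) basic block
 * gives the 12.5% space overhead of the basic rank sketch. */
static uint64_t *rank_index_build(const uint64_t *bits, size_t nblocks) {
  uint64_t *index = malloc(nblocks * sizeof(uint64_t));
  if (!index) {
    return NULL;
  }
  uint64_t sum = 0;
  for (size_t i = 0; i < nblocks; ++i) {
    index[i] = sum;
    for (size_t w = 0; w < 8; ++w) { /* 8 x 64-bit words = 512 bits */
      sum += (uint64_t)__builtin_popcountll(bits[i * 8 + w]);
    }
  }
  return index;
}

/* rank(n): number of set bits in positions [0, n). Combines the
 * pre-computed index with popcounts over the last basic block. */
static uint64_t rank_query(const uint64_t *bits, const uint64_t *index,
                           uint64_t n) {
  const uint64_t block = n / 512;
  const uint64_t word = n / 64;
  uint64_t sum = index[block];
  for (uint64_t w = block * 8; w < word; ++w) {
    sum += (uint64_t)__builtin_popcountll(bits[w]);
  }
  const uint64_t rem = n % 64;
  if (rem) {
    sum += (uint64_t)__builtin_popcountll(bits[word] << (64 - rem));
  }
  return sum;
}
```

Iterating towards the full poppy layout then means replacing the flat cumulative array with its three-level (upper/lower/basic block) counters, without changing the query interface.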

daniel-j-h (Member Author) commented:
Adding here: we already provide the machine word popcount and rank functionality in the bits module

```c
uint32_t tinygraph_bits_count(uint64_t v) {
  TINYGRAPH_STATIC_ASSERT(sizeof(uint64_t) == sizeof(unsigned long long));

  return __builtin_popcountll(v);
}

uint32_t tinygraph_bits_rank(uint64_t v, uint32_t n) {
  TINYGRAPH_ASSERT(n <= 64);

  /* guard n == 0: shifting a uint64_t by 64 is undefined behavior */
  return n == 0 ? 0 : tinygraph_bits_count(v << (64 - n));
}
```

and memory alignment to a 64-byte cache line we can simply get with posix_memalign(&p, 64, size).
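For example, a cache-line aligned allocation for the bit vector could look like this (a sketch with hypothetical names; POSIX only):

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate nblocks 64-byte basic blocks, aligned to a 64-byte cache
 * line so every 512-bit basic block spans exactly one cache line. */
static uint64_t *bitvector_alloc(size_t nblocks) {
  void *p = NULL;

  /* posix_memalign: alignment must be a power of two multiple of
   * sizeof(void *); 64 qualifies on all common platforms. */
  if (posix_memalign(&p, 64, nblocks * 64) != 0) {
    return NULL;
  }

  return (uint64_t *)p;
}
```

Memory from posix_memalign is released with plain free(). C11's aligned_alloc(64, size) would be a portable alternative, with the caveat that it requires size to be a multiple of the alignment, which the 64-byte block granularity here satisfies anyway.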

daniel-j-h (Member Author) commented:
Since opening this issue there has been research improving on the cs-poppy approach we wanted to start with here.

An improvement on poppy, called "pasta" (see "pasta-flat" in the paper):

Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors

> In practice, the smallest (uncompressed) rank and select data structure cs-poppy has a space overhead of ≈ 3.51 % [Zhou et al., SEA '13]. Using the same overhead, we present a data structure that can answer queries up to 8 % (rank) and 16.5 % (select) faster compared with cs-poppy.

(screen grabs from the paper: pasta1, pasta2)

An improvement on poppy and pasta altogether:

SPIDER: Improved Succinct Rank and Select Performance

> However, rank and select query performance still incurs a tradeoff between query time and space. For example, Vigna [27] gives a data structure for rank queries using 25% space that is roughly 19% faster than pasta-flat, and a data structure for select queries using 12.2% space, which is roughly 65% faster than pasta-flat.

and a few more tricks.

(screen grab from the paper: spider1)

daniel-j-h (Member Author) commented:
I have a first version ready in #41.
