-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Rank/Select #14
Comments
I have added the building blocks for succinct rank/select data structures Lines 8 to 26 in 65770f0
uses the BMI2 instruction set on modern processors for uint64 rank/select. From here we can implement e.g. the poppy paper above, or a different approach. |
Alex Bowe has some amazing material from this domain; we should check out his writing
https://raw.githubusercontent.com/alexbowe/wavelet-paper/thesis/thesis.pdf |
See also this paper describing https://github.com/ot/succinct Also this thesis going with it. |
This paper https://arxiv.org/abs/1706.00990 describes the machine word select we are using for succinct data structures Lines 27 to 36 in 893de95
|
The paper http://www.cs.cmu.edu/~dga/papers/zhou-sea2013.pdf - termed "poppy" - from
sounds very practical keeping in mind the realities of hardware architectures and cache hierarchies; they
and more importantly they also provide a simple basic rank sketch which is simple to implement, has 12.5% space overhead, and allows us then to iterate easily in the future. The first iteration builds up a single index; the second iteration improves on this to get down to 3.2% space overhead for rank. Their main insights (for rank, since that's what I have read so far) are as follows
This means we create an index and store one 64bit uint for every 64 bytes (= 512 bits) in the original bit vector. During rank query time we use the pre-computed index, plus the remaining bits in the last basic block. |
Adding here We already provide the machine word popcount functionality in the bits module Lines 16 to 25 in 893de95
and memory alignment to a 64 byte cache line we can simply do with |
Since opening this issue there has been research to improve on the cs-poppy solution we wanted to start with here. An improvement on poppy, called "pasta" (see "pasta-flat" in the paper): Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors
An improvement on poppy and pasta altogether: SPIDER: Improved Succinct Rank and Select Performance
and a few tricks |
I have a first version ready in #41 |
We should look into implementing rank/select structures for bitset / elias-fano #13
For example have a look at
which seems to be simple and practical.
cc @ucyo
The text was updated successfully, but these errors were encountered: