Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds a rank-select data structure, see #14 #41

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

daniel-j-h
Copy link
Member

Simple rank-select data structure for ranking (sum of 1 bits) and selecting (position of ith 1).

We will need this for an even tinier graph representation that

  1. Stores the edge ranges as delta-vbyte encoded lists, and
  2. Uses a rank-select enabled bitset to indicate the start of a new edge range
2 3 4 2 2 3 4 5 6 0 1  <- sorted mon. increasing edge targets
|     | |         | 
1 0 0 1 1 0 0 0 0 1 0  <- rank-select bitvector indicating when an edge range starts

for graph traversal we can then get edges as in [select(start_vertex), select(start_vertex + 1)] to then first decode the vbytes and then delta-decode it all in one go. I'm interested in how well this will actually work in practice nowadays.

The big downside at the moment is we make heavy use of the popcnt instruction that's only available in Haswell x86 processors. For arm we don't have a reasonable implementation at the moment and also no access to machines to test and benchmark.

See the comments inline for more context and implementation details.

Comment on lines +11 to +60
// A succinct rank-select datastructure over a bitset.
//
// rank1(bits, n): sum of 1s in bits before position n
//
// select1(bits, n): position of nth 1 bit set in bits
//
// We implement a very simple rank-select datastructure
// with room for improvement in terms of space and runtime
// but the benefit is a relatively simple implementation.
//
// For rank: We make use of recent learnings from [1] and
// [3]: We use basic blocks of 512 bits (cache line size)
// and within these basic blocks make use of repeated
// popcnt instructions. This is similar to rank9 from [4]
// but without the 9-bit second level index and with the
// popcnt hardware instruction instead of broadword tricks.
// With a cache line aligned bitvector we should see a max
// of two cache misses: one for the basic block and then one
// for the remainder in the bitvector itself. Potential
// for improvements are listed in [1] and [2] e.g. the
// use of multi level rank inventories.
//
// For select: We make use of recent learnings from [1] and
// [3] where we sample every e.g. 8192'th bit set and from
// there use a scan over the rank inventory to speed up
// the search. The final cache line select works using
// the recent pdep hardware instruction. Note that worst
// case runtime is in O(n) for adversarily bitvector cases
// such as: a very large bit vector but only one bit set at
// the very start and one bit set at the very end.
// Potential for improvements are listed in [1] and [2].
// In addition one idea could be tuning that 8192 constant
// e.g. based on the bitvector's load factor (how many bits
// are set) to make sure scanning ranks stays reasonable.
// The simple-select datastructure in [3] is especially
// tuned for select (does not rank) and seems to be strong
// in runtime but also needs more space.
//
//
// [1] SPIDER: Improved Succinct Rank and Select Performance
// https://arxiv.org/abs/2405.05214
//
// [2] Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors
// https://arxiv.org/abs/2206.01149
//
// [3] A Fast x86 Implementation of Select
// https://arxiv.org/abs/1706.00990
//
// [4] Broadword Implementation of Rank/Select Queries
// https://vigna.di.unimi.it/ftp/papers/Broadword.pdf
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impl. details and context

Comment on lines +217 to +220
// We can compute rank from two parts: the pre-computed
// ranks for each cache line of size 512, and in case
// we rank inside a cache line adding the remaining
// residual from ranking the bits inside the cache line.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rank

Comment on lines +247 to +261
// To select the nth set bit on the bitset
//
// 1. Use the samples array to find two positions
// where the nth bit has to be within. This range
// in general is unrestricted and in adversarial
// cases the range might span the whole bitset.
// That is why select is in O(n) but fast in
// practice and works fine for our use cases.
//
// 2. Use the ranks array to search for a block in
// the range where the nth bit must occur.
//
// 3. Within the 512 bit block where the nth bit
// must occur, rely on our highly tuned
// select512 implementations.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select

Comment on lines +120 to +126
// Our data structure can be constructed in a single linear scan
// over the bitvector: for every block of 512 bits we store the
// computed rank up to this position in the ranks array. And for
// every 8192th bit set we store its position in the samples array.
//
// By iterating one cache line at a time we can make use of our
// highly tuned rank512 and select512 implementations.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construction

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant