Adds a rank-select data structure, see #14 #41

daniel-j-h · 2024-06-27T20:19:17Z

Simple rank-select data structure for ranking (sum of 1 bits) and selecting (position of ith 1).

We will need this for an even tinier graph representation that

Stores the edge ranges as delta-vbyte encoded lists, and
Uses a rank-select enabled bitset to indicate the start of a new edge range

2 3 4 2 2 3 4 5 6 0 1  <- sorted mon. increasing edge targets
|     | |         | 
1 0 0 1 1 0 0 0 0 1 0  <- rank-select bitvector indicating when an edge range starts

for graph traversal we can then get edges as in [select(start_vertex), select(start_vertex + 1)] to then first decode the vbytes and then delta-decode it all in one go. I'm interested in how well this will actually work in practice nowadays.

The big downside at the moment is we make heavy use of the popcnt instruction that's only available in Haswell x86 processors. For arm we don't have a reasonable implementation at the moment and also no access to machines to test and benchmark.

See the comments inline for more context and implementation details.

daniel-j-h · 2024-06-27T20:19:36Z

tinygraph/tinygraph-rs.c

+// A succinct rank-select datastructure over a bitset.
+//
+//   rank1(bits, n): sum of 1s in bits before position n
+//
+//   select1(bits, n): position of nth 1 bit set in bits
+//
+// We implement a very simple rank-select datastructure
+// with room for improvement in terms of space and runtime
+// but the benefit is a relatively simple implementation.
+//
+// For rank: We make use of recent learnings from [1] and
+// [3]: We use basic blocks of 512 bits (cache line size)
+// and within these basic blocks make use of repeated
+// popcnt instructions. This is similar to rank9 from [4]
+// but without the 9-bit second level index and with the
+// popcnt hardware instruction instead of broadword tricks.
+// With a cache line aligned bitvector we should see a max
+// of two cache misses: one for the basic block and then one
+// for the remainder in the bitvector itself. Potential
+// for improvements are listed in [1] and [2] e.g. the
+// use of multi level rank inventories.
+//
+// For select: We make use of recent learnings from [1] and
+// [3] where we sample every e.g. 8192'th bit set and from
+// there use a scan over the rank inventory to speed up
+// the search. The final cache line select works using
+// the recent pdep hardware instruction. Note that worst
+// case runtime is in O(n) for adversarily bitvector cases
+// such as: a very large bit vector but only one bit set at
+// the very start and one bit set at the very end.
+// Potential for improvements are listed in [1] and [2].
+// In addition one idea could be tuning that 8192 constant
+// e.g. based on the bitvector's load factor (how many bits
+// are set) to make sure scanning ranks stays reasonable.
+// The simple-select datastructure in [3] is especially
+// tuned for select (does not rank) and seems to be strong
+// in runtime but also needs more space.
+//
+//
+// [1] SPIDER: Improved Succinct Rank and Select Performance
+// https://arxiv.org/abs/2405.05214
+//
+// [2] Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors
+// https://arxiv.org/abs/2206.01149
+//
+// [3] A Fast x86 Implementation of Select
+// https://arxiv.org/abs/1706.00990
+//
+// [4] Broadword Implementation of Rank/Select Queries
+// https://vigna.di.unimi.it/ftp/papers/Broadword.pdf


Impl. details and context

daniel-j-h · 2024-06-27T20:20:01Z

tinygraph/tinygraph-rs.c

+  // We can compute rank from two parts: the pre-computed
+  // ranks for each cache line of size 512, and in case
+  // we rank inside a cache line adding the remaining
+  // residual from ranking the bits inside the cache line.


daniel-j-h · 2024-06-27T20:20:09Z

tinygraph/tinygraph-rs.c

+  // To select the nth set bit on the bitset
+  //
+  //  1. Use the samples array to find two positions
+  //     where the nth bit has to be within. This range
+  //     in general is unrestricted and in adversarial
+  //     cases the range might span the whole bitset.
+  //     That is why select is in O(n) but fast in
+  //     practice and works fine for our use cases.
+  //
+  //  2. Use the ranks array to search for a block in
+  //     the range where the nth bit must occur.
+  //
+  //  3. Within the 512 bit block where the nth bit
+  //     must occur, rely on our highly tuned
+  //     select512 implementations.


daniel-j-h · 2024-06-27T20:20:23Z

tinygraph/tinygraph-rs.c

+  // Our data structure can be constructed in a single linear scan
+  // over the bitvector: for every block of 512 bits we store the
+  // computed rank up to this position in the ranks array. And for
+  // every 8192th bit set we store its position in the samples array.
+  //
+  // By iterating one cache line at a time we can make use of our
+  // highly tuned rank512 and select512 implementations.


construction

daniel-j-h commented Jun 27, 2024

View reviewed changes

Adds a rank-select data structure, see #14

0be8dd3

daniel-j-h force-pushed the add-rs branch from a52ae63 to 0be8dd3 Compare June 27, 2024 20:22

This was referenced Jun 27, 2024

Implement Rank/Select #14

Open

darray: O(1) select #33

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds a rank-select data structure, see #14 #41

Adds a rank-select data structure, see #14 #41

daniel-j-h commented Jun 27, 2024

daniel-j-h Jun 27, 2024

daniel-j-h Jun 27, 2024

daniel-j-h Jun 27, 2024

daniel-j-h Jun 27, 2024

Adds a rank-select data structure, see #14 #41

Are you sure you want to change the base?

Adds a rank-select data structure, see #14 #41

Conversation

daniel-j-h commented Jun 27, 2024

daniel-j-h Jun 27, 2024

Choose a reason for hiding this comment

daniel-j-h Jun 27, 2024

Choose a reason for hiding this comment

daniel-j-h Jun 27, 2024

Choose a reason for hiding this comment

daniel-j-h Jun 27, 2024

Choose a reason for hiding this comment