Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds rank/select poppy-based implementation, closes #14 #34

Closed
wants to merge 1 commit into from

Conversation

daniel-j-h
Copy link
Member

Please read #14 for context.

This changeset adds a first take on succinct rank/select data structures based on

http://www.cs.cmu.edu/~dga/papers/zhou-sea2013.pdf - named "poppy"

"Space-efficient, high-performance rank and select structures on uncompressed bit sequences"
by Zhou, Andersen, and Kaminsky

The comment #14 (comment) explains the high-level ideas we implement here in our first iteration; wip.

cc @ucyo

@@ -0,0 +1,18 @@
#define _POSIX_C_SOURCE 200809L
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should make the aligned alloc functions user-definable; at least at compile time?

@@ -33,6 +33,9 @@ bool tinygraph_bitset_get_at(tinygraph_bitset_s bitset, uint64_t i);
TINYGRAPH_WARN_UNUSED
uint64_t tinygraph_bitset_get_size(tinygraph_bitset_s bitset);

TINYGRAPH_WARN_UNUSED
uint64_t* tinygraph_bitset_get_data(tinygraph_bitset_s bitset);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking the abstraction here to provide a way to get the underlying uint64 data array for efficient poppy basic block bit counting using the popcount instruction.

const uint64_t nbits = tinygraph_bitset_get_size(bitset);

// 512 bits basic blocks for now
const uint64_t level0_len = nbits / 512;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo: we should add

ceil(nbits / 512)

basic blocks, with the last one storing the overall cumsum for the total bitvector

Comment on lines 49 to 50
// TODO: implement optimizations to get 12.5% down to 3.2%
// At the moment we only implement Fig. 3 from the paper
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a first iteration I think a very simple and first take on rank is fine here; we can then improve down the road



uint64_t tinygraph_poppy_rank_get(tinygraph_poppy_rank_s rank, uint64_t n) {
TINYGRAPH_ASSERT(rank);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo: assert that rank->bitset_view is not null (ideally that it has not changed, otherwise tour index gets invalidated)


count += tinygraph_bits_rank(bits[block * 8 + ((n % 512) / 64)], (n % 512) % 64);

return leftof + count;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo: we need to simplify this and there's also an off-by-one issue somewhere here; fix it!

// TODO: implement optimizations to get 12.5% down to 3.2%
// At the moment we only implement Fig. 3 from the paper

uint64_t *level0 = tinygraph_align_malloc(cacheline, level0_len);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two main tricks from the paper: trick 1: cache-line align the basic blocks and make them the size of a cache-line (64 bytes). This will result in very efficient basic block popcounts and just a single cache miss.

Comment on lines 65 to 72
csum += tinygraph_bits_count(bits[i * 8 + 0]);
csum += tinygraph_bits_count(bits[i * 8 + 1]);
csum += tinygraph_bits_count(bits[i * 8 + 2]);
csum += tinygraph_bits_count(bits[i * 8 + 3]);
csum += tinygraph_bits_count(bits[i * 8 + 4]);
csum += tinygraph_bits_count(bits[i * 8 + 5]);
csum += tinygraph_bits_count(bits[i * 8 + 6]);
csum += tinygraph_bits_count(bits[i * 8 + 7]);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two main tricks from the paper: trick 2: within a basic block we use the popcount instruction; our bits_count function compiles down to popcount (when targeting recent processors > Haswell).

@daniel-j-h
Copy link
Member Author

By now there are better approaches e.g. pasta-flat (see #14 (comment)) and we do have the foundations of this in main

@daniel-j-h daniel-j-h closed this Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant