-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds rank/select poppy-based implementation, closes #14 #34
Conversation
tinygraph-align.c
Outdated
@@ -0,0 +1,18 @@ | |||
#define _POSIX_C_SOURCE 200809L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should make the aligned alloc functions user-definable; at least at compile time?
tinygraph-bitset.h
Outdated
@@ -33,6 +33,9 @@ bool tinygraph_bitset_get_at(tinygraph_bitset_s bitset, uint64_t i); | |||
TINYGRAPH_WARN_UNUSED | |||
uint64_t tinygraph_bitset_get_size(tinygraph_bitset_s bitset); | |||
|
|||
TINYGRAPH_WARN_UNUSED | |||
uint64_t* tinygraph_bitset_get_data(tinygraph_bitset_s bitset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking the abstraction here to provide a way to get the underlying uint64 data array for efficient poppy basic block bit counting using the popcount instruction.
tinygraph-poppy.c
Outdated
const uint64_t nbits = tinygraph_bitset_get_size(bitset); | ||
|
||
// 512 bits basic blocks for now | ||
const uint64_t level0_len = nbits / 512; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todo: we should add
ceil(nbits / 512)
basic blocks, with the last one storing the overall cumsum for the total bitvector
tinygraph-poppy.c
Outdated
// TODO: implement optimizations to get 12.5% down to 3.2% | ||
// At the moment we only implement Fig. 3 from the paper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a first iteration I think a very simple and first take on rank is fine here; we can then improve down the road
tinygraph-poppy.c
Outdated
|
||
|
||
uint64_t tinygraph_poppy_rank_get(tinygraph_poppy_rank_s rank, uint64_t n) { | ||
TINYGRAPH_ASSERT(rank); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todo: assert that rank->bitset_view
is not null (ideally that it has not changed, otherwise tour index gets invalidated)
tinygraph-poppy.c
Outdated
|
||
count += tinygraph_bits_rank(bits[block * 8 + ((n % 512) / 64)], (n % 512) % 64); | ||
|
||
return leftof + count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todo: we need to simplify this and there's also an off-by-one issue somewhere here; fix it!
tinygraph-poppy.c
Outdated
// TODO: implement optimizations to get 12.5% down to 3.2% | ||
// At the moment we only implement Fig. 3 from the paper | ||
|
||
uint64_t *level0 = tinygraph_align_malloc(cacheline, level0_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two main tricks from the paper: trick 1: cache-line align the basic blocks and make them the size of a cache-line (64 bytes). This will result in very efficient basic block popcounts and just a single cache miss.
tinygraph-poppy.c
Outdated
csum += tinygraph_bits_count(bits[i * 8 + 0]); | ||
csum += tinygraph_bits_count(bits[i * 8 + 1]); | ||
csum += tinygraph_bits_count(bits[i * 8 + 2]); | ||
csum += tinygraph_bits_count(bits[i * 8 + 3]); | ||
csum += tinygraph_bits_count(bits[i * 8 + 4]); | ||
csum += tinygraph_bits_count(bits[i * 8 + 5]); | ||
csum += tinygraph_bits_count(bits[i * 8 + 6]); | ||
csum += tinygraph_bits_count(bits[i * 8 + 7]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two main tricks from the paper: trick 2: within a basic block we use the popcount instruction; our bits_count function compiles down to popcount (when targeting recent processors > Haswell).
By now there are better approaches e.g. pasta-flat (see #14 (comment)) and we do have the foundations of this in main
|
Please read #14 for context.
This changeset adds a first take on succinct rank/select data structures based on
http://www.cs.cmu.edu/~dga/papers/zhou-sea2013.pdf - named "poppy"
The comment #14 (comment) explains the high-level ideas we implement here in our first iteration; wip.
cc @ucyo