You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The offset's deltas describe how many out edges a vertex has
offsets e.g. of [0 1 2 3 ..] means each vertex has a single edge
offsets e.g. of [0, 2, 4, 6, ..] means each vertex has two edges
One idea now could be the following
construct a graph
sort all vertices by number of out edges
partition into sub-ranges of the same number of out ranges (e.g. 1-out edge sub-range, 2-out edge sub-range, etc.)
run-length encode these sub-ranges, so that all we are saving are e.g. the two items (1, n), (2, m) for all 1 and 2 out edge vertices
Think
offsets: [0 1 2 3 4 6 8 10]
offsets: [[0, 1, 2, 3], [4, 6, 8, 10]]
offsets: [(1, 4), (2, 4)]
for a query for vertex v we'd then have to first find the sub-range it belongs to, and then go to the targets array.
Maybe it makes sense to initially compute a histogram of the number of edges to then determine when it makes sense to do this. For example for road networks it probably makes sense to special case the 1/2/3 out-edge cases, just because they are so common, since road networks are quite sparse.
We'd then have to re-number vertices between [0, n] such that they follow the out-edge sorting.
While its space oc-
cupancy is competitive with some state-of-the-art methods
such as γ-δ-Golomb codes and PForDelta, it fails to exploit
the local clustering that inverted lists usually exhibit, namely
the presence of long subsequences of close identifiers.
In this paper we describe a new representation based on
partitioning the list into chunks and encoding both the chunks
and their endpoints with Elias-Fano, hence forming a two-
level data structure. This partitioning enables the encoding
to better adapt to the local statistics of the chunk, thus
exploiting clustering and improving compression.
if we re-order based on number of outgoing edges, then we'll get offset array sub-ranges with deltas of
all 1s for vertices with a single out edge,
all 2s for the vertices with two out edges,
and so on
These sub-ranges we can then run-length encode? Or just try the partitioned EF coding.
We are storing two arrays in our csr graph format
for example for the edges
we are storing internally
The offset's deltas describe how many out edges a vertex has
One idea now could be the following
Think
[0 1 2 3 4 6 8 10]
[[0, 1, 2, 3], [4, 6, 8, 10]]
[(1, 4), (2, 4)]
for a query for vertex v we'd then have to first find the sub-range it belongs to, and then go to the targets array.
Maybe it makes sense to initially compute a histogram of the number of edges to then determine when it makes sense to do this. For example for road networks it probably makes sense to special case the 1/2/3 out-edge cases, just because they are so common, since road networks are quite sparse.
We'd then have to re-number vertices between [0, n] such that they follow the out-edge sorting.
Related
cc @ucyo
The text was updated successfully, but these errors were encountered: