-
Notifications
You must be signed in to change notification settings - Fork 311
Graph Contraction Algorithms
Graph contraction is a technique for implementing recursive graph algorithms, where on each iteration the algorithm is repeated on a smaller graph contracted from the previous step. Probably the most well-known algorithm based on graph contraction is Boruvska's algorithm for computing the Minimum Spanning Forest. Efficient implementation, with a slight modification to the Boruvska's algorithm has now been implemented in GraphChi as well: GraphChi's minimum spanning forest.
Graph contraction in GraphChi is implemented by creating a new graph during the computation. That is, instead of defining which vertices and edges to remove, the idea is to output or emit edges for the new graph. When all the edges for the new graph have been output, GraphChi will preprocess the graph and create the corresponding shards. At this point, possible duplicate edges are removed (programmer can specify rule for which edges are retained -- for example the minimum spanning forest (MSF) algorithm retains the edge with the smallest weight).
To output a new graph, one must initialize a graph-output object as follows:
std::string contractedname = filename + "C";
sharded_graph_output<VertexDataType, EdgeDataType> shardedout(contractedname, new AcceptMinimum());
CONTRACTED_GRAPH_OUTPUT = engine.add_output(&shardedout);
(Note, engine can have many output-objects. The minimum spanning forest application uses two outputs: one to output the MSF and one to produce contracted graph for next step).
Now, during the computation in the update function, one can emit new edges to the graph as follows:
graphchi_engine<VertexDataType, EdgeDataType> * gengine;
...
void update(graphchi_vertex<VertexDataType, EdgeDataType> &vertex, graphchi_context &gcontext) {
.....
gengine->output(CONTRACTED_GRAPH_OUTPUT)->output_edgeval(a, b,edata);
}
Above, gengine
is a global variable holding the graphchi engine object (admittedly, not very elegant).
Now, after the GraphChi computation which produces the new graph has finished, you need to finish by sharding the new graph:
nshards = (int)shardedout.finish_sharding();
We encourage studying the MSF code carefully to see how it works.
Our implementation of Boruvska's algorithm is based on label propagation (similar to our implementation of connected components): each vertex starts with label equaling its own ID. When updated, each vertex chooses the minimum label of its neighbors and its own ID, and writes this ID to its incident edges (so its neighbors can observe its label). In the MSF, minimum label is chosen only among neighbors that are connected via minimum edges. Now in the contraction phase, all vertices that have same label are contracted into a new vertex with that label, and edges with span vertices with different labels are retained.
We run the label propagation only for two iterations, which means that - unlike in the Boruvksa's algorithm - we do not contract maximal components. This does not affect correctness if care is taken when removing minimum edges from the graph, but improves practical performance very significantly.
We are working on a paper which describes the algorithm in more detail.
Our implementation of MSF also has alternative contraction algorithm based on star contraction, an efficient technique borrowed from the parallel algorithms research. Star contraction is guaranteed to contract 25% edges in expectation. We are working on analyzing our label propagation algorithm, but in practice we see contraction of way over 50% in many cases.