-
Notifications
You must be signed in to change notification settings - Fork 202
This document is intended as an overview of gunrock API, if you're interested in the complete API specification, please refer to the Doxygen generated documentation found at https://gunrock.github.io/gunrock/. You can also check out the tutorial on "How to write a new graph algorithm" for more information.
Graph APIs are very fundamental to gunrock, we will go through the process from construction to usage. First, it is important to identify that a graph can be constructed with multiple views. A graph view is how a graph in gunrock is stored and is "communicated" with. As of writing this document, we currently support three different views: compressed sparse row (CSR), compressed sparse column (CSC), and coordinate format (COO). You can read more about these sparse formats on the web. In the following example, we will use CSR as our graph view, but just know that a graph can be constructed with more than one view as well (we will show an example of that later).
// Define vertex, edge, weight types.
using vertex_t = int;
using edge_t = int;
using weight_t = float;
// Define CSR view type.
using csr_v_t = graph::graph_csr_t<vertex_t, edge_t, weight_t>
// Define the graph type.
using my_graph_t = graph::graph_t<
memory::memory_space_t::device, // construct graph on device.
vertex_t, edge_t, weight_t, csr_v_t>;
my_graph_t G;
We now have a graph G
, defined using my_graph_t
with a csr_v_t
as the view. Once a graph is created, to actually use it, we must set the underlying CSR data.
// Where the parameters for the csr matrix are passed in.
G.template set<csr_v_t>(num_rows, num_nonzeros, row_offsets, column_indices, values);
A simpler way to build the graph is to use the build::
helper functions, where you pass in the compressed sparse row data and it returns the generated graph.
auto G = graph::build::from_csr<memory_space_t::device, graph::view_t::csr>(
csr.number_of_rows, // rows
csr.number_of_columns, // columns
csr.number_of_nonzeros, // nonzeros
csr.row_offsets.data().get(), // row_offsets
csr.column_indices.data().get(), // column_indices
csr.nonzero_values.data().get() // values
); // supports row_indices and column_offsets (default = nullptr)
Graph class has many useful members, which you can access in the __device__
code. Even if the graph is declared on memory::memory_space_t::device
constants members (like the number of vertices or edges within a graph) are also available in the __host__
code. To see all the available APIs, see the implementation.
In your __host__
code, you can do the following:
auto num_vertices = G.get_number_of_vertices();
auto num_edges = G.get_number_of_edges();
std::cout << "\tNumber of Graph Representations = "
<< G.number_of_graph_representations() << std::endl;
std::cout << "\tContains CSR Representation? " << std::boolalpha
<< G.template contains_representation<csr_view_t>() << std::endl;
Or in your __device__
code, you can do the following:
auto source = 2;
auto edge = 1;
auto num_vertices = G.get_number_of_vertices();
auto num_edges = G.get_number_of_edges();
auto num_neighbors = G.get_number_of_neighbors(source);
auto source_vertex = G.get_source_vertex(edge);
auto destination_vertex = G.get_destination_vertex(edge);
auto edge_weight = G.get_edge_weight(edge);
The graph::
namespace provides helpful functions.
thrust::device_vector<vertex_t> histogram(sizeof(vertex_t) * 8 + 1);
graph::build_degree_histogram(G, histogram.data().get());
See a more complete example of using graph APIs in test_graph.cu unit test.
As described briefly, operators in Essentials are parallel primitives that perform operations on frontiers or graphs. We offer a range of operators, which are all found under include/gunrock/framework/operators/
. All of these operators, their configurations, and related APIs are accessible using the following include and namespace:
#include <gunrock/framework/operators/operators.hxx>
using namespace operators;
Advance is a traversal operator, which takes in an input frontier of vertices or edges, and generates an output frontier containing all of the neighboring vertices or edges. Advance can be called using a beginner API, which simplifies the interface, or an expert API allowing more control over the input and output parameters.
auto sample_lambda = [=] __host__ __device__(
vertex_t const& source, // ... source
vertex_t const& neighbor, // neighbor
edge_t const& edge, // edge
weight_t const& weight // weight (tuple).
) -> bool {
// Do something on the source, neighbor, edge, weight tuple.
};
// Execute advance operator on the provided lambda.
operators::advance::execute(G, E, sample_lambda, context);
auto sample_lambda = [=] __host__ __device__(
vertex_t const& source, // ... source
vertex_t const& neighbor, // neighbor
edge_t const& edge, // edge
weight_t const& weight // weight (tuple).
) -> bool {
// Do something on the source, neighbor, edge, weight tuple.
};
// Execute advance operator on the provided lambda.
operators::advance::execute<operators::load_balance_t::block_mapped,
operators::advance_direction_t::forward,
operators::advance_io_type_t::graph,
operators::advance_io_type_t::none>(
G, E, sample_lambda, context);
Some example APIs... (using batch
)
auto f = [&](std::size_t job_idx) -> float {
return 0.0f; // ... return elapsed time of one job.
};
std::size_t n_jobs = 10;
std::vector<float> total_elapsed(1); // to calculate total elapsed time of n_jobs.
operators::batch::execute(f, n_jobs, total_elapsed.data());
Frontier is a data structure that represents the active working set in a graph algorithm. A frontier could represent a subset of a graph (such as a single source node from which an algorithm begins processing) or an entire graph. Like the graph_t
representation, our frontier abstraction allows the frontier to be viewed as many different underlying data structures (such as a vector/sparse-frontier, boolmap, bitmap, etc.) Although not all of these representations might be implemented yet, the abstraction allows for easy integration with simple C++ inheritance.
Created on the host, but a frontier can be passed into a kernel (__global__
) function and many of its APIs prefixed with __device__
can be used within a kernel.
using vertex_t = int;
using edge_t = int;
frontier::frontier_t<
vertex_t, // vertex type
edge_t, // edge type
frontier::frontier_kind_t::vertex_frontier, // type of the frontier (vertex or edge frontier)
frontier::frontier_view_t::vector // underlying frontier data structure
> my_frontier;
APIs are available in gunrock/framework/frontier
. We'll go through some common use cases:
// get the number of elements in a frontier, for some underlying
// views, this may require a stream.
std::size_t n = my_frontier.get_number_of_elements();
// check if the frontier is empty.
bool empty = my_frontier.is_empty();
Inside a CUDA kernel or a __device__
lambda function, one might want to use the frontier to set or get an element:
my_frontier.get_element_at(0); // get the element at 0th position
my_frontier.set_element_at(my_vertex, 0); // set 0th position to my_vertex
When represented as a vector frontier, resizing a frontier could be a challenging task. For example, when advancing an input frontier, the output frontier may become really large, and you'd want it to be sized appropriately to handle all the neighboring vertices of the input frontier. However, resizing requires reallocating memory and moving the data over and is often an expensive task. Therefore, in gunrock, we utilize a frontier_resizing_factor
property, which simply means when resizing to a certain size, multiply that size with the frontier_resizing_factor
and allocate more memory than needed: size * frontier_resizing_factor
(real capacity allocated). This is done to avoid reallocation as much as possible.
How to write a new graph algorithm provides a good introduction to the useful APIs for problem_t
and enactor_t
. The simplest explanation of the two is: problem_t
is used to set up all of your data and enactor_t
is used to implement your algorithm (and use the data set in problem_t
). Two things to clarify: 1) the reason why these structs exist is to provide a clean way to hide a lot of gunrock's execution details, and maybe in the future add required multi-GPUs pathways without needing to change the algorithm at the user level, and 2) from an application/algorithm's perspective, the user simply needs to inherit gunrock's enactor_t
and problem_t
and start taking advantage of it.
- Overview
- Publications
- Presentations
- Copyright and License
- Developers and Contributors
- Frequently Asked Questions
- Acknowledgments
- Programming Model
- Gunrock Operators
- Graph Algorithms
- Getting Essentials
- Gunrock For Linux
- Gunrock For Windows
- Gunrock For Max OSX
- (GitHub Template)
gunrock
project example
- MGPU, Python, Docs (needs review)
- Boolmap Frontier
- Hypergraphs (private)
- Modern CPP Features
- Programming Interface Examples (API)
- Style Guide
- Understanding the code structure
- Git Workflow
-
Debugging with
cuda-memcheck
andcuda-gdb
- Profiling with NVIDIA Nsight Systems and Compute
- Unit testing with GoogleTest
- Performance analysis
- How to write a new graph algorithm
-
PageRank: From
networkx
togunrock
- How to write parallel operators
- How to add a new graph representation
- How to add a new frontier representation
- How to add multiple GPU support
- How to bind an application to python
- How to use
thrust
/cub