-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Streaming Graph Query Processor - initial commit
- Loading branch information
1 parent
f812192
commit 9aad28c
Showing
38 changed files
with
9,388 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
[package] | ||
name = "sgraffito-query" | ||
version = "0.1.0" | ||
authors = ["Anil Pacaci <apacaci@uwaterloo.ca>"] | ||
edition = "2018" | ||
|
||
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html | ||
|
||
[dependencies] | ||
timely = "0.11" | ||
differential-dataflow = "0.11" | ||
abomonation = "0.7" | ||
abomonation_derive = "0.5" | ||
csv = "1.1" | ||
env_logger = "0.7.1" | ||
futures = { version = "0.3.*" } | ||
hashers = "1.0.1" | ||
hashbrown = "0.9.1" | ||
hdrhistogram = "7.2.0" | ||
itertools = "0.9" | ||
log = "0.4.11" | ||
metrics-runtime = "0.13.1" | ||
metrics-core = "0.5.2" | ||
metrics-util = "0.3.2" | ||
|
||
pest = "2.1" | ||
pest_derive = "2.1" | ||
priority-queue = "1.0.2" | ||
strum = "0.15.0" | ||
strum_macros = "0.15.0" | ||
|
||
[dev-dependencies] | ||
rand="0.4" | ||
|
||
[profile.release] | ||
debug = true | ||
|
||
[[bin]] | ||
name = "sgraffito-query" | ||
path = "src/main.rs" | ||
bench = false |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
# Streaming Graph Query Processor | ||
|
||
Our prototype implementation for a streaming graph query processor as a part of [S-Graffito](https://dsg-uwaterloo.github.io/s-graffito/). | ||
|
||
It is based on `Streaming Graph Algebra` that we propose to precisely describes semantics of complex graph queries over streaming graphs. | ||
Our prototype uses [Timely Dataflow](https://github.com/TimelyDataflow/timely-dataflow) as the underlying execution engine. | ||
Operators of our query processor are implemented as Timely operators, | ||
and streaming graph queries are formulated as dataflow computation to be executed on the underlying Timely execution engine. | ||
|
||
For more details about our framework, please refer to our [paper](arxiv-link) | ||
|
||
### Setup | ||
Being implemented on top of Timely Dataflow, SGQ processor is implemented in Rust and use Cargo package manager. | ||
To run SGQ processor, download the source code, install Cargo and run the following: | ||
|
||
``` | ||
$ cd sgraffito-query && cargo build --release | ||
``` | ||
|
||
### Usage | ||
|
||
We provide two helper utility to execute streaming graph queries on both using SGA operators and [Differential Dataflow](https://github.com/TimelyDataflow/differential-dataflow) operators, namely `sga-runner` and `dd-runner`. | ||
|
||
To execute a query: | ||
|
||
```$ cargo run --example [dd|sga]-runner window slide input_type input_file output_dir query arguments [predicates]``` | ||
|
||
* `window` the size of the time-based sliding window | ||
* `slide` slide interval that controls the granularity of window movements | ||
* `input_type` possible values: | ||
1. `s` string vertex identifiers & no source timestamp | ||
2. `st` string vertex identifiers & timestamped by source | ||
3. `i` integer vertex identifiers & no source timestamp | ||
4. `it` integer vertex identifiers & timestamped by source | ||
* `input_file` absolute path to input file | ||
* `output_dir` absolute path for directory to log runtime metrics | ||
* `query` Tha name of the streaming graph query from the Table 1 of our paper | ||
* `arguments` # of arguments for a particular `query` | ||
* `predicates` Arguments (edge labels) for the `query` | ||
|
||
Input files have the following format (if the input is not timestamped, use `s` or `i` for the `input_type` parameter): | ||
```source_identifier edge_label target_identifier [timestamp]``` | ||
|
||
### Reproducibility | ||
|
||
We provide a helper python script (`scripts/test-runner.py`), a set of configuration files (`config`) to reproduce the experiments presented in our paper. | ||
Additionally, datasets we used in the experiments in a pre-processed form can be downloaded from [here](https://vault.cs.uwaterloo.ca/s/6PyPGJfJQ6zcmKD). | ||
Otherwise, the following scripts require input files to have no self-edges and to include one edge per line, | ||
where each edge is represented by `source_vertex`, `label`, `target_vertex`, and `unix_timestamp` separated by whitespace. | ||
|
||
|
||
To run a particular experiment from a configuration file: | ||
|
||
``` $ scripts/test-runner.py config/[configuration_file]``` | ||
|
||
Each configuration file defines a set of experiments over a single dataset with multiple values for other parameters. An example configuration: | ||
|
||
``` | ||
{ | ||
"name" : "so", | ||
"dataset" : "path to stackoverflow dataset", | ||
"input-type" : "it", | ||
"report-folder" : "so-results", | ||
"timeout" : 600, | ||
"project-base" : "project base directory", | ||
"runs" : [ | ||
{ | ||
"query-name" : "query1", | ||
"index" : "1", | ||
"exec-name": "sga-runner", | ||
"window-size" : 864000, | ||
"slide-size" : 86400, | ||
"predicates" : [ | ||
"a2q" | ||
] | ||
}, | ||
{ | ||
"query-name" : "query2", | ||
"index" : "1", | ||
"exec-name": "sga-runner", | ||
"window-size" : 864000, | ||
"slide-size" : 86400, | ||
"predicates" : [ | ||
"a2q", | ||
"c2q" | ||
] | ||
} | ||
] | ||
} | ||
``` | ||
|
||
This configuration files specifies 2 runs over the StackOverflow dataset for `query1` and `query2` with 10 day windows and 1 day slide intervals. | ||
To use configuration files, please set `dataset`, `report-folder`, and `project-base` parameters based on your local setup. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,120 @@ | ||
{ | ||
"name": "ldbc-sf10", | ||
"dataset": "/hdd1/apacaci/datasets/ldbc/sf10/update-sorted-unixts.csv", | ||
"input-type": "it", | ||
"report-folder": "/home/apacaci/sgraffito-query/results/ldbc-sf10-query4/", | ||
"timeout": 600, | ||
"project-base": "/hdd1/apacaci/sgraffito/sgraffito-query", | ||
"runs": [ | ||
{ | ||
"query-name": "query4", | ||
"index": "1", | ||
"exec-name": "dd-runner", | ||
"window-size": 2592000, | ||
"slide-size": 86400, | ||
"predicates": [ | ||
"moderatorOf", "containerOf", "hasCreator" | ||
] | ||
}, | ||
{ | ||
"query-name": "query4", | ||
"index": "1", | ||
"exec-name": "sga-runner", | ||
"window-size": 2592000, | ||
"slide-size": 86400, | ||
"predicates": [ | ||
"moderatorOf", "containerOf", "hasCreator" | ||
] | ||
}, | ||
{ | ||
"query-name": "query4-a", | ||
"index": "1", | ||
"exec-name": "sga-runner", | ||
"window-size": 2592000, | ||
"slide-size": 86400, | ||
"predicates": [ | ||
"moderatorOf", "containerOf", "hasCreator" | ||
] | ||
}, | ||
{ | ||
"query-name": "query4-pc1", | ||
"index": "1", | ||
"exec-name": "sga-runner", | ||
"window-size": 2592000, | ||
"slide-size": 86400, | ||
"predicates": [ | ||
"moderatorOf", "containerOf", "hasCreator" | ||
] | ||
}, | ||
{ | ||
"query-name": "query4-pc2", | ||
"index": "1", | ||
"exec-name": "sga-runner", | ||
"window-size": 2592000, | ||
"slide-size": 86400, | ||
"predicates": [ | ||
"moderatorOf", "containerOf", "hasCreator" | ||
] | ||
}, | ||
{ | ||
"query-name": "query2", | ||
"index": "1", | ||
"exec-name": "dd-runner", | ||
"window-size": 2592000, | ||
"slide-size": 86400, | ||
"predicates": [ | ||
"likes", "replyOf" | ||
] | ||
}, | ||
{ | ||
"query-name": "query2", | ||
"index": "1", | ||
"exec-name": "sga-runner", | ||
"window-size": 2592000, | ||
"slide-size": 86400, | ||
"predicates": [ | ||
"likes", "replyOf" | ||
] | ||
}, | ||
{ | ||
"query-name": "query2-a", | ||
"index": "1", | ||
"exec-name": "sga-runner", | ||
"window-size": 2592000, | ||
"slide-size": 86400, | ||
"predicates": [ | ||
"likes", "replyOf" | ||
] | ||
}, | ||
{ | ||
"query-name": "query3", | ||
"index": "1", | ||
"exec-name": "dd-runner", | ||
"window-size": 2592000, | ||
"slide-size": 86400, | ||
"predicates": [ | ||
"likes", "replyOf", "replyOf" | ||
] | ||
}, | ||
{ | ||
"query-name": "query3", | ||
"index": "1", | ||
"exec-name": "sga-runner", | ||
"window-size": 2592000, | ||
"slide-size": 86400, | ||
"predicates": [ | ||
"likes", "replyOf", "replyOf" | ||
] | ||
}, | ||
{ | ||
"query-name": "query3-a", | ||
"index": "1", | ||
"exec-name": "sga-runner", | ||
"window-size": 2592000, | ||
"slide-size": 86400, | ||
"predicates": [ | ||
"likes", "replyOf", "replyOf" | ||
] | ||
} | ||
] | ||
} |
Oops, something went wrong.