This is an implementation of the LDBC SNB interactive workload for Neo4j. All queries are expressed as Cypher statements and executed transactionally over HTTP against a running Neo4j webserver using Neo4j's REST API. All queries defined in the LDBC Social Network Benchmark Specification v0.2.2 have been implemented and have passed validation (woohoo!).
In addition, the repository includes a couple of useful tools for loading a dataset into Neo4j and for testing individual queries to check out their results and runtime performance. See below for how to use these tools and how to run the benchmark.
This is a relatively new project and all contributions and/or suggestions are welcome.
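For readers unfamiliar with the transport mentioned above, here is a rough sketch of what a single Cypher statement sent over HTTP can look like. It assumes the transactional endpoint /db/data/transaction/commit of Neo4j 2.x and a server with authentication disabled. The function names, the Person label, and the id property are illustrative assumptions and are not taken from this implementation's queries, which may use the REST API differently.

```shell
#!/bin/sh
# Build the JSON body for Neo4j's transactional Cypher endpoint.
# $1: Cypher statement. It must not contain double quotes or backslashes,
#     because this sketch does no JSON escaping.
# $2: JSON object holding the statement parameters (defaults to {}).
cypher_payload() {
  statement=$1
  params=$2
  [ -n "$params" ] || params='{}'
  printf '{"statements":[{"statement":"%s","parameters":%s}]}\n' \
    "$statement" "$params"
}

# POST one statement and commit it in the same request.
# $1: host, $2: port, $3: Cypher statement, $4: parameters (JSON object)
run_cypher() {
  curl -s \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json' \
    -d "$(cypher_payload "$3" "$4")" \
    "http://$1:$2/db/data/transaction/commit"
}
```

For example, `run_cypher 127.0.0.1 7474 'MATCH (p:Person {id: {id}}) RETURN p.firstName' '{"id": 933}'` would send one parameterized statement, using the `{id}` parameter syntax of Neo4j 2.x.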
This implementation has been tested against the following versions of Neo4j and LDBC:
- Neo4j: v2.3.3
- LDBC driver: 0.3-SNAPSHOT
- Neo4jDb is now thread-safe, so it can be run with the driver using multiple threads.
- The Neo4j database connector does not support authentication to the Neo4j server. Authentication must be disabled to run the workload.
Neo4j comes with a life-saver tool for importing datasets as quickly as
possible (it will, quite literally, save hours of your life), called the Neo4j
Import Tool. To use it, however, dataset files generated by the LDBC SNB Data
Generator must first be converted into the format expected by the tool. To do
this, use the DataFormatConverter utility included in this repository.
DataFormatConverter: A utility for converting dataset files generated by the
LDBC SNB Data Generator to the file format expected by the Neo4j import tool.
Output files are placed in the DEST directory, along with an import.sh script
that runs the Neo4j Import Tool automatically on those files. For this script
to work, please set your NEO4J_HOME environment variable appropriately.
Usage:
DataFormatConverter SOURCE DEST
DataFormatConverter (-h | --help)
DataFormatConverter --version
Arguments:
SOURCE Directory containing SNB dataset files.
DEST Destination directory for output files.
Options:
-h --help Show this screen.
--version Show version.
Here is an example of how to use it:
mvn exec:java -Dexec.mainClass="net.ellitron.ldbcsnbimpls.interactive.neo4j.util.DataFormatConverter" -Dexec.args="/path/to/social_network/ /path/to/output/"
cd /path/to/output
chmod +x import.sh
export NEO4J_HOME=/path/to/neo4j
./import.sh
cd graph.db
ls
Once the import script has been run, the directory ./graph.db will contain the
resulting Neo4j database files. These files can then be copied directly to your
/path/to/neo4j/data/graph.db directory (after it has been cleared of any
previous files), which the server will load the next time it starts up. Here's
an example:
rm -rf $NEO4J_HOME/data/graph.db/*
cp -rf /path/to/output/graph.db/* $NEO4J_HOME/data/graph.db/
cd $NEO4J_HOME
./bin/neo4j console
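Because the commands above run rm -rf on a path built from $NEO4J_HOME, an unset or mistyped variable can delete the wrong files. The following is a minimal sketch of a guarded version of the same copy step. The function name install_graph_db is made up for illustration, and the sketch assumes the server is stopped while the files are replaced.

```shell
#!/bin/sh
# Replace the contents of a Neo4j store directory with freshly imported files.
# $1: graph.db directory produced by import.sh (e.g. /path/to/output/graph.db)
# $2: Neo4j installation directory (normally $NEO4J_HOME)
install_graph_db() {
  src=$1
  neo4j_home=$2
  if [ -z "$src" ] || [ -z "$neo4j_home" ]; then
    echo "usage: install_graph_db SRC_GRAPH_DB NEO4J_HOME" >&2
    return 1
  fi
  # Refuse to wipe the existing store unless there is something to install.
  if [ ! -d "$src" ] || [ -z "$(ls -A "$src")" ]; then
    echo "error: $src is missing or empty; run import.sh first" >&2
    return 1
  fi
  if [ ! -d "$neo4j_home/data" ]; then
    echo "error: $neo4j_home/data does not exist; check NEO4J_HOME" >&2
    return 1
  fi
  dest=$neo4j_home/data/graph.db
  rm -rf "$dest"
  mkdir -p "$dest"
  # "src/." copies the directory contents, including hidden files.
  cp -R "$src"/. "$dest"/
}
```

A typical call would be `install_graph_db /path/to/output/graph.db "$NEO4J_HOME"`, followed by starting the server as shown above.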
To allow remote connections to the server, uncomment the
org.neo4j.server.webserver.address setting and set it appropriately. Otherwise
the workload must be run on the same machine as the Neo4j server.
# Let the webserver only listen on the specified IP. Default is localhost (only
# accept local connections). Uncomment to allow any connection. Please see the
# security section in the neo4j manual before modifying this.
org.neo4j.server.webserver.address=0.0.0.0
Disable authentication on the server. Currently the workload driver doesn't support authentication.
# Require (or disable the requirement of) auth to access Neo4j
dbms.security.auth_enabled=false
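If you would rather script these two configuration changes than edit the file by hand, something along these lines may help. This is only a sketch: set_property is a hypothetical helper, and the path to the configuration file is left for you to supply (conf/neo4j-server.properties under $NEO4J_HOME is an assumption about a default Neo4j 2.3 install, so check your own setup).

```shell
#!/bin/sh
# Set KEY=VALUE in a Java-style properties file. An existing line for the key
# is rewritten (and uncommented if necessary); otherwise the line is appended.
# A backup of the original file is left next to it with a .bak suffix.
# $1: properties file, $2: key, $3: value
# The value must not contain "|", "&", or backslashes in this simple sketch.
set_property() {
  file=$1
  key=$2
  value=$3
  # Escape dots so the key is matched literally by grep and sed.
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^[#[:space:]]*${key_re}=" "$file"; then
    sed -i.bak "s|^[#[:space:]]*${key_re}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Usage would look like `set_property /path/to/server.properties org.neo4j.server.webserver.address 0.0.0.0` and `set_property /path/to/server.properties dbms.security.auth_enabled false`, followed by a server restart.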
Unless you're trying to find a lower bound on performance, indices should be
created on all the nodes once the server is running. indexCreation.neo4j is a
script included in this repository for doing that, for use with the
neo4j-shell. Run this on the same machine as the server process (or, if remote
shell connections have been enabled in the neo4j.properties file, supply host
and port options to connect remotely):
$NEO4J_HOME/bin/neo4j-shell -file ./scripts/indexCreation.neo4j
Note that the index creation commands are asynchronous, so the indices may take some time to be fully created on the server. Queries executed before the indices are fully created will experience the expected slowdown.
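One way to avoid benchmarking against half-built indices is to poll the server until population has finished. The sketch below assumes that the neo4j-shell command `schema ls` lists each index together with a state keyword and that an index still being built is reported as POPULATING; both function names are hypothetical. Note that it does not detect FAILED indices, and if the shell cannot connect at all the loop simply ends, so treat it as a convenience rather than a guarantee.

```shell
#!/bin/sh
# Read `schema ls` output on stdin; succeed if any index is still populating.
indexes_populating() {
  grep -q 'POPULATING'
}

# Poll the local server until no index is reported as POPULATING.
# $1: seconds to sleep between polls (defaults to 5)
wait_for_indexes() {
  interval=$1
  [ -n "$interval" ] || interval=5
  while "$NEO4J_HOME/bin/neo4j-shell" -c 'schema ls' | indexes_populating; do
    echo "indices still populating; checking again in ${interval}s" >&2
    sleep "$interval"
  done
}
```

Calling `wait_for_indexes 10` right after running the index creation script would block until the listing no longer shows any populating index.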
In this repository, use the Maven assembly plugin to create a single all-encompassing jar:
mvn clean compile assembly:single
This should create a jar that's named something like
snb-interactive-neo4j-0.1.0-jar-with-dependencies.jar.
Then, cd into your LDBC driver directory and run the driver with this jar in
the classpath. Neo4jDb requires only two configuration parameters, host and
port, which are the host IP address and port of the Neo4j webserver,
respectively. If left unspecified, these will default to 127.0.0.1 and 7474.
java -cp target/jeeves-0.3-SNAPSHOT.jar:/path/to/this/repo/target/snb-interactive-neo4j-0.1.0-jar-with-dependencies.jar com.ldbc.driver.Client -P configuration/ldbc_driver_default.properties -P configuration/ldbc/snb/interactive/ldbc_snb_interactive_SF-0001.properties -P /path/to/social_network/updateStream.properties -p host 192.168.1.101 -p port 7474
It is assumed that the user is already familiar with how to modify the
configuration/ldbc_driver_default.properties and
configuration/ldbc/snb/interactive/ldbc_snb_interactive_SF-XXXX.properties
configuration files appropriately for their workload.
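Before starting a long benchmark run, it can be worth confirming that the host and port you plan to pass to the driver actually reach the Neo4j webserver. The sketch below is one way to do that with curl. It assumes the REST service root lives at /db/data/ (as in Neo4j 2.x) and that a server with authentication disabled answers it with HTTP 200; the function names are hypothetical.

```shell
#!/bin/sh
# Print the REST service root URL for a Neo4j server, falling back to the
# same defaults as Neo4jDb (127.0.0.1 and 7474) when host or port is omitted.
# $1: host (optional), $2: port (optional)
neo4j_url() {
  host=$1
  port=$2
  [ -n "$host" ] || host=127.0.0.1
  [ -n "$port" ] || port=7474
  printf 'http://%s:%s/db/data/\n' "$host" "$port"
}

# Succeed only if the server answers the service root with HTTP 200.
# $1: host (optional), $2: port (optional)
check_neo4j() {
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    "$(neo4j_url "$1" "$2")")
  [ "$status" = "200" ]
}
```

For example, `check_neo4j 192.168.1.101 7474 || echo "Neo4j is not reachable" >&2` could be placed in front of the driver command. A server that still has authentication enabled would be expected to fail this check as well, which is useful here since the workload needs authentication disabled.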
To run individual queries, use the QueryTester utility included in this repo:
QueryTester: A utility for running individual queries for testing purposes.
Usage:
QueryTester [options] query1 <personId> <firstName> <limit>
QueryTester [options] query2 <personId> <maxDate> <limit>
QueryTester [options] query3 <personId> <countryXName> <countryYName> <startDate> <durationDays> <limit>
QueryTester [options] query4 <personId> <startDate> <durationDays> <limit>
QueryTester [options] query5 <personId> <minDate> <limit>
QueryTester [options] query6 <personId> <tagName> <limit>
QueryTester [options] query7 <personId> <limit>
QueryTester [options] query8 <personId> <limit>
QueryTester [options] query9 <personId> <maxDate> <limit>
QueryTester [options] query10 <personId> <month> <limit>
QueryTester [options] query11 <personId> <countryName> <workFromYear> <limit>
QueryTester [options] query12 <personId> <tagClassName> <limit>
QueryTester [options] query13 <person1Id> <person2Id>
QueryTester [options] query14 <person1Id> <person2Id>
QueryTester [options] shortquery1 <personId>
QueryTester [options] shortquery2 <personId> <limit>
QueryTester [options] shortquery3 <personId>
QueryTester [options] shortquery4 <messageId>
QueryTester [options] shortquery5 <messageId>
QueryTester [options] shortquery6 <messageId>
QueryTester [options] shortquery7 <messageId>
QueryTester [options] update1 <nth>
QueryTester [options] update2 <nth>
QueryTester [options] update3 <nth>
QueryTester [options] update4 <nth>
QueryTester [options] update5 <nth>
QueryTester [options] update6 <nth>
QueryTester [options] update7 <nth>
QueryTester [options] update8 <nth>
QueryTester (-h | --help)
QueryTester --version
Options:
--config=<file> QueryTester configuration file
[default: ./config/querytester.properties].
--repeat=<n> How many times to repeat the query. If n > 1
then normal query result output will be
suppressed to show only the query timing
information
[default: 1].
--input=<input> Directory of updateStream files to use as
input for update queries (the nth update of
its kind will be selected from the stream to
execute) [default: ./].
--timeUnits=<unit> Unit of time in which to report timings
(SECONDS, MILLISECONDS, MICROSECONDS,
NANOSECONDS) [default: MILLISECONDS].
-h --help Show this screen.
--version Show version.
Here is a usage example:
mvn exec:java -Dexec.mainClass="net.ellitron.ldbcsnbimpls.interactive.neo4j.util.QueryTester" -Dexec.args="shortquery1 933"
Query:
LdbcShortQuery1PersonProfile{personId=933}
Query Stats:
Units: MILLISECONDS
Count: 1
Min: 256
Max: 256
Mean: 256
25th Percentile: 256
50th Percentile: 256
75th Percentile: 256
90th Percentile: 256
95th Percentile: 256
99th Percentile: 256
LdbcShortQuery1PersonProfileResult{firstName='Mahinda', lastName='Perera', birthday=628732800000, locationIp='192.248.2.123', browserUsed='Firefox', cityId=1359, gender='male', creationDate=1268868730447}
mvn exec:java -Dexec.mainClass="net.ellitron.ldbcsnbimpls.interactive.neo4j.util.QueryTester" -Dexec.args="--repeat 1000 --timeUnits MICROSECONDS shortquery1 933"
Query:
LdbcShortQuery1PersonProfile{personId=933}
Query Stats:
Units: MICROSECONDS
Count: 1000
Min: 611
Max: 341849
Mean: 1256
25th Percentile: 720
50th Percentile: 809
75th Percentile: 991
90th Percentile: 1293
95th Percentile: 1555
99th Percentile: 2097
mvn exec:java -Dexec.mainClass="net.ellitron.ldbcsnbimpls.interactive.neo4j.util.QueryTester" -Dexec.args="--input /home/jdellit/git/ldbc_snb_datagen/social_network update1 7"
Query:
LdbcUpdate1AddPerson{personId=17592186052198, personFirstName='Mikhail', personLastName='Basov', gender='male', birthday=Fri Jan 16 16:00:00 PST 1981, creationDate=Sun Oct 21 02:03:48 PDT 2012, locationIp='31.28.124.76', browserUsed='Firefox', cityId=843, languages=[ru, en], emails=[], tagIds=[58, 282, 288, 468, 777, 779, 780, 797, 809, 973, 974, 1153, 1185, 1201, 1533, 1615, 1676, 1733, 1758, 1765, 1769, 1984, 1993, 2011, 2092, 2104, 2780, 2786, 2787, 2792, 2832, 2841, 2845, 2848, 2849, 2859, 2865, 2874, 2887, 2902, 2907, 2939, 2940, 2990, 3001, 3065, 3076, 3086, 3109, 4878, 4952, 5078, 6168, 6981, 7804, 9319, 9448, 14155, 14826], studyAt=[Organization{organizationId=6004, year=2002}], workAt=[Organization{organizationId=1043, year=2003}, Organization{organizationId=1096, year=2002}, Organization{organizationId=1102, year=2002}]}
Query Stats:
Units: MILLISECONDS
Count: 1
Min: 370
Max: 370
Mean: 370
25th Percentile: 370
50th Percentile: 370
75th Percentile: 370
90th Percentile: 370
95th Percentile: 370
99th Percentile: 370
com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcNoResult@241e8b0a
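When comparing timings across several queries or runs, it is handy to pull a single number out of the "Query Stats" block rather than reading it by eye. The helper below is a small sketch for that. It assumes the statistics are printed one per line as "Name: value", as in the sample output above; extract_stat is a hypothetical name and not part of QueryTester.

```shell
#!/bin/sh
# Read QueryTester output on stdin and print the value of one statistic.
# $1: statistic name exactly as printed, e.g. "Mean" or "99th Percentile"
extract_stat() {
  awk -v name="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # ignore leading indentation
      if (index(line, name ":") == 1) {   # line starts with "<name>:"
        print $NF                         # the value is the last field
        exit
      }
    }'
}
```

For instance, piping the second example above through `extract_stat "99th Percentile"` would print 2097. Adding Maven's `-q` flag to the mvn command keeps build log lines out of the piped output.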
To run validation, I used the Neo4j files included in the
ldbc_snb_interactive_validation repository. Specifically, use the
DataFormatConverter on the non-merged, string-formatted date version of the
dataset:
mvn exec:java -Dexec.mainClass="net.ellitron.ldbcsnbimpls.interactive.neo4j.util.DataFormatConverter" -Dexec.args="/path/to/ldbc_snb_interactive_validation/neo4j/neo4j--validation_set/social_network/string_date/ /path/to/outputDir/"
Then run validation as follows, being sure to modify any configuration settings
in
ldbc_snb_interactive_validation/neo4j/readwrite_neo4j--ldbc_driver_config--db_validation.properties
as necessary:
java -cp target/jeeves-0.3-SNAPSHOT.jar:/path/to/this/repo/target/snb-interactive-neo4j-0.1.0-jar-with-dependencies.jar com.ldbc.driver.Client -P /path/to/ldbc_snb_interactive_validation/neo4j/readwrite_neo4j--ldbc_driver_config--db_validation.properties