Neo4j LDBC SNB Interactive Workload Implementation

This is an implementation of the LDBC SNB interactive workload for Neo4j. All queries are expressed as Cypher statements and executed transactionally over HTTP against a running Neo4j webserver using Neo4j's REST API. All queries defined in the LDBC Social Network Benchmark Specification v0.2.2 have been implemented and have passed validation (woohoo!).

In addition, the repository includes two utilities: one for loading a dataset into Neo4j, and one for testing individual queries to check their results and runtime performance. See below for how to use these tools and how to run the benchmark.

This is a relatively new project and all contributions and/or suggestions are welcome.

Dependency Information

This implementation has been tested against the following versions of Neo4j and LDBC:

  • Neo4j: v2.3.3
  • LDBC driver: 0.3-SNAPSHOT

Special Notes

  • The Neo4j database connector (Neo4jDb) is now thread-safe. Earlier versions were not, and could not safely be run with the driver using multiple threads.
  • The Neo4j database connector does not support authentication to the Neo4j server. Authentication must be disabled to run the workload.

Instructions

1. Dataset Loading

Neo4j comes with a life-saver tool for importing datasets as quickly as possible (it will, quite literally, save hours of your life), called the Neo4j Import Tool. To use it, however, dataset files generated by the LDBC SNB Data-Generator must first be converted into the format expected by the tool. To do this, use the DataFormatConverter utility included in this repository.

DataFormatConverter: A utility for converting dataset files generated by the
LDBC SNB Data Generator to the file format expected by the Neo4j import tool.
Output files are placed in the DEST directory, along with an import.sh script
that runs the Neo4j Import Tool automatically on those files. For this script
to work, please set your NEO4J_HOME environment variable appropriately.
Usage:
  DataFormatConverter SOURCE DEST
  DataFormatConverter (-h | --help)
  DataFormatConverter --version

Arguments:
  SOURCE  Directory containing SNB dataset files.
  DEST    Destination directory for output files.

Options:
  -h --help         Show this screen.
  --version         Show version.

Here is an example of how to use it:

mvn exec:java -Dexec.mainClass="net.ellitron.ldbcsnbimpls.interactive.neo4j.util.DataFormatConverter" -Dexec.args="/path/to/social_network/ /path/to/output/"
cd /path/to/output
chmod +x import.sh
export NEO4J_HOME=/path/to/neo4j
./import.sh
cd graph.db
ls
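Because import.sh depends on NEO4J_HOME, a small pre-flight check can catch a missing or wrong setting before a long import starts. The following is a minimal sketch: the function name check_neo4j_home is hypothetical, and the bin/neo4j-import path is an assumption about the Neo4j installation layout rather than something read from import.sh.

```shell
# Pre-flight check: succeed only if NEO4J_HOME points at a directory that
# contains an executable import tool. The bin/neo4j-import path is an
# assumption about the installation layout, not taken from import.sh.
check_neo4j_home() {
  if [ -z "${NEO4J_HOME:-}" ]; then
    echo "NEO4J_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$NEO4J_HOME/bin/neo4j-import" ]; then
    echo "no executable bin/neo4j-import under $NEO4J_HOME" >&2
    return 1
  fi
}
```

One way to use it is to chain it in front of the import, as in check_neo4j_home && ./import.sh, so the import only starts when the check passes.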

Once the import script has been run, the directory ./graph.db will contain the resulting Neo4j database files. These files can then be copied directly into your /path/to/neo4j/data/graph.db directory (after it has been cleared of any previous files), and the server will load them the next time it starts up. Here's an example:

rm -rf $NEO4J_HOME/data/graph.db/*
cp -rf /path/to/output/graph.db/* $NEO4J_HOME/data/graph.db/
cd $NEO4J_HOME
./bin/neo4j console

2. Server Configuration

To allow remote connections to the server, uncomment the org.neo4j.server.webserver.address setting and set it appropriately. Otherwise, the workload must be run on the same machine as the Neo4j server.

# Let the webserver only listen on the specified IP. Default is localhost (only
# accept local connections). Uncomment to allow any connection. Please see the
# security section in the neo4j manual before modifying this.
org.neo4j.server.webserver.address=0.0.0.0

Disable authentication on the server. Currently the workload driver doesn't support authentication.

# Require (or disable the requirement of) auth to access Neo4j
dbms.security.auth_enabled=false
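Both settings only take effect when their lines are uncommented, which is easy to overlook. As a rough sketch, the helper below prints the value of an uncommented key from a properties file and fails if the key is absent or commented out. The name prop_value is hypothetical, and the parsing is deliberately simple (no line continuations or escaped characters).

```shell
# Print the value of an uncommented "key=value" line from a Java-style
# properties file. Returns non-zero if the key is absent or commented out.
prop_value() {
  # $1 = properties file, $2 = key to look up
  awk -v key="$2" '
    /^[ \t]*[#!]/ { next }            # skip comment lines
    index($0, "=") == 0 { next }      # skip lines without a key=value pair
    {
      eq = index($0, "=")
      k = substr($0, 1, eq - 1); gsub(/^[ \t]+|[ \t]+$/, "", k)
      v = substr($0, eq + 1);    gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { print v; found = 1; exit }
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}
```

For example, prop_value "$NEO4J_HOME/conf/neo4j-server.properties" dbms.security.auth_enabled should print false once authentication is disabled. The file path is an assumption about where your Neo4j version keeps these settings.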

3. Creating Indices

Unless you're trying to find a lower bound on performance, indices should be created on all the nodes once the server is running. The repository includes a script for this, indexCreation.neo4j, for use with the neo4j-shell. Run it on the same machine as the server process (or, if remote shell connections have been enabled in the neo4j.properties file, supply host and port options to connect remotely):

$NEO4J_HOME/bin/neo4j-shell -file ./scripts/indexCreation.neo4j

Note that the index creation commands are asynchronous, so the indices may take some time to be fully built on the server after the script returns. Queries executed before the indices are complete will run correspondingly slower.
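Since index creation is asynchronous, it can help to wait until the indices are ready before starting a benchmark run. The sketch below polls the shell's schema listing until nothing is reported as still being built. The function names are hypothetical, and matching the word POPULATING is an assumption about the listing's output format, so check it against what your neo4j-shell actually prints.

```shell
# Succeeds (exit 0) while the schema listing read from stdin still shows an
# index being built. Matching the word POPULATING is an assumption about the
# shell's output format.
indexes_still_populating() {
  grep -q 'POPULATING'
}

# Poll the server until no index is reported as populating.
# $1 = seconds to sleep between polls (defaults to 5).
wait_for_indexes() {
  while "$NEO4J_HOME/bin/neo4j-shell" -c 'schema ls' | indexes_still_populating; do
    sleep "${1:-5}"
  done
}
```

Note that the loop also ends if the shell cannot connect, because an empty listing contains no populating indices, so it is worth confirming the server is up first.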

4. Run Workload

In this repository, use the Maven assembly plugin to create a single all-encompassing jar:

mvn clean compile assembly:single

This should create a jar that's named something like snb-interactive-neo4j-0.1.0-jar-with-dependencies.jar.

Then, cd into your LDBC driver directory and run the driver with this jar in the classpath. Neo4jDb requires only two configuration parameters, host and port, which are the host IP address and port of the Neo4j webserver, respectively. If left unspecified, these will default to 127.0.0.1 and 7474.

java -cp target/jeeves-0.3-SNAPSHOT.jar:/path/to/this/repo/target/snb-interactive-neo4j-0.1.0-jar-with-dependencies.jar \
  com.ldbc.driver.Client \
  -P configuration/ldbc_driver_default.properties \
  -P configuration/ldbc/snb/interactive/ldbc_snb_interactive_SF-0001.properties \
  -P /path/to/social_network/updateStream.properties \
  -p host 192.168.1.101 \
  -p port 7474

It is assumed that the user is already familiar with how to modify the configuration/ldbc_driver_default.properties and configuration/ldbc/snb/interactive/ldbc_snb_interactive_SF-XXXX.properties configuration files appropriately for their workload.
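Before launching a long run, it may be worth confirming that the chosen host and port actually reach the server and that it accepts unauthenticated requests. The sketch below posts a single Cypher statement to Neo4j's transactional HTTP endpoint (/db/data/transaction/commit) using curl. The function names are hypothetical, and this is not necessarily how Neo4jDb builds its own requests.

```shell
# Build the JSON request body for a single Cypher statement (no parameters).
# Backslashes and double quotes are escaped so the statement is a valid JSON
# string. Assumes a single-line statement without control characters.
cypher_payload() {
  _escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"statements":[{"statement":"%s"}]}\n' "$_escaped"
}

# POST the statement to the transactional endpoint and print the response.
# $1 = host, $2 = port, $3 = Cypher statement
run_cypher() {
  cypher_payload "$3" | curl -sS -X POST \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json; charset=UTF-8' \
    --data-binary @- \
    "http://$1:$2/db/data/transaction/commit"
}
```

For example, run_cypher 192.168.1.101 7474 'MATCH (n) RETURN count(n)' should return a JSON document with a results array and an empty errors array if the server is reachable and authentication is disabled.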

Extras

Test Individual Queries

To run individual queries, use the QueryTester utility included in this repo:

QueryTester: A utility for running individual queries for testing purposes.

Usage:
  QueryTester [options] query1 <personId> <firstName> <limit>
  QueryTester [options] query2 <personId> <maxDate> <limit>
  QueryTester [options] query3 <personId> <countryXName> <countryYName> <startDate> <durationDays> <limit>
  QueryTester [options] query4 <personId> <startDate> <durationDays> <limit>
  QueryTester [options] query5 <personId> <minDate> <limit>
  QueryTester [options] query6 <personId> <tagName> <limit>
  QueryTester [options] query7 <personId> <limit>
  QueryTester [options] query8 <personId> <limit>
  QueryTester [options] query9 <personId> <maxDate> <limit>
  QueryTester [options] query10 <personId> <month> <limit>
  QueryTester [options] query11 <personId> <countryName> <workFromYear> <limit>
  QueryTester [options] query12 <personId> <tagClassName> <limit>
  QueryTester [options] query13 <person1Id> <person2Id>
  QueryTester [options] query14 <person1Id> <person2Id>
  QueryTester [options] shortquery1 <personId>
  QueryTester [options] shortquery2 <personId> <limit>
  QueryTester [options] shortquery3 <personId>
  QueryTester [options] shortquery4 <messageId>
  QueryTester [options] shortquery5 <messageId>
  QueryTester [options] shortquery6 <messageId>
  QueryTester [options] shortquery7 <messageId>
  QueryTester [options] update1 <nth>
  QueryTester [options] update2 <nth>
  QueryTester [options] update3 <nth>
  QueryTester [options] update4 <nth>
  QueryTester [options] update5 <nth>
  QueryTester [options] update6 <nth>
  QueryTester [options] update7 <nth>
  QueryTester [options] update8 <nth>
  QueryTester (-h | --help)
  QueryTester --version

Options:
  --config=<file>      QueryTester configuration file
                       [default: ./config/querytester.properties].
  --repeat=<n>         How many times to repeat the query. If n > 1
                       then normal query result output will be
                       suppressed to show only the query timing
                       information
                       [default: 1].
  --input=<input>      Directory of updateStream files to use as
                       input for update queries (the nth update of
                       its kind will be selected from the stream to
                       execute) [default: ./].
  --timeUnits=<unit>   Unit of time in which to report timings
                       (SECONDS, MILLISECONDS, MICROSECONDS,
                       NANOSECONDS) [default: MILLISECONDS].
  -h --help            Show this screen.
  --version            Show version.

Here is a usage example:

mvn exec:java \
  -Dexec.mainClass="net.ellitron.ldbcsnbimpls.interactive.neo4j.util.QueryTester" \
  -Dexec.args="shortquery1 933"

Query:
LdbcShortQuery1PersonProfile{personId=933}

Query Stats:
  Units:            MILLISECONDS
  Count:            1
  Min:              256
  Max:              256
  Mean:             256
  25th Percentile:  256
  50th Percentile:  256
  75th Percentile:  256
  90th Percentile:  256
  95th Percentile:  256
  99th Percentile:  256

LdbcShortQuery1PersonProfileResult{firstName='Mahinda', lastName='Perera', birthday=628732800000, locationIp='192.248.2.123', browserUsed='Firefox', cityId=1359, gender='male', creationDate=1268868730447}

mvn exec:java \
  -Dexec.mainClass="net.ellitron.ldbcsnbimpls.interactive.neo4j.util.QueryTester" \
  -Dexec.args="--repeat 1000 --timeUnits MICROSECONDS shortquery1 933"

Query:
LdbcShortQuery1PersonProfile{personId=933}

Query Stats:
  Units:            MICROSECONDS
  Count:            1000
  Min:              611
  Max:              341849
  Mean:             1256
  25th Percentile:  720
  50th Percentile:  809
  75th Percentile:  991
  90th Percentile:  1293
  95th Percentile:  1555
  99th Percentile:  2097
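To make the statistics above easier to interpret, here is a rough sketch of how such a summary can be derived from raw timings, one integer per line. It uses the nearest-rank percentile method, which is an assumption: QueryTester may compute its percentiles differently. The function name timing_stats is hypothetical.

```shell
# Read one integer timing per line on stdin and print summary statistics.
# Percentiles use the nearest-rank method (an assumption; QueryTester may
# compute them differently). The mean is truncated to an integer.
timing_stats() {
  sort -n | awk '
    function pct(p,    r) {          # value at the p-th percentile
      r = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      return v[r < 1 ? 1 : r]
    }
    { v[NR] = $1; sum += $1 }        # input is sorted, so v[] is ascending
    END {
      if (NR == 0) exit 1            # no samples: nothing to report
      printf "Count: %d\nMin: %d\nMax: %d\nMean: %d\n", NR, v[1], v[NR], int(sum / NR)
      n = split("25 50 75 90 95 99", ps, " ")
      for (i = 1; i <= n; i++) printf "%sth Percentile: %d\n", ps[i], pct(ps[i])
    }'
}
```

With a long-tailed sample like the one above, a single slow first request (the 341849 maximum) pulls the mean well above the median, which is why the percentiles are usually the more informative figures.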


mvn exec:java \
  -Dexec.mainClass="net.ellitron.ldbcsnbimpls.interactive.neo4j.util.QueryTester" \
  -Dexec.args="--input /home/jdellit/git/ldbc_snb_datagen/social_network update1 7"

Query:
LdbcUpdate1AddPerson{personId=17592186052198, personFirstName='Mikhail', personLastName='Basov', gender='male', birthday=Fri Jan 16 16:00:00 PST 1981, creationDate=Sun Oct 21 02:03:48 PDT 2012, locationIp='31.28.124.76', browserUsed='Firefox', cityId=843, languages=[ru, en], emails=[], tagIds=[58, 282, 288, 468, 777, 779, 780, 797, 809, 973, 974, 1153, 1185, 1201, 1533, 1615, 1676, 1733, 1758, 1765, 1769, 1984, 1993, 2011, 2092, 2104, 2780, 2786, 2787, 2792, 2832, 2841, 2845, 2848, 2849, 2859, 2865, 2874, 2887, 2902, 2907, 2939, 2940, 2990, 3001, 3065, 3076, 3086, 3109, 4878, 4952, 5078, 6168, 6981, 7804, 9319, 9448, 14155, 14826], studyAt=[Organization{organizationId=6004, year=2002}], workAt=[Organization{organizationId=1043, year=2003}, Organization{organizationId=1096, year=2002}, Organization{organizationId=1102, year=2002}]}

Query Stats:
  Units:            MILLISECONDS
  Count:            1
  Min:              370
  Max:              370
  Mean:             370
  25th Percentile:  370
  50th Percentile:  370
  75th Percentile:  370
  90th Percentile:  370
  95th Percentile:  370
  99th Percentile:  370

com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcNoResult@241e8b0a

Validation

To run validation, I used the Neo4j files included in this repository. Specifically, use the DataFormatConverter on the non-merged, string-formatted date version of the dataset:

mvn exec:java -Dexec.mainClass="net.ellitron.ldbcsnbimpls.interactive.neo4j.util.DataFormatConverter" -Dexec.args="/path/to/ldbc_snb_interactive_validation/neo4j/neo4j--validation_set/social_network/string_date/ /path/to/outputDir/"

Then run validation as follows, being sure to modify any configuration settings in ldbc_snb_interactive_validation/neo4j/readwrite_neo4j--ldbc_driver_config--db_validation.properties as necessary:

java -cp target/jeeves-0.3-SNAPSHOT.jar:/path/to/this/repo/target/snb-interactive-neo4j-0.1.0-jar-with-dependencies.jar com.ldbc.driver.Client -P /path/to/ldbc_snb_interactive_validation/neo4j/readwrite_neo4j--ldbc_driver_config--db_validation.properties