CuckooMap

This is an implementation experiment of a new style key/value table using a cascade of ever larger cuckoo hash tables and associated cuckoo filters to combine the speed of hash lookups with data locality properties for a hot subset of the data.

This is experimental code.

There are altogether four classes with different properties, all with the same interface:

CuckooMap
- cascade of InternalCuckooMaps, each constant size
- sizes grow exponentially
- hot set is heuristically kept in the early, smaller tables
- Cuckoo filters are optionally used to have a fast path if a key is not in the table
- unique keys
- thread-safe
- keys must be movable and copyable and default constructable and must have an empty() method to indicate an empty value. Default-constructed keys must be empty.
- values must be memcpy-able and must not rely on proper construction and destruction, use POD data!
- one can specify custom size and alignment of the value type
- CuckooMaps are not default constructable, not copyable and not movable. They properly destruct keys stored in the table but do not destruct values.
CuckooMultiMap

As CuckooMap, but:
- based on CuckooMap by deriving the Key class and adding one int32_t
- keys may be repeated
- linear time to find all pairs with a given key
- constant time to find a certain pair
ShardedMap<CuckooMap>

As CuckooMap, but with a configurable number of shards (pairs are distributed amongst the shards according to a hash function on the key).
ShardedMap<CuckooMultiMap>

As CuckooMultiMap, but with a configurable number of shards (pairs are distributed amongst the shards according to a hash function on the key).

The interface basically allows the following operations:

lookup a pair with a given key, returning a Finding object
use an existing Finding object to do another lookup in the same shard, this happens still under the mutex (see below).
insert of a pair
insert a new pair using the mutex in an existing Finding object
remove all pairs with a given key
remove a pair referenced by a Finding object.

Finding objects are returned by value by the lookup method with return value optimization, that is, they are directly built up at the caller's site.

A Finding object has two objectives:

It keeps a mutex on the (shard of) the map, in which the key was found. This mutex is kept as long as the Finding object lives.
It allows modification of the key (only up to hash value changes) and the value in the map.

Furthermore, one can perform the following operations on a Finding object:

int32_t found() returns 0 if no key was found, or otherwise the number of pairs with the given key in the map (at most 1 for unique key maps).
Key* key() returns a pointer to the key of the found pair in the map.
Value* value() returns a pointer to the value of the found pair in the map.
bool next() moves the lookup to the next pair with the same key. Returns false if there is no more such pair and true otherwise.
bool get(int32_t pos) moves the lookup to the pair with the same key, pos needs to be non-negative and less than the number of pairs with the key in the table

Finding objects cannot be copied but can be moved.

Building

We use a couple of third-party submodules for performance testing which are built by our CMAKE script by default, so building is slightly involved. RocksDB suggests having Snappy, Zlib, and BZip2 installed on your system. Provided those dependencies are installed, a script like the following should build our code:

git clone {REPOSITORY} {SRC_DIR}
cd {SRC_DIR}
git submodule init
cd 3rdParty/rocksdb
make static_lib
mkdir {BUILD_DIR}
cd {BUILD_DIR}
cmake {SRC_DIR}
make

Performance Testing

We include a performance testing tool which runs a configurable randomized workload and can compare the performance of CuckooMap against that of std::unordered_map, ArangoDB's AssocUnique, and Facebook's RocksDB. The PerformanceTest executable takes 11 parameters as follows:

mapType: Which type of map to use
- 0: CuckooMap
- 1: std::unordered_map
- 2: AssocUnique
- 3: RocksDB
nOpCount: The target number of operations to run in total; Integer, 0 < opCount
nInitialSize: The initial number of elements to insert; Integer, 0 < nInitialSize
nMaxSize: The maximum number of elements in the set at any given time; Integer, 0 < nMaxSize
nWorking: The size of the 'hot' set; Integer, 0 < nWorking
pInsert: The probability that the chosen operation will be an insert; Double, 0 <= pInsert <= 1
pLookup: The probability that the chosen operation will be a lookup; Double, 0 <= pLookup <= 1
pRemove: The probability that the chosen operation will be a remove; Double, 0 <= pRemove <= 1
pWorking: The probability that a lookup or remove operation will be executed on an element in the 'hot' set; Double, 0 <= pWorking <= 1
pMiss: The probability that a lookup operation will search for an item not in the table; Double, 0 <= pMiss <= 1
seed: The seed for the PRNG; Integer, 0 < seed < 2^64
ramThreshold: Maximum number of elements for in-RAM structures; if expected dataset is larger, bail out with defaults to indicate failure

Proper usage then is as follows:

PerformanceTest [mapType] [nOpCount] [nInitialSize] [nMaxSize] [nWorking] [pInsert] [pLookup] [pRemove] [pWorking] [pMiss] [seed] [ramThreshold]

In order to generate readable results, one should use the scripts included in /performance. Here we provide tools which should make running a variety of comparison tests much easier. To run the tests, simply run the following command inside {BUILD_DIR}/performance:

./RunBattery.sh

The results will then be output to a GitHub-flavored markdown file located at {BUILD_DIR}/performance/results.md. For a given result set, we display the test parameters as well as operation latencies at various percentiles for both std::unordered_map and CuckooMap. A latency column is labeled as UM Xp to denoted the Xth percentile for std::unordered_map and CM Yp to denote the Yth percentile for CuckooMap.

In order to change which tests are run, one needs to edit /performance/battery.csv. This is a CSV file where each line contains the parameters necessary to execute PerformanceTest, omitting useCuckoo and seed (these will automatically be set by /performance/RunBattery.sh as necessary).

When making changes to the code, we suggest running the same battery of tests against the existing version and the updated one and comparing the results ({SRC_DIR}/performance/results.md vs. {BUILD_DIR}/performance/results.md). When commiting changes, please make sure to check in the new performance results as well.

You can find the performance results for the current version here.

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
3rdParty		3rdParty
cmake		cmake
include/cuckoomap		include/cuckoomap
performance		performance
tests		tests
.clang-format		.clang-format
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
README.md		README.md
plan.txt		plan.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CuckooMap

Building

Performance Testing

About

Releases

Packages

Contributors 4

Languages

neunhoef/CuckooMap

Folders and files

Latest commit

History

Repository files navigation

CuckooMap

Building

Performance Testing

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages