Performant Ratelimiter

I wanted to play around with writing performant Rust code, so I decided to try implementing a dummy ratelimiting middleware.

TL;DR

Checkout the results.

Version comparisons

RateLimiter Version 0 - RwLock HashMap with VecDeque values

The first iteration of RateLimiter uses the following structure.

pub struct RateLimiter0 {
    requests: RwLock<HashMap<IpAddr, VecDeque<DateTime<Utc>>>>,
}

Key Characteristics:

Only uses the standard library, no external crates for data structures.
Ratelimit0 Method: The ratelimit0 function implements a rate-limiting mechanism based on a given source IP and timestamp. It first computes a cutoff_time to determine the relevancy of requests. Upon acquiring a write lock on the shared requests map, it retrieves (or initializes if non-existent) a queue of timestamps associated with the source IP. It then iterates through this queue, removing any timestamps older than the cutoff_time. If the length of the filtered queue surpasses a predefined maximum (i.e., MAX_REQUESTS), the function returns false, indicating that the rate limit has been exceeded; otherwise, it adds the new timestamp to the queue and returns true. This method is designed to be thread-safe by ensuring mutual exclusion using an RwLock around the entire HashMap.

RateLimiter Version 1 - No locks, SkipMap with VecDeque values

This iteration of RateLimiter uses the following structure:

pub struct RateLimiter1 {
    requests: SkipMap<IpAddr, VecDeque<DateTime<Utc>>>,
}

Key Characteristics:

Data Structure: It stores the requests directly as VecDeque<DateTime<Utc>> without any locking mechanism.
Ratelimit1 Method: The method ratelimit operates on this simple structure. It fetches the current request queue (or initializes a new one if none exists), then trims old requests, checks if the current request is within rate limits, and updates the map with the new request queue. However, this method is not thread-safe and thus the VecDeque is prone to race conditioncs.

Note that race conditions do not violate Rust’s memory safety rules. A race between multiple threads can never cause memory errors or segfaults. A race condition is a logic error in its entirety.

RateLimiter Version 2 - SkipMap with RwLock'd VecDeques

The second version of RateLimiter introduces some modifications:

pub struct RateLimiter2 {
    requests: SkipMap<IpAddr, RwLock<VecDeque<DateTime<Utc>>>>,
}

Key Characteristics:

Data Structure: It introduces an RwLock to make the data structure thread-safe, eliminating race conditions.
Ratelimit2 Method: It fetches or initializes the request queue and then, within a write lock, trims old requests and checks the current request against the rate limits.

RateLimiter Version 3 - SkipMap with ArrayQueue values

pub struct RateLimiter3 {
    requests: SkipMap<IpAddr, ArrayQueue<DateTime<Utc>>>,
}

Key Characteristics:

Data Structure: It uses an ArrayQueue instead of a VecDeque as a thread-safe data structure, eliminating race conditions.

Benchmarks

I used criterion for benchmarking the performance, and pprof + flamegraph for profiling the benches.

Methodology

Random IP Generation: Simulates diverse source IP addresses using a random_ip() function that produces random IPv4 addresses.
Requests: Each test simulates a million requests.
Chunking: Requests are processed in chunks of 1,000 at a time, leveraging the benefits of parallel processing.
Concurrency: Utilizes the tokio::runtime::Builder::new_multi_thread() to process requests concurrently, maximizing the utilization of available CPU cores. This done to simulate an actual web server.
Measurement Duration: Each benchmark iteration runs for a duration of 45 seconds to ensure that enough samples are collected.
Repetitions: Benchmarks are repeated 10 times to account for variations and to ensure consistent results.
Error Handling: The code expects all tasks to complete successfully. If any of the tasks fail, the benchmark will terminate with an error.

Hardware employed

These benchmarks were ran on my local machine with the following specs:

Property	Value
Kernel	5.15.90.1-microsoft-standard-WSL2
CPU	AMD Ryzen 7 3800X (4) @ 3.900GHz
Memory	15998MiB

Results

The results can be found here.

Conclusion

Version 0 was the fastest, due to the HashMap's underlying hashtable data structure which has a time complexity of O(1) for the average case for most operations (see table below). The contention for the RwLock was not an issue in this case. Should the amount of time per request increase, then the contention for the RwLock would become more of an issue, and the SkipMap would likely become the better choice.

Operation/Characteristic	`std::collections::HashMap`	`crossbeam::SkipMap`
Underlying Structure	Hash table	Skip list
Insertion	O(1) avg, O(n) worst	O(log n) avg
Deletion	O(1) avg, O(n) worst	O(log n) avg
Lookup	O(1) avg, O(n) worst	O(log n) avg
Iterating	O(n) (arbitrary order)	O(n) (ascending order)
Concurrency	No (external synchronization)	Yes (lock-free)

Running the benchmarks

To run the benchmarks yourself, clone the repo and run:

cargo bench --bench ratelimit_benchmark -- --profile-time=45

The index.html file for the benchmarks will be created at target/criterion/report/index.html
The flamegraph of the benchmark will be created at target/criterion/<name-of-benchmark>/profile/flamegraph.svg

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.cargo		.cargo
.github/workflows		.github/workflows
benches		benches
src		src
target/criterion		target/criterion
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md
justfile		justfile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Performant Ratelimiter

TL;DR

Version comparisons

RateLimiter Version 0 - RwLock HashMap with VecDeque values

RateLimiter Version 1 - No locks, SkipMap with VecDeque values

RateLimiter Version 2 - SkipMap with RwLock'd VecDeques

RateLimiter Version 3 - SkipMap with ArrayQueue values

Benchmarks

Methodology

Hardware employed

Results

Conclusion

Running the benchmarks

About

Releases

Packages

Languages

liamwh/performant-ratelimiter

Folders and files

Latest commit

History

Repository files navigation

Performant Ratelimiter

TL;DR

Version comparisons

RateLimiter Version 0 - RwLock HashMap with VecDeque values

RateLimiter Version 1 - No locks, SkipMap with VecDeque values

RateLimiter Version 2 - SkipMap with RwLock'd VecDeques

RateLimiter Version 3 - SkipMap with ArrayQueue values

Benchmarks

Methodology

Hardware employed

Results

Conclusion

Running the benchmarks

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages