Rate limiting for shard operations #5582

agourlay · 2024-12-04T09:42:30Z

This PR enables rate limiting through strict mode for read and write requests.

Integration tests are present to validate the limit kicks in with the correct error message.

I validated the precision of the limit with an end to end example for 1000 search req/s.

PUT collections/benchmark
{
  "vectors": {
    "size": 4,
    "distance": "Dot"
  },
  "strict_mode_config": {
    "enabled": true,
    "read_rate_limit_per_sec": 1000
  }
}

Shoot with oha with 10 concurrent workers for 1 minutes

oha -m POST  \
 -d "{ \"vector\": [0.2, 0.1, 0.9, 0.7], \"limit\": 4 }" \
 -T application/json \
 -A application/json  \
 -z 1m \
 -c 10 \
 http://127.0.0.1:6333/collections/benchmark/points/search

Expected results 1000 rps * 60s = 60k valid responses per minute.

Status code distribution:
  [200] 60949 responses
  [429] 38885 responses

timvisee · 2024-12-05T15:15:38Z

[200] 60949 responses

Nice! I think the ~1000 extra are because we start with a capacity of 1000 and are replenishing in real time.

timvisee

Thanks! 🙏

Left some suggestions. Don't take all of them too serious 😄

timvisee · 2024-12-05T15:19:34Z

lib/segment/src/types.rs

+    /// Max number of read operations per second
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub read_rate_limit_per_sec: Option<usize>,
+
+    /// Max number of write operations per second
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub write_rate_limit_per_sec: Option<usize>,


Not sure about this one, but I think it may be useful to be a bit more explicit on this one:

Suggested change

/// Max number of read operations per second

#[serde(skip_serializing_if = "Option::is_none")]

pub read_rate_limit_per_sec: Option<usize>,

/// Max number of write operations per second

#[serde(skip_serializing_if = "Option::is_none")]

pub write_rate_limit_per_sec: Option<usize>,

/// Max number of read operations per second per shard per peer

#[serde(skip_serializing_if = "Option::is_none")]

pub read_rate_limit_per_sec: Option<usize>,

/// Max number of write operations per second per shard per peer

#[serde(skip_serializing_if = "Option::is_none")]

pub write_rate_limit_per_sec: Option<usize>,

On another thought per replica may be shorter to per shard per peer

timvisee · 2024-12-05T15:21:45Z

lib/collection/src/shards/local_shard/mod.rs

+    pub async fn on_strict_mode_config_update(&self) {
+        let config = self.collection_config.read().await;
+
+        if let Some(strict_mode_config) = &config.strict_mode_config {


Would it be worth it to create a function that builds and returns the new RateLimiters. We'd be able to use it in the constructor above, and here, sharing code.

timvisee · 2024-12-05T15:24:05Z

tests/openapi/test_strictmode.py

+            failed = True
+            break
+
+    assert failed, "Rate limiting did not work"


Shall we make it a bit more strict and make failed a counter? And assert we've failed at least 5 (50%) times?

JojiiOfficial

Nothing to add to @timvisee's points

generall · 2024-12-06T08:52:00Z

docs/grpc/docs.md

@@ -1432,6 +1432,8 @@ Note: 1kB = 1 vector of size 256. |
 | search_max_oversampling | [float](#float) | optional |  |
 | upsert_max_batchsize | [uint64](#uint64) | optional |  |
 | max_collection_vector_size_bytes | [uint64](#uint64) | optional |  |
+| read_rate_limit_per_sec | [uint32](#uint32) | optional |  |
+| write_rate_limit_per_sec | [uint32](#uint32) | optional |  |


rate limit per second won't allow us to configure bursts. Could we switch to requests per minute?

Internally the rate limiter computes the number of requests allowed per second.

let tokens_per_sec = rate.requests_num() as f64 / rate.period().as_secs_f64();

Having an input of 10 req/s is equivalent to 600 req/m.

Those tokens are replenished in real time based on the elapsed time since the last check.
We do start at full capacity to allow burst on start.

Here is the impl:

pub fn check(&mut self) -> bool { let now = Instant::now(); let elapsed = now.duration_since(self.last_check); self.last_check = now; // Refill tokens based on elapsed time. self.tokens += self.tokens_per_sec * elapsed.as_secs_f64(); if self.tokens > self.capacity as f64 { self.tokens = self.capacity as f64; } if self.tokens >= 1.0 { self.tokens -= 1.0; // Consume one token. true // Request allowed. } else { false // Request denied. } }

Which kind of burst scenario are you concerned about?

Discussed with Andrey regarding his concerns, he wants to be able to have an actual burst capacity of one minute worth of tokens and not one second.

I will merge this PR as it is because it is tidy and has been reviewed.
Then propose the adjusted implementation in a new PR for clarity.

generall

ToDO in other PR: switch to per-minute

agourlay · 2024-12-06T12:17:27Z

Here is the followup #5597

* Rate limiting for shard operations * address all review comments in one go

github-actions bot mentioned this pull request Dec 4, 2024

Flaky test payload_index_test::test_keyword_facet #5059

Open

agourlay marked this pull request as ready for review December 4, 2024 13:58

timvisee self-requested a review December 5, 2024 14:04

timvisee approved these changes Dec 5, 2024

View reviewed changes

agourlay requested a review from JojiiOfficial December 5, 2024 15:27

JojiiOfficial approved these changes Dec 5, 2024

View reviewed changes

agourlay force-pushed the rate-limiting-operations branch from c263278 to 66f362f Compare December 6, 2024 07:37

Rate limiting for shard operations

d6ceef2

agourlay force-pushed the rate-limiting-operations branch from 66f362f to d6ceef2 Compare December 6, 2024 08:20

github-actions bot mentioned this pull request Dec 6, 2024

Flaky test byte_storage_quantization_test::test_byte_storage_binary_quantization_hnsw::case_09_nearest_scalar_dot #4348

Open

generall reviewed Dec 6, 2024

View reviewed changes

address all review comments in one go

db0001c

generall approved these changes Dec 6, 2024

View reviewed changes

agourlay merged commit 1e21a4a into dev Dec 6, 2024
17 checks passed

agourlay deleted the rate-limiting-operations branch December 6, 2024 10:02

timvisee pushed a commit that referenced this pull request Dec 9, 2024

Rate limiting for shard operations (#5582)

3c19eea

* Rate limiting for shard operations * address all review comments in one go

agourlay added this to the Rate limiting milestone Dec 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rate limiting for shard operations #5582

Rate limiting for shard operations #5582

agourlay commented Dec 4, 2024 •

edited

Loading

timvisee commented Dec 5, 2024

timvisee left a comment

timvisee Dec 5, 2024

timvisee Dec 6, 2024

timvisee Dec 5, 2024

timvisee Dec 5, 2024

JojiiOfficial left a comment

generall Dec 6, 2024

agourlay Dec 6, 2024

agourlay Dec 6, 2024

generall left a comment

agourlay commented Dec 6, 2024

Rate limiting for shard operations #5582

Rate limiting for shard operations #5582

Conversation

agourlay commented Dec 4, 2024 • edited Loading

timvisee commented Dec 5, 2024

timvisee left a comment

Choose a reason for hiding this comment

timvisee Dec 5, 2024

Choose a reason for hiding this comment

timvisee Dec 6, 2024

Choose a reason for hiding this comment

timvisee Dec 5, 2024

Choose a reason for hiding this comment

timvisee Dec 5, 2024

Choose a reason for hiding this comment

JojiiOfficial left a comment

Choose a reason for hiding this comment

generall Dec 6, 2024

Choose a reason for hiding this comment

agourlay Dec 6, 2024

Choose a reason for hiding this comment

agourlay Dec 6, 2024

Choose a reason for hiding this comment

generall left a comment

Choose a reason for hiding this comment

agourlay commented Dec 6, 2024

agourlay commented Dec 4, 2024 •

edited

Loading