Rate limiting for shard operations #5582
Conversation
Nice! I think the ~1000 extra are because we start with a full capacity of 1000 tokens and replenish in real time.
Thanks! 🙏
Left some suggestions. Don't take all of them too seriously 😄
lib/segment/src/types.rs (outdated)
```rust
/// Max number of read operations per second
#[serde(skip_serializing_if = "Option::is_none")]
pub read_rate_limit_per_sec: Option<usize>,

/// Max number of write operations per second
#[serde(skip_serializing_if = "Option::is_none")]
pub write_rate_limit_per_sec: Option<usize>,
```
Not sure about this one, but I think it may be useful to be a bit more explicit here:
```diff
-/// Max number of read operations per second
+/// Max number of read operations per second per shard per peer
 #[serde(skip_serializing_if = "Option::is_none")]
 pub read_rate_limit_per_sec: Option<usize>,
-/// Max number of write operations per second
+/// Max number of write operations per second per shard per peer
 #[serde(skip_serializing_if = "Option::is_none")]
 pub write_rate_limit_per_sec: Option<usize>,
```
On second thought, `per replica` may be a shorter way to say `per shard per peer`.
```rust
pub async fn on_strict_mode_config_update(&self) {
    let config = self.collection_config.read().await;

    if let Some(strict_mode_config) = &config.strict_mode_config {
```
Would it be worth it to create a function that builds and returns the new `RateLimiter`s? We'd be able to use it in the constructor above and here, sharing code. Something like the rough sketch below, for illustration (the helper name and the `RateLimiter::new_per_second` constructor are assumptions, not taken from this PR):
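```rust
// Hypothetical shared helper, usable from both the constructor and
// `on_strict_mode_config_update` (names assumed for illustration).
fn rate_limiters_from_config(
    config: &StrictModeConfig,
) -> (Option<RateLimiter>, Option<RateLimiter>) {
    let read = config.read_rate_limit_per_sec.map(RateLimiter::new_per_second);
    let write = config.write_rate_limit_per_sec.map(RateLimiter::new_per_second);
    (read, write)
}
```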
tests/openapi/test_strictmode.py (outdated)
```python
        failed = True
        break

assert failed, "Rate limiting did not work"
```
Shall we make it a bit more strict and make `failed` a counter? And assert we've failed at least 5 times (50%)?
Nothing to add to @timvisee's points
Force-pushed c263278 to 66f362f
Force-pushed 66f362f to d6ceef2
```diff
@@ -1432,6 +1432,8 @@ Note: 1kB = 1 vector of size 256.
 | search_max_oversampling | [float](#float) | optional | |
 | upsert_max_batchsize | [uint64](#uint64) | optional | |
 | max_collection_vector_size_bytes | [uint64](#uint64) | optional | |
+| read_rate_limit_per_sec | [uint32](#uint32) | optional | |
+| write_rate_limit_per_sec | [uint32](#uint32) | optional | |
```
A rate limit per second won't allow us to configure bursts. Could we switch to requests per minute?
Internally, the rate limiter computes the number of requests allowed per second:

```rust
let tokens_per_sec = rate.requests_num() as f64 / rate.period().as_secs_f64();
```

Having an input of 10 req/s is equivalent to 600 req/m.
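To make that equivalence concrete, here is a small sketch (the `Rate` stand-in type is assumed for illustration; only the formula above is from the PR):

```rust
use std::time::Duration;

// Assumed minimal stand-in for the rate type used in the formula above.
struct Rate {
    requests: usize,
    period: Duration,
}

fn tokens_per_sec(rate: &Rate) -> f64 {
    rate.requests as f64 / rate.period.as_secs_f64()
}

fn main() {
    let per_second = Rate { requests: 10, period: Duration::from_secs(1) };
    let per_minute = Rate { requests: 600, period: Duration::from_secs(60) };
    // Both refill at the same steady rate of 10 tokens/s...
    assert_eq!(tokens_per_sec(&per_second), tokens_per_sec(&per_minute));
    // ...so, if the bucket size is taken from `requests`, they differ
    // only in burst capacity (10 tokens vs. 600 tokens).
}
```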
Those tokens are replenished in real time based on the elapsed time since the last check. We start at full capacity to allow a burst at startup. Here is the impl:
```rust
pub fn check(&mut self) -> bool {
    let now = Instant::now();
    let elapsed = now.duration_since(self.last_check);
    self.last_check = now;
    // Refill tokens based on elapsed time.
    self.tokens += self.tokens_per_sec * elapsed.as_secs_f64();
    if self.tokens > self.capacity as f64 {
        self.tokens = self.capacity as f64;
    }
    if self.tokens >= 1.0 {
        self.tokens -= 1.0; // Consume one token.
        true // Request allowed.
    } else {
        false // Request denied.
    }
}
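For illustration, here is a self-contained sketch of how this behaves. The struct layout and the constructor are assumptions made to get a runnable example; only the `check` logic mirrors the method quoted above:

```rust
use std::time::Instant;

// Assumed struct layout matching the fields used by `check` (illustrative only).
struct RateLimiter {
    tokens: f64,         // current token balance
    capacity: usize,     // bucket size; starts full to allow a burst
    tokens_per_sec: f64, // steady-state refill rate
    last_check: Instant,
}

impl RateLimiter {
    // Hypothetical constructor: rate in requests per second, with a burst
    // capacity of one second's worth of tokens.
    fn new_per_second(rate: usize) -> Self {
        RateLimiter {
            tokens: rate as f64,
            capacity: rate,
            tokens_per_sec: rate as f64,
            last_check: Instant::now(),
        }
    }

    // Same token-bucket logic as the `check` method quoted above.
    fn check(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_check);
        self.last_check = now;
        self.tokens += self.tokens_per_sec * elapsed.as_secs_f64();
        if self.tokens > self.capacity as f64 {
            self.tokens = self.capacity as f64;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut limiter = RateLimiter::new_per_second(10);
    // The initial full bucket allows an immediate burst of ~10 requests;
    // after that, requests are admitted at the steady rate of 10/s.
    let allowed = (0..20).filter(|_| limiter.check()).count();
    println!("{allowed} of 20 immediate requests allowed"); // ~10
}
```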
Which kind of burst scenario are you concerned about?
Discussed with Andrey regarding his concerns: he wants an actual burst capacity of one minute's worth of tokens, not one second's.
I will merge this PR as it is, because it is tidy and has been reviewed, then propose the adjusted implementation in a new PR for clarity.
TODO in another PR: switch to per-minute. Here is the follow-up: #5597
* Rate limiting for shard operations
* address all review comments in one go
This PR enables rate limiting through strict mode for read and write requests.
Integration tests are present to validate that the limit kicks in with the correct error message.
I validated the precision of the limit with an end-to-end example at 1000 search req/s.
Load tested with oha using 10 concurrent workers for 1 minute:

```bash
oha -m POST \
  -d "{ \"vector\": [0.2, 0.1, 0.9, 0.7], \"limit\": 4 }" \
  -T application/json \
  -A application/json \
  -z 1m \
  -c 10 \
  http://127.0.0.1:6333/collections/benchmark/points/search
```

Expected results: 1000 rps * 60 s = 60k valid responses per minute.