
Rate limiting for shard operations #5582

Merged: 2 commits into dev from rate-limiting-operations on Dec 6, 2024
Conversation

agourlay
Member

@agourlay agourlay commented Dec 4, 2024

This PR enables rate limiting through strict mode for read and write requests.

Integration tests are present to validate the limit kicks in with the correct error message.

I validated the precision of the limit with an end-to-end example at 1000 search req/s.

PUT collections/benchmark
{
  "vectors": {
    "size": 4,
    "distance": "Dot"
  },
  "strict_mode_config": {
    "enabled": true,
    "read_rate_limit_per_sec": 1000
  }
}

Load test with oha using 10 concurrent workers for 1 minute:

oha -m POST  \
 -d "{ \"vector\": [0.2, 0.1, 0.9, 0.7], \"limit\": 4 }" \
 -T application/json \
 -A application/json  \
 -z 1m \
 -c 10 \
 http://127.0.0.1:6333/collections/benchmark/points/search

Expected result: 1000 rps × 60 s = 60k valid (200) responses over the minute.

Status code distribution:
  [200] 60949 responses
  [429] 38885 responses

@agourlay agourlay marked this pull request as ready for review December 4, 2024 13:58
@timvisee timvisee self-requested a review December 5, 2024 14:04
@timvisee
Member

timvisee commented Dec 5, 2024

[200] 60949 responses

Nice! I think the ~1000 extra are because we start with a capacity of 1000 and are replenishing in real time.
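Under the assumptions stated above (the bucket starts full at 1000 tokens and is refilled at 1000 tokens/s in real time), the expected ceiling for one minute works out to roughly 61k allowed requests, which matches the observed 60949. A rough back-of-the-envelope sketch:

```rust
fn main() {
    // Parameters assumed from the benchmark above.
    let capacity = 1000.0; // bucket starts full, so this is an instant burst allowance
    let rate_per_sec = 1000.0; // real-time replenishment rate
    let duration_secs = 60.0; // oha ran for one minute

    // Upper bound on allowed requests: initial burst + tokens replenished over the run.
    let max_allowed = capacity + rate_per_sec * duration_secs;
    println!("{max_allowed}"); // 61000, close to the observed 60949
}
```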

Member

@timvisee timvisee left a comment


Thanks! 🙏

Left some suggestions. Don't take all of them too seriously 😄

Comment on lines 711 to 721
/// Max number of read operations per second
#[serde(skip_serializing_if = "Option::is_none")]
pub read_rate_limit_per_sec: Option<usize>,

/// Max number of write operations per second
#[serde(skip_serializing_if = "Option::is_none")]
pub write_rate_limit_per_sec: Option<usize>,
Member


Not sure about this one, but I think it may be useful to be a bit more explicit on this one:

Suggested change (only the doc comments change):

-/// Max number of read operations per second
+/// Max number of read operations per second per shard per peer
 #[serde(skip_serializing_if = "Option::is_none")]
 pub read_rate_limit_per_sec: Option<usize>,
-/// Max number of write operations per second
+/// Max number of write operations per second per shard per peer
 #[serde(skip_serializing_if = "Option::is_none")]
 pub write_rate_limit_per_sec: Option<usize>,

Member


On second thought, "per replica" may be shorter than "per shard per peer".

pub async fn on_strict_mode_config_update(&self) {
let config = self.collection_config.read().await;

if let Some(strict_mode_config) = &config.strict_mode_config {
Member


Would it be worth creating a function that builds and returns the new RateLimiters? We could then use it in the constructor above and here, sharing the code.
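As a sketch of that suggestion, a single hypothetical helper could build both limiters from the strict mode config (the struct definitions below are minimal stubs for illustration, not the real Qdrant types):

```rust
// Minimal stubs for illustration; the real types live in the Qdrant codebase.
#[allow(dead_code)]
struct RateLimiter {
    rate_per_sec: usize,
}

impl RateLimiter {
    fn new(rate_per_sec: usize) -> Self {
        Self { rate_per_sec }
    }
}

struct StrictModeConfig {
    read_rate_limit_per_sec: Option<usize>,
    write_rate_limit_per_sec: Option<usize>,
}

/// Hypothetical shared constructor, usable both from the collection
/// constructor and from `on_strict_mode_config_update`.
fn build_rate_limiters(
    config: &StrictModeConfig,
) -> (Option<RateLimiter>, Option<RateLimiter>) {
    (
        config.read_rate_limit_per_sec.map(RateLimiter::new),
        config.write_rate_limit_per_sec.map(RateLimiter::new),
    )
}
```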

failed = True
break

assert failed, "Rate limiting did not work"
Member


Shall we make it a bit more strict and make failed a counter? And assert we've failed at least 5 (50%) times?

Contributor

@JojiiOfficial JojiiOfficial left a comment


Nothing to add to @timvisee's points

@agourlay agourlay force-pushed the rate-limiting-operations branch from c263278 to 66f362f Compare December 6, 2024 07:37
@@ -1432,6 +1432,8 @@ Note: 1kB = 1 vector of size 256. |
| search_max_oversampling | [float](#float) | optional | |
| upsert_max_batchsize | [uint64](#uint64) | optional | |
| max_collection_vector_size_bytes | [uint64](#uint64) | optional | |
| read_rate_limit_per_sec | [uint32](#uint32) | optional | |
| write_rate_limit_per_sec | [uint32](#uint32) | optional | |
Member


A rate limit per second won't allow us to configure bursts. Could we switch to requests per minute?

Member Author


Internally the rate limiter computes the number of requests allowed per second.

let tokens_per_sec = rate.requests_num() as f64 / rate.period().as_secs_f64();

Having an input of 10 req/s is equivalent to 600 req/m.

Those tokens are replenished in real time based on the elapsed time since the last check.
We do start at full capacity to allow burst on start.

Here is the impl:

pub fn check(&mut self) -> bool {
    let now = Instant::now();
    let elapsed = now.duration_since(self.last_check);
    self.last_check = now;

    // Refill tokens based on elapsed time.
    self.tokens += self.tokens_per_sec * elapsed.as_secs_f64();
    if self.tokens > self.capacity as f64 {
        self.tokens = self.capacity as f64;
    }

    if self.tokens >= 1.0 {
        self.tokens -= 1.0; // Consume one token.
        true // Request allowed.
    } else {
        false // Request denied.
    }
}

Which kind of burst scenario are you concerned about?
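For experimentation, the quoted check() can be wrapped in a minimal self-contained token bucket (field names and the full-capacity start are taken from the snippet and discussion above; the constructor is an assumed sketch, not the merged code):

```rust
use std::time::Instant;

/// Minimal token-bucket limiter mirroring the quoted `check()` implementation.
struct RateLimiter {
    tokens_per_sec: f64, // replenishment rate
    capacity: usize,     // maximum number of stored tokens
    tokens: f64,         // currently available tokens
    last_check: Instant, // time of the previous `check()` call
}

impl RateLimiter {
    /// Start at full capacity to allow a burst on start, as described above.
    fn new(rate_per_sec: usize) -> Self {
        Self {
            tokens_per_sec: rate_per_sec as f64,
            capacity: rate_per_sec,
            tokens: rate_per_sec as f64,
            last_check: Instant::now(),
        }
    }

    fn check(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_check);
        self.last_check = now;

        // Refill tokens based on elapsed time, capped at capacity.
        self.tokens += self.tokens_per_sec * elapsed.as_secs_f64();
        if self.tokens > self.capacity as f64 {
            self.tokens = self.capacity as f64;
        }

        if self.tokens >= 1.0 {
            self.tokens -= 1.0; // Consume one token.
            true // Request allowed.
        } else {
            false // Request denied.
        }
    }
}

fn main() {
    let mut limiter = RateLimiter::new(10);
    // Fired back-to-back, roughly the initial burst of `capacity` requests
    // passes, and the rest are denied until tokens replenish.
    let allowed = (0..20).filter(|_| limiter.check()).count();
    println!("allowed: {allowed}");
    assert!(allowed >= 10 && allowed < 20);
}
```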

Member Author


I discussed this with Andrey regarding his concerns: he wants an actual burst capacity of one minute's worth of tokens, not one second's.

I will merge this PR as it is because it is tidy and has been reviewed.
Then propose the adjusted implementation in a new PR for clarity.
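The follow-up idea can be sketched as a change of units (this is an assumed illustration of the direction, not the code that later landed in #5597): expressing the limit per minute keeps the same steady-state rate but gives a 60x larger burst capacity.

```rust
/// Derived limiter parameters (names assumed for illustration).
struct RateConfig {
    tokens_per_sec: f64, // steady-state refill rate
    capacity: f64,       // maximum burst size (bucket capacity)
}

/// Per-second limit: burst capacity is only one second's worth of tokens.
fn per_second(limit: usize) -> RateConfig {
    RateConfig {
        tokens_per_sec: limit as f64,
        capacity: limit as f64,
    }
}

/// Per-minute limit: same steady rate, but a full minute's worth of burst.
fn per_minute(limit: usize) -> RateConfig {
    RateConfig {
        tokens_per_sec: limit as f64 / 60.0,
        capacity: limit as f64,
    }
}

fn main() {
    let a = per_second(10); // 10 req/s steady, burst of 10
    let b = per_minute(600); // same 10 req/s steady, burst of 600
    assert_eq!(a.tokens_per_sec, b.tokens_per_sec);
    assert!(b.capacity > a.capacity);
}
```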

Member

@generall generall left a comment


TODO in another PR: switch to per-minute limits.

@agourlay agourlay merged commit 1e21a4a into dev Dec 6, 2024
17 checks passed
@agourlay agourlay deleted the rate-limiting-operations branch December 6, 2024 10:02
@agourlay
Member Author

agourlay commented Dec 6, 2024

Here is the follow-up: #5597

timvisee pushed a commit that referenced this pull request Dec 9, 2024
* Rate limiting for shard operations

* address all review comments in one go
@agourlay agourlay added this to the Rate limiting milestone Dec 17, 2024