priority & fairness: support dynamic max seats #118601
priority & fairness: support dynamic max seats #118601
Conversation
/retest
Force-pushed from 365059d to a0e487d
if maximumSeats == 0 || maximumSeats > maximumSeatsUpperBound {
	maximumSeats = maximumSeatsUpperBound
}

return &WorkEstimatorConfig{
	MinimumSeats: minimumSeats,
	MaximumSeats: maximumSeats,
Debating if it's worthwhile to have separate min/max seat configuration for ListWorkEstimatorConfig and MutatingWorkEstimatorConfig -- that way this change only applies to the LIST estimator, which we know is more problematic at the moment. It's possible the mutating work estimator has similar issues, but we just don't have the feedback yet since it was introduced in a newer version.
Leaning towards separating the config and revisiting once we have more feedback on watches.
I lean the other way. The problem affects both. I would not do the work to separate them unless we have a specific need to.
/assign @wojtek-t @MikeSpreitzer
-func DefaultWorkEstimatorConfig() *WorkEstimatorConfig {
+func DefaultWorkEstimatorConfig(serverConcurrencyLimit int) *WorkEstimatorConfig {
+	// By default maximumSeats is 1% of total max inflight requests with an upper bound of 10
+	maximumSeats := uint64(math.Ceil((float64(serverConcurrencyLimit) * maximumSeatsFractionOfTotal)))
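As a rough illustration of this revision's arithmetic, here is a small, self-contained sketch. It assumes maximumSeatsFractionOfTotal is 0.01 (the 1% mentioned in the diff comment) and the existing upper bound of 10; the helper name dynamicMaximumSeats is made up for this example and is not the function in the diff.

```go
package main

import (
	"fmt"
	"math"
)

const (
	// Assumed values from the review discussion: 1% of the server-wide
	// concurrency limit, clamped to the existing upper bound of 10.
	maximumSeatsFractionOfTotal = 0.01
	maximumSeatsUpperBound      = uint64(10)
)

// dynamicMaximumSeats mirrors the computation and clamping shown in the diff above.
func dynamicMaximumSeats(serverConcurrencyLimit int) uint64 {
	maximumSeats := uint64(math.Ceil(float64(serverConcurrencyLimit) * maximumSeatsFractionOfTotal))
	if maximumSeats == 0 || maximumSeats > maximumSeatsUpperBound {
		maximumSeats = maximumSeatsUpperBound
	}
	return maximumSeats
}

func main() {
	for _, limit := range []int{50, 100, 400, 600, 1000, 3000} {
		// 50 -> 1, 100 -> 1, 400 -> 4, 600 -> 6, 1000 -> 10, 3000 -> 10 (clamped)
		fmt.Printf("serverConcurrencyLimit=%d -> maximumSeats=%d\n", limit, dynamicMaximumSeats(limit))
	}
}
```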
While I understand the motivation behind this change (and I guess I agree with it), I'm not convinced the change as of now will not bring more damage than gain...
- We are running the smallest control-planes with max-inflights < 100 - this change automatically means that every single request will occupy 1 seat - is that really what we want?
- We run large control-planes (in particular scalability tests) with < 1000 inflights, so we will never reach 10 max-seats for them. I'm fairly sure it's not really what we want.
So I guess I have two main concerns about this change:
- 1% seems too low to me
- A second question is whether we want to set this based on total max-seats, or rather per-priority-level (where per-priority-level in the borrowing world I rather mean sth like "max possible in-flights for a given priority assuming I borrow everything I can" - or sth like that - so that it doesn't change frequently).
> We are running smallest control-planes with max-inflights < 100 - this change automatically means that every single request will occupy 1 seat - is that really what we want?

This is the idea, but maybe 1% is too aggressive. I have an assumption here that most small control planes are not really using up to 10 seats anyways because they don't have a lot of objects stored, and this change will only really impact the medium-large clusters that are running hundreds or thousands of Pods.

But for example's sake, let's say we have a small control plane with --max-requests-inflight=100 and 500 pods: a single LIST would use up all seats in priority levels like global-default (assuming it has 10% of total seats), which I don't think we want either. Even looking at workload-low (assuming 40% of total seats) -- it would only take 4 daemonsets listing pods with a field selector to use all of workload-low. Even with borrowing, I'm skeptical it will alleviate seat capacity much. (A rough sketch of this arithmetic follows this comment.)
> We run large control-planes (in particular scalability tests) with < 1000 inflights, so we will never reach 10 max-seats for them. I'm fairly sure it's not really what we want.

Actually, could we run this scalability test on this PR? Yes, max seats will never reach 10, but it will still be a single-digit value based on what is used in scalability tests; I'm curious if the impact will be noticeable if max seats is reduced from 10 to 7 (as an example).
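To make the arithmetic in the comment above concrete, here is a tiny sketch. The 10% and 40% shares for global-default and workload-low are the assumptions stated in the comment, not values read from the actual suggested priority-level configuration.

```go
package main

import "fmt"

func main() {
	maxRequestsInflight := 100.0 // --max-requests-inflight=100
	hardCodedMaxSeats := 10.0    // current hard-coded upper bound in the work estimator

	globalDefaultSeats := maxRequestsInflight * 0.10 // assumed 10% share -> ~10 seats
	workloadLowSeats := maxRequestsInflight * 0.40   // assumed 40% share -> ~40 seats

	// A single expensive LIST clamped to 10 seats saturates global-default;
	// four such LISTs saturate workload-low.
	fmt.Printf("LISTs to fill global-default: %.0f\n", globalDefaultSeats/hardCodedMaxSeats) // 1
	fmt.Printf("LISTs to fill workload-low:   %.0f\n", workloadLowSeats/hardCodedMaxSeats)   // 4
}
```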
> Actually, could we run this scalability test on this PR? Yes, max seats will never reach 10, but it will still be a single-digit value based on what is used in scalability tests; I'm curious if the impact will be noticeable if max seats is reduced from 10 to 7 (as an example).

Unfortunately, the scalability test is not doing a good enough job of saturating the control-plane, so I don't think we will have useful data from it...
Now thinking more about my concerns:
- Let's say someone sets inflight to 50 - then 4% has exactly the same problem. And if we bump it too high, then it will become ineffective.

So maybe my concern boils down to the fact that a single value is not what we want. We may need a function that takes total inflights as a parameter and returns a percent? For low numbers it returns something higher, for higher numbers it decreases (a toy sketch of such a function follows this comment). That said, I don't have a good suggestion for how to create this function in a more data-driven way...
> But for example's sake, let's say we have a small control plane with --max-requests-inflight=100 and 500 pods

That's unclear to me - listing 500 pods is quite a bit of work too.
But I can definitely imagine that this value:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/request/config.go#L28
is just too low when we run better tests... So yet another option would be to also compute this number based on max-requests-inflight.
For that, I agree with Mike that my concern is much more related to the issue about the fact that max-width can exceed the allocated capacity of one priority level. I guess I'm fine with having that discussion separately from this PR.
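Purely as an illustration of the "function that takes total inflights and returns a percent" idea above: the breakpoints and fractions below are invented for this sketch and are not proposed values from the PR; only the upper bound of 10 is taken from the existing code discussed here.

```go
package main

import (
	"fmt"
	"math"
)

// maxSeatsFraction is a hypothetical mapping from the total inflight limit to
// the fraction used for max seats: higher for small servers so LISTs can still
// get multiple seats, lower for large servers.
func maxSeatsFraction(totalInflight int) float64 {
	switch {
	case totalInflight <= 100:
		return 0.05
	case totalInflight <= 600:
		return 0.03
	default:
		return 0.01
	}
}

// dynamicMaxSeats applies the fraction and keeps the existing upper bound of 10.
func dynamicMaxSeats(totalInflight int) uint64 {
	seats := math.Ceil(float64(totalInflight) * maxSeatsFraction(totalInflight))
	if seats > 10 {
		seats = 10
	}
	if seats < 1 {
		seats = 1
	}
	return uint64(seats)
}

func main() {
	for _, limit := range []int{50, 100, 400, 600, 1000, 3000} {
		// 50 -> 3, 100 -> 5, 400 -> 10, 600 -> 10, 1000 -> 10, 3000 -> 10
		fmt.Printf("totalInflight=%d -> maxSeats=%d\n", limit, dynamicMaxSeats(limit))
	}
}
```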
/hold
To have some discussion on it.
I agree that 1% seems too low. I would use the total server capacity, which is max-mutating + max-readonly. If we were to use 17% and round then that would give the same behavior as today in the default configuration (200 mutating + 400 readonly). We already have an issue about the fact that max-width can exceed the allocated capacity of one priority level. Let's address that fully and head-on, as a separate issue.
Oh, wait, I was off by a factor of 10. It would be 1.7%. Maybe use millis instead of percent? Or 2% and accept the change to 12?
/triage accepted
@MikeSpreitzer - I think 2% is still too low (unless we make it configurable).
We could set
…ax seats

Max seats from priority & fairness work estimator is now min(0.15 x nominalCL, nominalCL/handSize)

'Max seats' calculated by work estimator is currently hard coded to 10. When using lower values for --max-requests-inflight, a single LIST request taking up 10 seats could end up using all if not most seats in the priority level. This change updates the default work estimator config such that 'max seats' is at most 10% of the maximum concurrency limit for a priority level, with an upper limit of 10. This ensures seats taken from a LIST request are proportional to the total available seats.

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Force-pushed from 2373a7f to d3ef2d4
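For reference, a minimal sketch of the final formula described in the commit message above: max seats per priority level is min(0.15 x nominalCL, nominalCL / handSize), with an upper limit of 10. The function and parameter names below are illustrative and are not the actual apiserver code.

```go
package main

import (
	"fmt"
	"math"
)

// maxSeatsForPriorityLevel computes min(0.15 * nominalCL, nominalCL / handSize),
// capped at 10, as described in the commit message above.
func maxSeatsForPriorityLevel(nominalCL, handSize float64) uint64 {
	seats := math.Min(0.15*nominalCL, nominalCL/handSize)
	if seats > 10 {
		seats = 10
	}
	return uint64(math.Ceil(seats))
}

func main() {
	// e.g. a priority level with a nominal concurrency limit of 40 and a queue
	// hand size of 8: min(6, 5) = 5 seats; with nominalCL=200 and handSize=8:
	// min(30, 25) = 25, capped to 10.
	fmt.Println(maxSeatsForPriorityLevel(40, 8))  // 5
	fmt.Println(maxSeatsForPriorityLevel(200, 8)) // 10
}
```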
/lgtm
Thanks!
LGTM label has been added. Git tree hash: 37313f51268123ef1ccb19ddb23c61e8d993b705
/assign @deads2k
/lgtm
/hold cancel
Thanks!
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, MikeSpreitzer, wojtek-t

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
After discussing it, we consider it more of a bug fix than a feature. We will be backporting it, so re-tagging.
/kind bug
Thanks for retagging -- I also updated the release notes to reflect this |
…#118601-origin-release-1.25 Automated cherry pick of #118601: priority & fairness: support dynamic max seats
…#118601-origin-release-1.27 Automated cherry pick of #118601: priority & fairness: support dynamic max seats
…#118601-origin-release-1.26 Automated cherry pick of #118601: priority & fairness: support dynamic max seats
This is a long shot, but I'm wondering if this change is related to some problems we've been having since moving from k8s
A number of times since the upgrade we have had the
As far as I can tell we are not hitting rate limits on our kube-apiservers and the control-plane nodes have plenty of spare CPU and RAM. We also have the
Using queries like
The cluster has roughly 35 nodes running around 750 pods. Just looking through the changes to 1.27.5, this was one of them and it is playing with the requests to the kube-apiserver, so it's worth asking.
@jim-barber-he I have a hard time seeing how this change can impact the liveness probe of the apiserver, but anything is possible haha. Can you open a separate issue with the full apiserver logs (not just on the event rate limiter)? That might include more details on why the liveness probe fails.
Thanks.
What type of PR is this?
/kind feature
What this PR does / why we need it:
'Max seats' calculated by work estimator is currently hard coded to 10. When using lower values for --max-requests-inflight, a single LIST request taking up 10 seats could end up using all if not most seats in the priority level. This change updates the work estimator such that 'max seats' is now calculated as MIN(0.15 x nominalCL, nominalCL / handSize), with an upper limit of 10. This ensures seats taken from any request are proportional to the total available seats.

Which issue(s) this PR fixes:
ref #115409
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: