KEP-4977: Dedicated Priority Level For Event Requests #4978

Open

linxiulei

  • One-line PR description: Dedicated Priority Level For Event Requests
  • Other comments:

@k8s-ci-robot added the cncf-cla: yes label Nov 22, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: linxiulei
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the kind/kep, sig/api-machinery, and size/XL labels Nov 22, 2024
@linxiulei
Author

/cc @MikeSpreitzer @tkashem @wojtek-t

MikeSpreitzer added a commit to MikeSpreitzer/kubernetes that referenced this pull request Nov 26, 2024
Trying out kubernetes/enhancements#4978

Signed-off-by: Mike Spreitzer <mspreitz@us.ibm.com>
| Name  | Proposed Nominal Shares | Proposed Lendable | Proposed Borrowing Limit | Proposed Guaranteed Shares |
| ----- | ----------------------: | ----------------: | -----------------------: | -------------------------: |
| event |                       5 |                0% |                     100% |                          5 |
Member

If we take a cluster (let's even say a scalability test) - what is a typical percentage of seats that events consume?
I would like to avoid a situation that we suddenly start visibly throttling events without good reason...

FWIW @serathius created an internal GKE dashboard for debugging cases like that, so he might be able to help with answering this question

Contributor

@serathius Nov 26, 2024

I don't have a number on hand for a scalability test, but aggregating over multiple tests we run, I got 4% of APF used for events.

Member

An aggregation across different tests sounds even better - thanks!

Member

Following up on this: currently, the sum of all shares across all PLs in the default configuration is 245; with this change it will be bumped to 250. With 5 shares by default and a 100% borrowing limit, we get a max of 10.
This effectively means 10 out of 250, which is exactly 4%.

If we say that in tests we see 4% used for events across aggregated tests, I believe we need to set the upper bound higher (probably by increasing the borrowing limit?)
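The arithmetic above can be checked with a small Python sketch. It follows the simplified share-based reasoning in this comment (nominal shares plus the maximum borrowable amount, divided by the total), not the apiserver's actual seat computation, which also involves the server's total concurrency limit and rounding. The function names and the 8% target are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def max_share_fraction(nominal_shares, borrowing_limit_percent, total_shares):
    """Upper bound on a priority level's portion of all shares:
    nominal shares plus the most it may borrow, over the total."""
    max_shares = Fraction(nominal_shares) * (100 + borrowing_limit_percent) / 100
    return max_shares / total_shares


def borrowing_limit_for(target_fraction, nominal_shares, total_shares):
    """Borrowing limit (in percent) needed for the upper bound to reach
    target_fraction of the total shares."""
    needed_shares = Fraction(target_fraction) * total_shares
    return (needed_shares / nominal_shares - 1) * 100


total = 245 + 5  # current default total plus the proposed 5 shares for events

# Proposed settings: 5 nominal shares, 100% borrowing limit.
print(float(max_share_fraction(5, 100, total)))  # 0.04

# Hypothetical target: let events reach 8% of the total with 5 nominal shares.
print(float(borrowing_limit_for(Fraction(8, 100), 5, total)))  # 300.0
```

Under these assumptions, doubling the upper bound from 4% to 8% while keeping 5 nominal shares would take a 300% borrowing limit; raising the nominal shares instead would also change the guaranteed portion.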

@MikeSpreitzer - for your thoughts too

Member

I agree with @wojtek-t

This might be a good place to talk about core concepts and how they relate.
-->

### Risks and Mitigations
Member

Repeating a comment that I brought up in a different place where we were discussing this.

I would like to understand how we're going to safely roll this out, given we will effectively change APF configuration for (almost) all clusters in the world.

Signed-off-by: Eric Lin <exlin@google.com>
@k8s-ci-robot added the size/XXL label and removed the size/XL label Nov 26, 2024

This KEP proposes the following settings:

| Name | Proposed Nominal Shares | Proposed Lendable | Proposed Borrowing Limit | Proposed Guaranteed Shares |
Member

There are some attributes omitted here. You might compare how I filled them in in kubernetes/kubernetes#128974

Author

Do you mean the weight and queue?

I can add weight if you believe weighted borrowing will land before this KEP, which I would be more than happy to see. As for the queue, I am inclined to reject right away, given that the volume of events could easily overwhelm any queue size we specify here.
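For concreteness, here is a rough Python sketch of what a priority level with the proposed numbers and an immediate-reject limit response might look like as a `flowcontrol.apiserver.k8s.io/v1` `PriorityLevelConfiguration`. The object name `event` and the numeric values come from the table in this KEP; the helper name and the overall manifest are assumptions for illustration, not the manifest from this PR or from kubernetes/kubernetes#128974.

```python
import json


def event_priority_level(nominal_shares=5, lendable_percent=0,
                         borrowing_limit_percent=100):
    """Build a PriorityLevelConfiguration dict that rejects excess
    requests immediately instead of queuing them (hypothetical helper)."""
    return {
        "apiVersion": "flowcontrol.apiserver.k8s.io/v1",
        "kind": "PriorityLevelConfiguration",
        "metadata": {"name": "event"},
        "spec": {
            "type": "Limited",
            "limited": {
                "nominalConcurrencyShares": nominal_shares,
                "lendablePercent": lendable_percent,
                "borrowingLimitPercent": borrowing_limit_percent,
                # "Reject" means no queuing configuration is specified.
                "limitResponse": {"type": "Reject"},
            },
        },
    }


manifest = event_priority_level()
print(json.dumps(manifest, indent=2))
```

With `limitResponse.type` set to `Reject`, requests beyond the concurrency limit fail right away rather than waiting, which matches the inclination stated above; a `Queue` response would additionally need queuing parameters.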

MikeSpreitzer added a commit to MikeSpreitzer/kubernetes that referenced this pull request Nov 27, 2024
Trying out kubernetes/enhancements#4978

Signed-off-by: Mike Spreitzer <mspreitz@us.ibm.com>