Avoid moving pods out of unschedulable status unconditionally #94009
Comments
@ahg-g: GitHub didn't allow me to assign the following users: adtac. Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign
/cc @denkensk I'm wondering if we can expand the scope: that is, not only enable/disable moving (all) pods, but also provide a mechanism for plugins to notify the scheduler framework which particular pods should be moved. This would benefit @denkensk's coscheduling implementation, moving the preceding (N-1) pods back to activeQ when the Nth pod comes, which satisfies the "minAvailable" requirement. Moreover, this would help in getting rid of the periodic flushing of the unschedulable queue.
The suggestion is not to enable/disable moving all pods for all events; that is only true for the Service events. For the other two, nodes and volumes, we will test each pod and move only the ones that pass the condition. I agree that our proposal of integrating this with filters/permit plugins is more generic, but it adds a lot of complexity and potential for bugs, so I think we need to be careful with that.
One approach could be to start simple, validate, and establish a reference point in terms of perf improvements, and then migrate to something more generic. The more generic approach will not be more performant than what is proposed in this issue.
Yes, filters/permit run in the critical path, and moving Pods isn't a lock-free operation. What I want to say is that during the design/implementation of this proposal, we should keep in mind that plugins may need to actively "suggest" moving particular Pods back to activeQ.
Agree.
Right, we already designed the interface for such a potential eventuality: https://docs.google.com/document/d/1Dw1qPi4eryllSv0F419sKVbGXiPvSv6N_mJd6AVSg74/edit
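For readers following along, here is a minimal, self-contained sketch of the idea discussed above: a plugin advertises which cluster events could make the pods it rejected schedulable again, and the queue only moves pods whose rejecting plugins registered for the incoming event. All type and method names below (ClusterEvent, EventsToRegister, the toy coscheduling plugin) are hypothetical stand-ins, not the interface from the linked design doc or the framework's actual API.

```go
package main

import "fmt"

// ClusterEvent is a hypothetical stand-in for the framework's notion of
// "something changed in the cluster" (resource kind + action).
type ClusterEvent struct {
	Resource string // e.g. "Pod", "Node", "Service"
	Action   string // e.g. "Add", "Update", "Delete"
}

// EventAwarePlugin is the sketched extension point: a plugin tells the
// framework which events may make pods it rejected schedulable again.
type EventAwarePlugin interface {
	Name() string
	EventsToRegister() []ClusterEvent
}

// coscheduling is a toy plugin: a new Pod being added may complete a pod
// group, so the preceding N-1 pods should be moved back to the active queue.
type coscheduling struct{}

func (c coscheduling) Name() string { return "Coscheduling" }
func (c coscheduling) EventsToRegister() []ClusterEvent {
	return []ClusterEvent{{Resource: "Pod", Action: "Add"}}
}

// shouldMove decides whether an incoming event justifies moving a pod that
// was rejected by the given plugins out of the unschedulable queue.
func shouldMove(event ClusterEvent, rejectedBy []EventAwarePlugin) bool {
	for _, p := range rejectedBy {
		for _, e := range p.EventsToRegister() {
			if e == event {
				return true
			}
		}
	}
	return false
}

func main() {
	rejectedBy := []EventAwarePlugin{coscheduling{}}
	fmt.Println(shouldMove(ClusterEvent{"Pod", "Add"}, rejectedBy))        // true: retry the pod
	fmt.Println(shouldMove(ClusterEvent{"Service", "Update"}, rejectedBy)) // false: keep it queued
}
```

The point of the sketch is only the shape of the contract: the per-pod decision stays out of the scheduling critical path and is driven by what the rejecting plugins declared up front.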
Just fixing the double "moving" in the title.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
/assign
/cc
Yes, to achieve that, every plugin has to implement EventsToRegister()... BTW: one way to work around it is to add some temporary logic on the framework side - skip moving Pods upon Service events if ServiceAffinity is not enabled - and remove this temporary logic when all plugins opt in to EventsToRegister() in the next release.
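For illustration only, the temporary logic described here would look roughly like the following; the names are made up for the sketch and are not the actual framework code:

```go
package main

import "fmt"

// profileConfig is a stand-in for the scheduler profile's enabled plugins.
type profileConfig struct {
	enabledFilters map[string]bool
}

// shouldReactToServiceEvents returns false when no enabled plugin cares about
// Service objects, so the Service event handlers can skip moving pods.
func shouldReactToServiceEvents(p profileConfig) bool {
	// Today only ServiceAffinity filters on Services; once every plugin
	// declares EventsToRegister(), this hard-coded check goes away.
	return p.enabledFilters["ServiceAffinity"]
}

func main() {
	defaults := profileConfig{enabledFilters: map[string]bool{"NodeResourcesFit": true}}
	fmt.Println(shouldReactToServiceEvents(defaults)) // false: skip moving pods on Service events
}
```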
That wouldn't qualify for code freeze though.
Nvm then :)
We should simply not register for service updates at all if ServiceAffinity is not configured, which is the default. |
Probably not a great idea to totally disable the registration, as an out-of-tree plugin may choose to respond to Service events.
We need to find a way for external plugins to register events because we will remove this registration once the plugin is deprecated.
Yes, we should also take CRDs into consideration. Probably via a dynamic informer.
I still think we should disable Service event handlers when the plugin is not registered. If an external plugin is using it, then they could enable the service affinity filter plugin with empty affinity labels, or patch the event handlers, until we figure out a way for external plugins to register extra events. btw, do you know of any external plugins that actually rely on Service events?
That's fair, and I can work on this, adding some comments noting that this is a temporary workaround.
I'm not aware of any so far.
tbh, I would say plugins depending on Services is an antipattern, given the existence of pod affinity. Some CRDs might make sense for specialized plugins, though.
@Huang-Wei we should add metrics to track the efficiency of the solution. The metric would track the number of pods that were not put back into the queue, broken down by plugin and event type, for example.
SG. I compiled a list of ongoing and to-do items here: #100347.
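As a rough illustration of the kind of metric suggested above (the metric name and labels below are hypothetical and not necessarily what #100347 ends up tracking):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// podsSkipped is a hypothetical counter for pods that were NOT moved back to
// the active queue when an event fired, broken down by plugin and event type.
var podsSkipped = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "scheduler_queue_skipped_pod_moves_total",
		Help: "Pods kept in the unschedulable queue because no registered plugin cared about the event.",
	},
	[]string{"plugin", "event"},
)

func main() {
	prometheus.MustRegister(podsSkipped)
	// Record one pod kept in place because ServiceAffinity is not enabled.
	podsSkipped.WithLabelValues("ServiceAffinity", "ServiceUpdate").Inc()
	fmt.Println("recorded")
}
```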
@yuzhiquan Are you still working on this?
@Huang-Wei did we have PRs for number 2 and number 3?
I am asking about the two points from the four mentioned in the description of this issue: #94009 (comment)
Oops, misunderstanding, ignore me.
|
ah, yeah, I knew we worked on that in the previous release, lol. Thanks. |
Currently we unconditionally move pods out of unschedulable status when an event happens. However, we can optimize this by setting conditions for each event type to avoid doing that unnecessarily.
This is necessary to avoid wasting scheduling cycles.
Related proposal: https://docs.google.com/document/d/1Dw1qPi4eryllSv0F419sKVbGXiPvSv6N_mJd6AVSg74/edit
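To make the intent concrete, here is a rough, self-contained sketch of the conditional-move idea; all names are hypothetical and do not reflect the scheduler's actual queue API. Each event supplies a predicate, and only pods that pass it are moved out of the unschedulable queue:

```go
package main

import "fmt"

// pod and nodeAddedEvent are minimal stand-ins; names are illustrative only.
type pod struct{ name, requestedNode string }
type nodeAddedEvent struct{ nodeName string }

// movePredicate decides, per event, whether a specific unschedulable pod is
// worth retrying, instead of moving every pod unconditionally.
type movePredicate func(p pod) bool

// onNodeAdded builds a predicate for a node-add event: only pods that could
// plausibly land on the new node (toy check here) are moved back.
func onNodeAdded(ev nodeAddedEvent) movePredicate {
	return func(p pod) bool {
		return p.requestedNode == "" || p.requestedNode == ev.nodeName
	}
}

// moveMatching returns the pods that should leave the unschedulable queue.
func moveMatching(unschedulable []pod, pred movePredicate) []pod {
	var moved []pod
	for _, p := range unschedulable {
		if pred(p) {
			moved = append(moved, p)
		}
	}
	return moved
}

func main() {
	queue := []pod{{name: "a", requestedNode: "node-1"}, {name: "b", requestedNode: "node-9"}}
	moved := moveMatching(queue, onNodeAdded(nodeAddedEvent{nodeName: "node-1"}))
	fmt.Println(len(moved)) // 1: only the pod that could fit the new node is retried
}
```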
/sig scheduling
/assign @adtac