Make sure volcano scheduler cache synced before first scheduling by waiting for handlers sync. #3177
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Is it the same as "wait for all items handled before start to scheduling" #2822?
It looks like the two are similar. If the other one is merged, I will close this PR.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
We'd better include this feature in the new release. @Monokaix @william-wang
Please rebase your PR :)
I will rebase the code as soon as possible.
Force-pushed from 52ce8c6 to daeb06f (compare)
@lowang-bh The code rebase is done. Could you review the code, please?
pkg/scheduler/scheduler.go (Outdated)
@@ -88,7 +88,10 @@ func (pc *Scheduler) Run(stopCh <-chan struct{}) {
	pc.cache.SetMetricsConf(pc.metricsConf)
	pc.cache.Run(stopCh)
	pc.cache.WaitForCacheSync(stopCh)
	klog.V(2).Infof("Scheduler completes Initialization and start to run")
	if err := pc.cache.WaitForHandlersSync(stopCh); err != nil {
		panic(fmt.Sprintf("failed to wait for handlers sync: %v", err))
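For context, a hedged sketch of what a `WaitForHandlersSync` method like the one called above could look like; the `SchedulerCache` type, its field names, and the use of `wait.PollUntil` are assumptions for illustration, not the Volcano implementation:

```go
// Hypothetical sketch, not the actual Volcano code: wait until every
// registered event handler has processed the initial events. With
// wait.PollUntil, the only possible error is wait.ErrWaitTimeout, returned
// when stopCh closes before the handlers catch up.
package schedcache

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	clientcache "k8s.io/client-go/tools/cache"
)

// SchedulerCache is a stand-in for Volcano's scheduler cache; the field below
// is an illustrative name for wherever the handler registrations are kept.
type SchedulerCache struct {
	handlerRegistrations []clientcache.ResourceEventHandlerRegistration
}

// WaitForHandlersSync blocks until all handlers report HasSynced, or until
// stopCh closes, in which case wait.ErrWaitTimeout is returned.
func (sc *SchedulerCache) WaitForHandlersSync(stopCh <-chan struct{}) error {
	return wait.PollUntil(100*time.Millisecond, func() (bool, error) {
		for _, reg := range sc.handlerRegistrations {
			if !reg.HasSynced() {
				return false, nil
			}
		}
		return true, nil
	}, stopCh)
}
```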
If we don't panic and go on, what will happen? Is that acceptable?
The only way that error is returned is if stopCh is cancelled, which yields ErrWaitTimeout. So we can make WaitForHandlersSync return nothing and remove this panic.
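Following that suggestion, a returnless variant might look like the sketch below. It reuses the hypothetical `SchedulerCache` and imports from the earlier sketch plus `k8s.io/klog/v2`, and relies on the fact that `cache.WaitForCacheSync` returns false exactly when stopCh closes, so the caller has nothing left to panic on:

```go
// Illustrative only: the same hypothetical SchedulerCache as in the sketch
// above, with the error return dropped. A closed stop channel is just logged
// and the caller can shut down normally.
func (sc *SchedulerCache) WaitForHandlersSync(stopCh <-chan struct{}) {
	synced := make([]clientcache.InformerSynced, 0, len(sc.handlerRegistrations))
	for _, reg := range sc.handlerRegistrations {
		synced = append(synced, reg.HasSynced)
	}
	if !clientcache.WaitForCacheSync(stopCh, synced...) {
		klog.Warningf("stop requested before event handlers finished syncing")
	}
}
```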
@@ -88,7 +88,10 @@ func (pc *Scheduler) Run(stopCh <-chan struct{}) {
	pc.cache.SetMetricsConf(pc.metricsConf)
	pc.cache.Run(stopCh)
	pc.cache.WaitForCacheSync(stopCh)
	klog.V(2).Infof("Scheduler completes Initialization and start to run")
We can add a log before the custom handler sync, so that users can check the latency from the log.
Of course, I will add logs around waiting for the cache to be initialized.
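For instance, something along these lines (a sketch of the kind of logging being discussed, not the exact change that landed; the helper name and wiring are hypothetical):

```go
// Hypothetical helper, not the exact PR code: wraps the handler-sync wait
// with before/after logs so the latency shows up in the scheduler log.
package main

import (
	"fmt"
	"time"

	"k8s.io/klog/v2"
)

// waitForHandlersWithLog takes a wait function such as
// pc.cache.WaitForHandlersSync and logs how long it took to return.
func waitForHandlersWithLog(stopCh <-chan struct{}, waitFn func(<-chan struct{}) error) {
	klog.V(2).Infof("Waiting for event handlers to sync before the first scheduling cycle")
	start := time.Now()
	if err := waitFn(stopCh); err != nil {
		panic(fmt.Sprintf("failed to wait for handlers sync: %v", err))
	}
	klog.V(2).Infof("Event handlers synced in %v; scheduler starts to run", time.Since(start))
}
```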
Signed-off-by: RamezesDong <donghouze666@outlook.com>
Force-pushed from daeb06f to 0810709 (compare)
Does the volcano controller also need to catch this?
I don't think so. The controller just needs eventual consistency; waiting for cache sync is enough.
@RamezesDong: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Ⅰ. Describe what this PR does
The issue kubernetes/kubernetes#116717 describes a bug where event handlers had not yet processed all events when the informer cache reported synced. This can lead to a serious problem: the scheduler starts its first scheduling cycle against an incomplete view of the cluster. The Kubernetes community fixed this in kubernetes/kubernetes#116729.
The PR makes sure handlers have finished syncing before the scheduling cycles start, just like the default scheduler does.
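For reference, the underlying client-go pattern that this kind of fix relies on is sketched below. This is a hedged illustration, not Volcano or kube-scheduler source: in recent client-go versions, AddEventHandler returns a registration whose HasSynced reports whether that particular handler has processed the initial events, and cache.WaitForCacheSync can wait on it alongside the informer's own HasSynced.

```go
// A hedged sketch of the general client-go pattern; function and variable
// names are illustrative.
package main

import (
	"fmt"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func waitForPodHandler(client kubernetes.Interface, stopCh <-chan struct{}) error {
	factory := informers.NewSharedInformerFactory(client, 0)
	podInformer := factory.Core().V1().Pods().Informer()

	reg, err := podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { /* populate the scheduler's own cache */ },
	})
	if err != nil {
		return err
	}

	factory.Start(stopCh)

	// Waiting only on podInformer.HasSynced is not enough: the handler above
	// may still be draining the initial events. reg.HasSynced closes that gap.
	if !cache.WaitForCacheSync(stopCh, podInformer.HasSynced, reg.HasSynced) {
		return fmt.Errorf("timed out waiting for pod event handler to sync")
	}
	return nil
}
```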