-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times #122068
Conversation
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Mon Nov 27 09:58:46 UTC 2023. |
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Hi @caohe. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/sig scheduling |
/ok-to-test |
Can you add a test case to cover this corner case? |
@@ -526,11 +526,9 @@ func (f *frameworkImpl) expandMultiPointPlugins(logger klog.Logger, profile *con | |||
// - part 3: other plugins (excluded by part 1 & 2) in regular extension point. | |||
newPlugins := reflect.New(reflect.TypeOf(e.slicePtr).Elem()).Elem() | |||
// part 1 | |||
for _, name := range enabledSet.list { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @Huang-Wei
Any purpose of iterating through enabledSet instead of overrodePlugins in #108613?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the purpose of iterating through enabledSet is to make sure the order of enabledSet.delete(name)
is right, or else it will trigger slice bounds out of range panic enabledSet.delete()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Traversing enabledSet
instead of overridePlugins
directly is to allow the plugins in overridePlugins
to maintain their original order in enabledSet
when they were added to the newPlugins
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your information. I think it would be better to iterate over a copy of enabledSet
than iterate over overridePlugins
. After testing, this method can also solve the problem of plugins being loaded multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caohe we still need a unit test to cover this, can you add a test case in pkg/scheduler/framework/runtime/framework_test.go#TestNewFrameworkMultiPointExpansion
? After addint a test case, we are able to make sure this bug can be reproduced and fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any purpose of iterating through enabledSet instead of overrodePlugins in #108613?
It's for reserving the order - as the comments explained.
Can we add a UT for this fix? Reduce the occurrence of similar incidents in the future. |
a41bc5b
to
130b809
Compare
/test pull-kubernetes-e2e-kind-ipv6 |
@caohe: The
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-kubernetes-e2e-kind-ipv6 |
130b809
to
505918e
Compare
@lianghao208 @AxeZhan Hello, I have added a test case named Before this fix, it will fail:
Also, I have printed the got and want results:
We can see from the result, After this fix, it will pass:
|
/test pull-kubernetes-e2e-kind-ipv6 |
…gin being loaded multiple times Signed-off-by: caohe <caohe9603@gmail.com>
505918e
to
1f5738d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @caohe a lot for the catch and fix!
@@ -526,11 +526,9 @@ func (f *frameworkImpl) expandMultiPointPlugins(logger klog.Logger, profile *con | |||
// - part 3: other plugins (excluded by part 1 & 2) in regular extension point. | |||
newPlugins := reflect.New(reflect.TypeOf(e.slicePtr).Elem()).Elem() | |||
// part 1 | |||
for _, name := range enabledSet.list { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any purpose of iterating through enabledSet instead of overrodePlugins in #108613?
It's for reserving the order - as the comments explained.
@@ -526,7 +527,7 @@ func (f *frameworkImpl) expandMultiPointPlugins(logger klog.Logger, profile *con | |||
// - part 3: other plugins (excluded by part 1 & 2) in regular extension point. | |||
newPlugins := reflect.New(reflect.TypeOf(e.slicePtr).Elem()).Elem() | |||
// part 1 | |||
for _, name := range enabledSet.list { | |||
for _, name := range slice.CopyStrings(enabledSet.list) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the root cause is that during orderedset.delete(), the length of the list get changed, so some item missed its chance to be iterated in the outer for loop.
The fix can be snapshotting this list in the outer for loop - as you're proposing here. Could you help check if slice.CopyStrings is available in all supported versions 1.25? ~ 1.28? as we want to cherrypick this fix.
Or, another fix is to make the for loop logic accurate w/o copying/snapshotting the list:
diff --git a/pkg/scheduler/framework/runtime/framework.go b/pkg/scheduler/framework/runtime/framework.go
index 8caa8ec1cb4..f238bcbdfb7 100644
--- a/pkg/scheduler/framework/runtime/framework.go
+++ b/pkg/scheduler/framework/runtime/framework.go
@@ -450,12 +450,10 @@ func (os *orderedSet) has(s string) bool {
return found
}
-func (os *orderedSet) delete(s string) {
- if i, found := os.set[s]; found {
- delete(os.set, s)
- os.list = append(os.list[:i-os.deletionCnt], os.list[i+1-os.deletionCnt:]...)
- os.deletionCnt++
- }
+func (os *orderedSet) delete(i int) {
+ delete(os.set, os.list[i])
+ os.list = append(os.list[:i], os.list[i+1:]...)
+ os.deletionCnt++
}
func (f *frameworkImpl) expandMultiPointPlugins(logger klog.Logger, profile *config.KubeSchedulerProfile, pluginsMap map[string]framework.Plugin) error {
@@ -526,10 +524,12 @@ func (f *frameworkImpl) expandMultiPointPlugins(logger klog.Logger, profile *con
// - part 3: other plugins (excluded by part 1 & 2) in regular extension point.
newPlugins := reflect.New(reflect.TypeOf(e.slicePtr).Elem()).Elem()
// part 1
- for _, name := range enabledSet.list {
+ for i := range enabledSet.list {
+ j := i - enabledSet.deletionCnt
+ name := enabledSet.list[j]
if overridePlugins.has(name) {
newPlugins = reflect.Append(newPlugins, reflect.ValueOf(pluginsMap[name]))
- enabledSet.delete(name)
+ enabledSet.delete(j)
}
}
// part 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for confirming. Could you help cherrypick it to all supported lower k8s versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will cherrypick it after this PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm |
LGTM label has been added. Git tree hash: 34a59d936255d15bedf820c47958838c0d328cb8
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: caohe, Huang-Wei The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
We may wait until v1.29 release is out. And then cherrypick. |
Please also fill the bug info in the cherry-picked PRs. @caohe |
…8-upstream-release-1.26 Automated cherry pick of #122068: fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times
…8-upstream-release-1.28 Automated cherry pick of #122068: fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times
…8-upstream-release-1.27 Automated cherry pick of #122068: fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times
…8-upstream-release-1.29 Automated cherry pick of #122068: fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times.
Which issue(s) this PR fixes:
Fixes #122066
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: