Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times #122068

Merged
merged 1 commit into from
Dec 14, 2023

Conversation

caohe
Copy link
Contributor

@caohe caohe commented Nov 27, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

Fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times.

Which issue(s) this PR fixes:

Fixes #122066

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed a regression since 1.24 in the scheduling framework when overriding MultiPoint plugins (e.g. default plugins).
The incorrect loop logic might lead to a plugin being loaded multiple times, consequently preventing any Pod from being scheduled, which is unexpected.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. labels Nov 27, 2023
@k8s-ci-robot
Copy link
Contributor

Please note that we're already in Test Freeze for the release-1.29 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.29.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Mon Nov 27 09:58:46 UTC 2023.

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 27, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 27, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @caohe. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 27, 2023
@caohe
Copy link
Contributor Author

caohe commented Nov 27, 2023

/sig scheduling

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 27, 2023
@lianghao208
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 28, 2023
@lianghao208
Copy link
Member

Can you add a test case to cover this corner case?

@@ -526,11 +526,9 @@ func (f *frameworkImpl) expandMultiPointPlugins(logger klog.Logger, profile *con
// - part 3: other plugins (excluded by part 1 & 2) in regular extension point.
newPlugins := reflect.New(reflect.TypeOf(e.slicePtr).Elem()).Elem()
// part 1
for _, name := range enabledSet.list {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @Huang-Wei
Any purpose of iterating through enabledSet instead of overrodePlugins in #108613?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the purpose of iterating through enabledSet is to make sure the order of enabledSet.delete(name) is right, or else it will trigger slice bounds out of range panic enabledSet.delete()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Traversing enabledSet instead of overridePlugins directly is to allow the plugins in overridePlugins to maintain their original order in enabledSet when they were added to the newPlugins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your information. I think it would be better to iterate over a copy of enabledSet than iterate over overridePlugins. After testing, this method can also solve the problem of plugins being loaded multiple times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caohe we still need a unit test to cover this, can you add a test case in pkg/scheduler/framework/runtime/framework_test.go#TestNewFrameworkMultiPointExpansion? After addint a test case, we are able to make sure this bug can be reproduced and fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm doing this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any purpose of iterating through enabledSet instead of overrodePlugins in #108613?

It's for reserving the order - as the comments explained.

@AxeZhan
Copy link
Member

AxeZhan commented Nov 28, 2023

Can we add a UT for this fix? Reduce the occurrence of similar incidents in the future.

@caohe
Copy link
Contributor Author

caohe commented Nov 29, 2023

/test pull-kubernetes-e2e-kind-ipv6

@k8s-ci-robot
Copy link
Contributor

@caohe: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

  • /test pull-cadvisor-e2e-kubernetes
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-coverage-unit
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-big-performance
  • /test pull-kubernetes-e2e-gce-cos
  • /test pull-kubernetes-e2e-gce-cos-canary
  • /test pull-kubernetes-e2e-gce-cos-no-stage
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-scale-performance-manual
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-update
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

  • /test check-dependency-stats
  • /test pull-ci-kubernetes-unit-windows
  • /test pull-crio-cgroupv1-node-e2e-eviction
  • /test pull-crio-cgroupv1-node-e2e-features
  • /test pull-crio-cgroupv1-node-e2e-hugepages
  • /test pull-crio-cgroupv1-node-e2e-resource-managers
  • /test pull-crio-cgroupv2-imagefs-e2e-diskpressure
  • /test pull-e2e-gce-cloud-provider-disabled
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
  • /test pull-kubernetes-crio-node-memoryqos-cgrpv2
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-autoscaling-hpa-cm
  • /test pull-kubernetes-e2e-autoscaling-hpa-cpu
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-windows-alpha-feature-vpa
  • /test pull-kubernetes-e2e-capz-windows-alpha-features
  • /test pull-kubernetes-e2e-capz-windows-master
  • /test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-e2e-ec2
  • /test pull-kubernetes-e2e-ec2-arm64
  • /test pull-kubernetes-e2e-ec2-conformance
  • /test pull-kubernetes-e2e-ec2-conformance-arm64
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-cos-alpha-features
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-disruptive-canary
  • /test pull-kubernetes-e2e-gce-kubelet-credential-provider
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gce-serial
  • /test pull-kubernetes-e2e-gce-serial-canary
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-gci-gce-ingress
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
  • /test pull-kubernetes-e2e-kind-alpha-beta-features
  • /test pull-kubernetes-e2e-kind-alpha-features
  • /test pull-kubernetes-e2e-kind-beta-features
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-e2e-kind-ipvs-dual-canary
  • /test pull-kubernetes-e2e-kind-kms
  • /test pull-kubernetes-e2e-kind-multizone
  • /test pull-kubernetes-e2e-storage-kind-disruptive
  • /test pull-kubernetes-e2e-ubuntu-gce-network-policies
  • /test pull-kubernetes-integration-eks
  • /test pull-kubernetes-kind-dra
  • /test pull-kubernetes-kind-json-logging
  • /test pull-kubernetes-kind-text-logging
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-linter-hints
  • /test pull-kubernetes-local-e2e
  • /test pull-kubernetes-node-arm64-e2e-containerd-ec2
  • /test pull-kubernetes-node-arm64-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-arm64-ubuntu-serial-gce
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-imagefs-e2e
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-crio-e2e-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-1-7-dra
  • /test pull-kubernetes-node-e2e-containerd-alpha-features
  • /test pull-kubernetes-node-e2e-containerd-ec2
  • /test pull-kubernetes-node-e2e-containerd-features
  • /test pull-kubernetes-node-e2e-containerd-features-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-e2e-containerd-sidecar-containers
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
  • /test pull-kubernetes-node-e2e-crio-dra
  • /test pull-kubernetes-node-kubelet-credential-provider
  • /test pull-kubernetes-node-kubelet-serial-containerd
  • /test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
  • /test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-node-kubelet-serial-pod-disruption-conditions
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
  • /test pull-kubernetes-node-swap-fedora
  • /test pull-kubernetes-node-swap-fedora-serial
  • /test pull-kubernetes-node-swap-ubuntu-serial
  • /test pull-kubernetes-unit-experimental
  • /test pull-kubernetes-verify-lint
  • /test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-ec2
  • pull-kubernetes-e2e-ec2-conformance
  • pull-kubernetes-e2e-gce
  • pull-kubernetes-e2e-gce-canary
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-integration
  • pull-kubernetes-linter-hints
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-typecheck
  • pull-kubernetes-unit
  • pull-kubernetes-verify
  • pull-kubernetes-verify-lint

In response to this:

/retest pull-kubernetes-e2e-kind-ipv6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@caohe
Copy link
Contributor Author

caohe commented Nov 29, 2023

/test pull-kubernetes-e2e-kind-ipv6

@k8s-ci-robot k8s-ci-robot removed the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 29, 2023
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 29, 2023
@caohe
Copy link
Contributor Author

caohe commented Nov 29, 2023

@lianghao208 @AxeZhan Hello, I have added a test case named Override_MultiPoint_plugins_weights_and_avoid_a_plugin_being_loaded_multiple_times.

Before this fix, it will fail:

=== RUN   TestNewFrameworkMultiPointExpansion/Override_MultiPoint_plugins_weights_and_avoid_a_plugin_being_loaded_multiple_times
    framework.go:509: I1129 17:24:29.029130] MultiPoint plugin is explicitly re-configured; overriding plugin="test-plugin"
    framework.go:509: I1129 17:24:29.029173] MultiPoint plugin is explicitly re-configured; overriding plugin="score-plugin-1"
    framework_test.go:899: Unexpected eventToPlugin map (-want,+got):  &config.Plugins{
          	... // 4 identical fields
          	PostFilter: {Enabled: {{Name: "test-plugin"}}},
          	PreScore:   {Enabled: {{Name: "test-plugin"}, {Name: "score-plugin-1"}}},
          	Score: config.PluginSet{
          		Enabled: []config.Plugin{
          			{Name: "score-plugin-1", Weight: 5},
          			{Name: "test-plugin", Weight: 3},
        + 			{Name: "test-plugin", Weight: 3},
          			{Name: "score-plugin-2", Weight: 5},
          		},
          		Disabled: nil,
          	},
          	Reserve: {Enabled: {{Name: "test-plugin"}}},
          	Permit:  {Enabled: {{Name: "test-plugin"}}},
          	... // 4 identical fields
          }
--- FAIL: TestNewFrameworkMultiPointExpansion (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/plugin_expansion (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/disable_MultiPoint_plugin_at_some_extension_points (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/Multiple_MultiPoint_plugins (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/disable_MultiPoint_extension (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/Reorder_MultiPoint_plugins_(specified_extension_takes_precedence) (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/Reorder_MultiPoint_plugins_(specified_extension_only_takes_precedence_when_it_exists_in_MultiPoint) (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/Override_MultiPoint_plugins_weights (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/disable_and_enable_MultiPoint_plugins_with_'*' (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/disable_and_enable_MultiPoint_plugin_by_name (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/Expect_'already_registered'_error (0.00s)
    --- FAIL: TestNewFrameworkMultiPointExpansion/Override_MultiPoint_plugins_weights_and_avoid_a_plugin_being_loaded_multiple_times (0.00s)

Also, I have printed the got and want results:

tc.wantPlugins.Score: config.PluginSet{Enabled:[]config.Plugin{config.Plugin{Name:"score-plugin-1", Weight:5}, config.Plugin{Name:"test-plugin", Weight:3}, config.Plugin{Name:"score-plugin-2", Weight:5}}, Disabled:[]config.Plugin(nil)}
fw.ListPlugins().Score: config.PluginSet{Enabled:[]config.Plugin{config.Plugin{Name:"score-plugin-1", Weight:5}, config.Plugin{Name:"test-plugin", Weight:3}, config.Plugin{Name:"test-plugin", Weight:3}, config.Plugin{Name:"score-plugin-2", Weight:5}}, Disabled:[]config.Plugin(nil)}

We can see from the result, test-plugin was loaded 2 times.

After this fix, it will pass:

--- PASS: TestNewFrameworkMultiPointExpansion (0.01s)
    --- PASS: TestNewFrameworkMultiPointExpansion/plugin_expansion (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/disable_MultiPoint_plugin_at_some_extension_points (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/Multiple_MultiPoint_plugins (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/disable_MultiPoint_extension (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/Reorder_MultiPoint_plugins_(specified_extension_takes_precedence) (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/Reorder_MultiPoint_plugins_(specified_extension_only_takes_precedence_when_it_exists_in_MultiPoint) (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/Override_MultiPoint_plugins_weights (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/disable_and_enable_MultiPoint_plugins_with_'*' (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/disable_and_enable_MultiPoint_plugin_by_name (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/Expect_'already_registered'_error (0.00s)
    --- PASS: TestNewFrameworkMultiPointExpansion/Override_MultiPoint_plugins_weights_and_avoid_a_plugin_being_loaded_multiple_times (0.00s)

@caohe
Copy link
Contributor Author

caohe commented Nov 29, 2023

/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-unit
/test pull-kubernetes-linter-hints

…gin being loaded multiple times

Signed-off-by: caohe <caohe9603@gmail.com>
Copy link
Member

@Huang-Wei Huang-Wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @caohe a lot for the catch and fix!

@@ -526,11 +526,9 @@ func (f *frameworkImpl) expandMultiPointPlugins(logger klog.Logger, profile *con
// - part 3: other plugins (excluded by part 1 & 2) in regular extension point.
newPlugins := reflect.New(reflect.TypeOf(e.slicePtr).Elem()).Elem()
// part 1
for _, name := range enabledSet.list {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any purpose of iterating through enabledSet instead of overrodePlugins in #108613?

It's for reserving the order - as the comments explained.

@@ -526,7 +527,7 @@ func (f *frameworkImpl) expandMultiPointPlugins(logger klog.Logger, profile *con
// - part 3: other plugins (excluded by part 1 & 2) in regular extension point.
newPlugins := reflect.New(reflect.TypeOf(e.slicePtr).Elem()).Elem()
// part 1
for _, name := range enabledSet.list {
for _, name := range slice.CopyStrings(enabledSet.list) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the root cause is that during orderedset.delete(), the length of the list get changed, so some item missed its chance to be iterated in the outer for loop.

The fix can be snapshotting this list in the outer for loop - as you're proposing here. Could you help check if slice.CopyStrings is available in all supported versions 1.25? ~ 1.28? as we want to cherrypick this fix.

Or, another fix is to make the for loop logic accurate w/o copying/snapshotting the list:

diff --git a/pkg/scheduler/framework/runtime/framework.go b/pkg/scheduler/framework/runtime/framework.go
index 8caa8ec1cb4..f238bcbdfb7 100644
--- a/pkg/scheduler/framework/runtime/framework.go
+++ b/pkg/scheduler/framework/runtime/framework.go
@@ -450,12 +450,10 @@ func (os *orderedSet) has(s string) bool {
 	return found
 }
 
-func (os *orderedSet) delete(s string) {
-	if i, found := os.set[s]; found {
-		delete(os.set, s)
-		os.list = append(os.list[:i-os.deletionCnt], os.list[i+1-os.deletionCnt:]...)
-		os.deletionCnt++
-	}
+func (os *orderedSet) delete(i int) {
+	delete(os.set, os.list[i])
+	os.list = append(os.list[:i], os.list[i+1:]...)
+	os.deletionCnt++
 }
 
 func (f *frameworkImpl) expandMultiPointPlugins(logger klog.Logger, profile *config.KubeSchedulerProfile, pluginsMap map[string]framework.Plugin) error {
@@ -526,10 +524,12 @@ func (f *frameworkImpl) expandMultiPointPlugins(logger klog.Logger, profile *con
 		// - part 3: other plugins (excluded by part 1 & 2) in regular extension point.
 		newPlugins := reflect.New(reflect.TypeOf(e.slicePtr).Elem()).Elem()
 		// part 1
-		for _, name := range enabledSet.list {
+		for i := range enabledSet.list {
+			j := i - enabledSet.deletionCnt
+			name := enabledSet.list[j]
 			if overridePlugins.has(name) {
 				newPlugins = reflect.Append(newPlugins, reflect.ValueOf(pluginsMap[name]))
-				enabledSet.delete(name)
+				enabledSet.delete(j)
 			}
 		}
 		// part 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed explanation! Yes, git blame shows that the code for slice.CopyStrings has not been modified in almost 6 years. I have tested this fix on our 1.24 cluster and it works fine.
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for confirming. Could you help cherrypick it to all supported lower k8s versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will cherrypick it after this PR is merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I have cherry-picked it to 1.26-1.29. Could you PTAL?
#122366
#122368
#122370
#122371

@Huang-Wei
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 34a59d936255d15bedf820c47958838c0d328cb8

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caohe, Huang-Wei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 30, 2023
@Huang-Wei
Copy link
Member

We may wait until v1.29 release is out. And then cherrypick.

@kerthcet
Copy link
Member

Please also fill the bug info in the cherry-picked PRs. @caohe

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Dec 22, 2023
@caohe
Copy link
Contributor Author

caohe commented Dec 22, 2023

Please also fill the bug info in the cherry-picked PRs. @caohe

Thanks, I have added release notes to this PR and the cherry-picked PRs. cc @kerthcet

k8s-ci-robot added a commit that referenced this pull request Jan 10, 2024
…8-upstream-release-1.26

Automated cherry pick of #122068: fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times
k8s-ci-robot added a commit that referenced this pull request Jan 10, 2024
…8-upstream-release-1.28

Automated cherry pick of #122068: fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times
k8s-ci-robot added a commit that referenced this pull request Jan 10, 2024
…8-upstream-release-1.27

Automated cherry pick of #122068: fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times
k8s-ci-robot added a commit that referenced this pull request Jan 10, 2024
…8-upstream-release-1.29

Automated cherry pick of #122068: fix(scheduler): fix incorrect loop logic in MultiPoint to avoid a plugin being loaded multiple times
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
7 participants