Scheduler's default plugin returns an invalid score that is not in the range of [0, 100], resulting in no Pods being scheduled #122066
Description
What happened?
A default scheduler plugin returns an invalid score outside the range [0, 100], so no Pods get scheduled:
E1127 19:48:13.221979 66549 schedule_one.go:130] "Error selecting node for pod" err="applying score defaultWeights on Score plugins: plugin \"PodTopologySpread\" returns an invalid score 300, it should in the range of [0, 100] after normalizing" pod="default/test-multi-point-xxxxxxxx"
After investigating, I found a bug in the code that builds the framework when the scheduler starts. When plugins declared in a regular extension point override plugins declared in MultiPoint, the variable enabledSet is modified while it is being iterated over in the following loop:
kubernetes/pkg/scheduler/framework/runtime/framework.go, lines 523 to 534 at commit ad9b60e
This can cause the same scoring plugin (e.g. PodTopologySpread) to be loaded multiple times:
f.scorePlugins: []framework.ScorePlugin{(*noderesources.Fit)(0xc000922e10), (*podtopologyspread.PodTopologySpread)(0xc0000ab100), (*podtopologyspread.PodTopologySpread)(0xc0000ab100), (*tainttoleration.TaintToleration)(0xc0003f3690), (*nodeaffinity.NodeAffinity)(0xc0003997a0), (*volumebinding.VolumeBinding)(0xc0003db7a0), (*interpodaffinity.InterPodAffinity)(0xc00038f3b0), (*noderesources.BalancedAllocation)(0xc000063680), (*imagelocality.ImageLocality)(0xc000b45320)}
As a result, each node is processed multiple times by the plugin's (e.g. PodTopologySpread's) NormalizeScore method, producing an invalid score outside the range [0, 100]:
kubernetes/pkg/scheduler/framework/runtime/framework.go, lines 1095 to 1108 at commit ad9b60e
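The log message above mentions "applying score defaultWeights on Score plugins", and 300 is exactly 100 × 3. The sketch below is a hypothetical, simplified model (not the framework's code) of how a duplicated plugin entry can produce that value. It assumes node scores are stored per plugin name, so both entries share one slice, and that each score is range-checked and then multiplied by the weight in place. `applyWeights` and the constants are illustrative names.

```go
package main

import "fmt"

const (
	minNodeScore int64 = 0
	maxNodeScore int64 = 100
)

// applyWeights is a hypothetical, simplified model of a weighting step.
// Scores are keyed by plugin name, so duplicate entries in plugins share
// (and re-process) the same slice. Each score is range-checked and then
// multiplied by the plugin's weight in place.
func applyWeights(plugins []string, weights map[string]int64, scores map[string][]int64) error {
	for _, name := range plugins {
		list := scores[name]
		for i, s := range list {
			if s < minNodeScore || s > maxNodeScore {
				return fmt.Errorf("plugin %q returns an invalid score %d, it should be in the range of [%d, %d]",
					name, s, minNodeScore, maxNodeScore)
			}
			list[i] = s * weights[name]
		}
	}
	return nil
}

func main() {
	weights := map[string]int64{"PodTopologySpread": 3}

	// Loaded once: the normalized score 100 becomes 300 and no error occurs.
	once := map[string][]int64{"PodTopologySpread": {100}}
	fmt.Println(applyWeights([]string{"PodTopologySpread"}, weights, once), once["PodTopologySpread"])

	// Loaded twice: the second pass sees the already weighted 300 and fails.
	twice := map[string][]int64{"PodTopologySpread": {100}}
	fmt.Println(applyWeights([]string{"PodTopologySpread", "PodTopologySpread"}, weights, twice))
}
```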
What did you expect to happen?
If the scheduler's MultiPoint feature is used (either through the default plugins or through manual configuration) and a plugin's configuration (e.g. its scoring weight) is overridden in a regular extension point, the plugin should be loaded only once at that extension point, and no invalid scores should be returned.
How can we reproduce it (as minimally and precisely as possible)?
An example KubeSchedulerConfiguration:
apiVersion: kubescheduler.config.k8s.io/v1beta3
...
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        enabled:
          - name: PodTopologySpread
            weight: 3
...
Kubernetes version
K8s v1.24.15 or higher
Cloud provider
None