Scheduler's default plugin returns an invalid score that is not in the range of [0, 100], resulting in no Pods being scheduled #122066

Closed
@caohe

Description

What happened?

Scheduler's default plugin returns an invalid score that is not in the range of [0, 100], resulting in no Pods being scheduled:

E1127 19:48:13.221979   66549 schedule_one.go:130] "Error selecting node for pod" err="applying score defaultWeights on Score plugins: plugin \"PodTopologySpread\" returns an invalid score 300, it should in the range of [0, 100] after normalizing" pod="default/test-multi-point-xxxxxxxx"

After investigation, I found a bug in the code that builds the framework when the scheduler starts. When plugins declared in a regular extension point override plugins declared in MultiPoint, the variable enabledSet is modified while its list is still being traversed in the following loop:

// Reorder plugins. Here is the expected order:
// - part 1: overridePlugins. Their order stay intact as how they're specified in regular extension point.
// - part 2: multiPointEnabled - i.e., plugin defined in multipoint but not in regular extension point.
// - part 3: other plugins (excluded by part 1 & 2) in regular extension point.
newPlugins := reflect.New(reflect.TypeOf(e.slicePtr).Elem()).Elem()
// part 1
for _, name := range enabledSet.list {
	if overridePlugins.has(name) {
		newPlugins = reflect.Append(newPlugins, reflect.ValueOf(pluginsMap[name]))
		enabledSet.delete(name)
	}
}
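
A minimal sketch of how deleting from a slice while ranging over it can yield a duplicate. The orderedSet type, its linear-search delete, and the plugin names "A" through "D" are simplified, hypothetical stand-ins, not the scheduler's actual implementation. Go evaluates the range expression once, so the loop still visits the original length while delete shifts later elements left in the shared backing array.

```go
package main

import "fmt"

// orderedSet is a simplified, hypothetical stand-in for enabledSet.
type orderedSet struct {
	list []string
}

// delete removes name by shifting later elements left in the same backing array.
func (s *orderedSet) delete(name string) {
	for i, n := range s.list {
		if n == name {
			s.list = append(s.list[:i], s.list[i+1:]...)
			return
		}
	}
}

// buggyPart1 mirrors the loop above: it deletes from the slice it ranges over.
func buggyPart1(enabled *orderedSet, override map[string]bool) []string {
	var out []string
	for _, name := range enabled.list {
		if override[name] {
			out = append(out, name)
			enabled.delete(name)
		}
	}
	return out
}

// safePart1 ranges over a snapshot, so deletions cannot affect the traversal.
func safePart1(enabled *orderedSet, override map[string]bool) []string {
	var out []string
	snapshot := append([]string(nil), enabled.list...)
	for _, name := range snapshot {
		if override[name] {
			out = append(out, name)
			enabled.delete(name)
		}
	}
	return out
}

func main() {
	override := map[string]bool{"B": true, "D": true}

	buggy := &orderedSet{list: []string{"A", "B", "C", "D"}}
	fmt.Println(buggyPart1(buggy, override)) // "D" is visited twice

	safe := &orderedSet{list: []string{"A", "B", "C", "D"}}
	fmt.Println(safePart1(safe, override))
}
```

Iterating over a snapshot is only one possible remedy; collecting the names first and deleting them after the loop would work as well.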

This may cause the same scoring plugin (e.g. PodTopologySpread) to be loaded multiple times:

f.scorePlugins: []framework.ScorePlugin{(*noderesources.Fit)(0xc000922e10), (*podtopologyspread.PodTopologySpread)(0xc0000ab100), (*podtopologyspread.PodTopologySpread)(0xc0000ab100), (*tainttoleration.TaintToleration)(0xc0003f3690), (*nodeaffinity.NodeAffinity)(0xc0003997a0), (*volumebinding.VolumeBinding)(0xc0003db7a0), (*interpodaffinity.InterPodAffinity)(0xc00038f3b0), (*noderesources.BalancedAllocation)(0xc000063680), (*imagelocality.ImageLocality)(0xc000b45320)}

As a result, a node is processed multiple times by the plugin's NormalizeScore method (e.g. PodTopologySpread's), which yields an invalid score outside the range [0, 100]:

// Run NormalizeScore method for each ScorePlugin in parallel.
f.Parallelizer().Until(ctx, len(plugins), func(index int) {
	pl := plugins[index]
	if pl.ScoreExtensions() == nil {
		return
	}
	nodeScoreList := pluginToNodeScores[pl.Name()]
	status := f.runScoreExtension(ctx, pl, state, pod, nodeScoreList)
	if !status.IsSuccess() {
		err := fmt.Errorf("plugin %q failed with: %w", pl.Name(), status.AsError())
		errCh.SendErrorWithCancel(err, cancel)
		return
	}
}, metrics.Score)
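
The log line above reports the failure while applying score weights, and 300 equals the maximum score 100 times the configured weight 3. The following is a minimal sketch of one way a duplicate entry could produce that value, assuming a per-plugin pass that validates and then scales a score list shared by plugin name. applyWeights and its sequential loop are hypothetical simplifications, not the scheduler's code, which runs these steps in parallel.

```go
package main

import "fmt"

const (
	minNodeScore int64 = 0
	maxNodeScore int64 = 100
)

// applyWeights is a hypothetical, sequential simplification: for each entry in
// plugins it validates the shared score list for that name, then scales it in place.
func applyWeights(plugins []string, weights map[string]int64, scores map[string][]int64) error {
	for _, name := range plugins {
		list := scores[name] // the same slice for every entry with this name
		for i, s := range list {
			if s < minNodeScore || s > maxNodeScore {
				return fmt.Errorf("plugin %q returns an invalid score %d", name, s)
			}
			list[i] = s * weights[name]
		}
	}
	return nil
}

func main() {
	weights := map[string]int64{"PodTopologySpread": 3}

	// Plugin listed once: scores are validated, then scaled.
	once := map[string][]int64{"PodTopologySpread": {100, 40}}
	fmt.Println(applyWeights([]string{"PodTopologySpread"}, weights, once))

	// Plugin listed twice: the second pass sees the already scaled value.
	twice := map[string][]int64{"PodTopologySpread": {100, 40}}
	fmt.Println(applyWeights([]string{"PodTopologySpread", "PodTopologySpread"}, weights, twice))
}
```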

What did you expect to happen?

If the scheduler's MultiPoint feature is used (either through the default plugins or through manual configuration) and the configuration of a plugin (e.g. its scoring weight) is overridden in a regular extension point, the plugin should be loaded only once at that extension point, and no invalid scores should be returned.
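
The "loaded only once" expectation can be expressed as a simple check over the plugin names at an extension point. This is a minimal sketch; findDuplicates is a hypothetical helper and not part of the scheduler framework.

```go
package main

import "fmt"

// findDuplicates returns each name that appears more than once,
// in the order its second occurrence is encountered.
func findDuplicates(names []string) []string {
	counts := make(map[string]int)
	var dups []string
	for _, n := range names {
		counts[n]++
		if counts[n] == 2 {
			dups = append(dups, n)
		}
	}
	return dups
}

func main() {
	scorePlugins := []string{
		"NodeResourcesFit",
		"PodTopologySpread",
		"PodTopologySpread",
		"TaintToleration",
	}
	if dups := findDuplicates(scorePlugins); len(dups) > 0 {
		fmt.Println("duplicate score plugins:", dups)
	}
}
```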

How can we reproduce it (as minimally and precisely as possible)?

An example KubeSchedulerConfiguration:

apiVersion: kubescheduler.config.k8s.io/v1beta3
...
profiles:
- schedulerName: default-scheduler
  plugins:
    score:
      enabled:
      - name: PodTopologySpread
        weight: 3
...

Anything else we need to know?

No response

Kubernetes version

K8s v1.24.15 or higher

Cloud provider

None

Labels

kind/bug: Categorizes issue or PR as related to a bug.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.