
Scheduler's default plugin returns an invalid score that is not in the range of [0, 100], resulting in no Pods being scheduled #122066

Closed
caohe opened this issue Nov 27, 2023 · 5 comments · Fixed by #122068
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@caohe
Contributor

caohe commented Nov 27, 2023

What happened?

Scheduler's default plugin returns an invalid score that is not in the range of [0, 100], resulting in no Pods being scheduled:

E1127 19:48:13.221979   66549 schedule_one.go:130] "Error selecting node for pod" err="applying score defaultWeights on Score plugins: plugin \"PodTopologySpread\" returns an invalid score 300, it should in the range of [0, 100] after normalizing" pod="default/test-multi-point-xxxxxxxx"

After investigation, I found a bug in the code that builds the framework when the scheduler starts. When plugins declared in a regular extension point override plugins declared under MultiPoint, the variable enabledSet is modified while it is still being traversed in the following loop:

// Reorder plugins. Here is the expected order:
// - part 1: overridePlugins. Their order stay intact as how they're specified in regular extension point.
// - part 2: multiPointEnabled - i.e., plugin defined in multipoint but not in regular extension point.
// - part 3: other plugins (excluded by part 1 & 2) in regular extension point.
newPlugins := reflect.New(reflect.TypeOf(e.slicePtr).Elem()).Elem()
// part 1
for _, name := range enabledSet.list {
	if overridePlugins.has(name) {
		newPlugins = reflect.Append(newPlugins, reflect.ValueOf(pluginsMap[name]))
		enabledSet.delete(name)
	}
}

This can cause the same scoring plugin (e.g. PodTopologySpread) to be loaded multiple times:

f.scorePlugins: []framework.ScorePlugin{(*noderesources.Fit)(0xc000922e10), (*podtopologyspread.PodTopologySpread)(0xc0000ab100), (*podtopologyspread.PodTopologySpread)(0xc0000ab100), (*tainttoleration.TaintToleration)(0xc0003f3690), (*nodeaffinity.NodeAffinity)(0xc0003997a0), (*volumebinding.VolumeBinding)(0xc0003db7a0), (*interpodaffinity.InterPodAffinity)(0xc00038f3b0), (*noderesources.BalancedAllocation)(0xc000063680), (*imagelocality.ImageLocality)(0xc000b45320)}

As a result, each node is processed multiple times by the same plugin's (e.g. PodTopologySpread's) NormalizeScore method, producing an invalid score outside the range [0, 100]:

// Run NormalizeScore method for each ScorePlugin in parallel.
f.Parallelizer().Until(ctx, len(plugins), func(index int) {
	pl := plugins[index]
	if pl.ScoreExtensions() == nil {
		return
	}
	nodeScoreList := pluginToNodeScores[pl.Name()]
	status := f.runScoreExtension(ctx, pl, state, pod, nodeScoreList)
	if !status.IsSuccess() {
		err := fmt.Errorf("plugin %q failed with: %w", pl.Name(), status.AsError())
		errCh.SendErrorWithCancel(err, cancel)
		return
	}
}, metrics.Score)
```
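Note that pluginToNodeScores is keyed by plugin name, so both copies of a duplicated plugin operate on the same shared score list. A simplified sketch (hypothetical types and function names, not the scheduler's actual code) of the subsequent weight-application pass shows how processing that shared list twice with weight 3 turns a valid score of 100 into the 300 seen in the error log:

```go
package main

import "fmt"

const (
	minNodeScore int64 = 0
	maxNodeScore int64 = 100
)

type nodeScore struct {
	name  string
	score int64
}

// applyWeights is a simplified, assumed model of the pass that multiplies
// each normalized score by the plugin's weight, validating the range first.
// Because the score map is keyed by plugin NAME, duplicate entries in
// plugins all mutate the same underlying slice.
func applyWeights(plugins []string, weights map[string]int64, pluginToNodeScores map[string][]nodeScore) error {
	for _, pl := range plugins {
		scores := pluginToNodeScores[pl]
		for i := range scores {
			if scores[i].score > maxNodeScore || scores[i].score < minNodeScore {
				return fmt.Errorf("plugin %q returns an invalid score %d, it should in the range of [%d, %d] after normalizing",
					pl, scores[i].score, minNodeScore, maxNodeScore)
			}
			scores[i].score *= weights[pl]
		}
	}
	return nil
}

func main() {
	// PodTopologySpread appears twice because of the reorder bug.
	plugins := []string{"PodTopologySpread", "PodTopologySpread"}
	weights := map[string]int64{"PodTopologySpread": 3}
	scores := map[string][]nodeScore{
		"PodTopologySpread": {{name: "node-a", score: 100}},
	}
	// First pass: 100 * 3 = 300. Second pass sees 300 and fails the check.
	fmt.Println(applyWeights(plugins, weights, scores))
}
```

The first pass legitimately scales 100 by the configured weight of 3; the second pass then sees 300 in the shared list and reports an invalid-score error much like the one in the report.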

What did you expect to happen?

If the scheduler's MultiPoint feature is used (either through the default plugins or through manual configuration) and a plugin's configuration (e.g. its scoring weight) is overridden in a regular extension point, the plugin should be loaded only once for that extension point, and no invalid scores should be returned.

How can we reproduce it (as minimally and precisely as possible)?

An example KubeSchedulerConfiguration:

apiVersion: kubescheduler.config.k8s.io/v1beta3
...
profiles:
- schedulerName: default-scheduler
  plugins:
    score:
      enabled:
      - name: PodTopologySpread
        weight: 3
...

Anything else we need to know?

No response

Kubernetes version

K8s v1.24.15 or higher

Cloud provider

None

OS version

No response

Install tools

No response

Container runtime (CRI) and version (if applicable)

No response

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

@caohe caohe added the kind/bug Categorizes issue or PR as related to a bug. label Nov 27, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 27, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@caohe
Contributor Author

caohe commented Nov 27, 2023

/sig scheduling

@Zeel-Patel

Can I work on this?

@Huang-Wei
Member

> Can I work on this?

this is being worked on. see #122068

@Zeel-Patel

Zeel-Patel commented Dec 1, 2023

> Can I work on this?
>
> this is being worked on. see #122068

I see. Is there any other task you are aware of that I could work on? Thanks.
