
Scheduler's default plugin returns an invalid score that is not in the range of [0, 100], resulting in no Pods being scheduled #122066

Closed
caohe opened this issue Nov 27, 2023 · 5 comments · Fixed by #122068
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@caohe
Contributor

caohe commented Nov 27, 2023

What happened?

Scheduler's default plugin returns an invalid score that is not in the range of [0, 100], resulting in no Pods being scheduled:

E1127 19:48:13.221979   66549 schedule_one.go:130] "Error selecting node for pod" err="applying score defaultWeights on Score plugins: plugin \"PodTopologySpread\" returns an invalid score 300, it should in the range of [0, 100] after normalizing" pod="default/test-multi-point-xxxxxxxx"

After investigation, I found a bug in the code that builds the framework when the scheduler starts. When plugins declared in a regular extension point override plugins declared under MultiPoint, the variable enabledSet is modified while it is still being traversed in the following loop:

// Reorder plugins. Here is the expected order:
// - part 1: overridePlugins. Their order stay intact as how they're specified in regular extension point.
// - part 2: multiPointEnabled - i.e., plugin defined in multipoint but not in regular extension point.
// - part 3: other plugins (excluded by part 1 & 2) in regular extension point.
newPlugins := reflect.New(reflect.TypeOf(e.slicePtr).Elem()).Elem()
// part 1
for _, name := range enabledSet.list {
	if overridePlugins.has(name) {
		newPlugins = reflect.Append(newPlugins, reflect.ValueOf(pluginsMap[name]))
		enabledSet.delete(name)
	}
}

This can cause the same scoring plugin (e.g. PodTopologySpread) to be loaded multiple times:

f.scorePlugins: []framework.ScorePlugin{(*noderesources.Fit)(0xc000922e10), (*podtopologyspread.PodTopologySpread)(0xc0000ab100), (*podtopologyspread.PodTopologySpread)(0xc0000ab100), (*tainttoleration.TaintToleration)(0xc0003f3690), (*nodeaffinity.NodeAffinity)(0xc0003997a0), (*volumebinding.VolumeBinding)(0xc0003db7a0), (*interpodaffinity.InterPodAffinity)(0xc00038f3b0), (*noderesources.BalancedAllocation)(0xc000063680), (*imagelocality.ImageLocality)(0xc000b45320)}

As a result, each node is processed multiple times by the same plugin's (e.g. PodTopologySpread's) NormalizeScore method, producing an invalid score outside the range [0, 100]:

// Run NormalizeScore method for each ScorePlugin in parallel.
f.Parallelizer().Until(ctx, len(plugins), func(index int) {
	pl := plugins[index]
	if pl.ScoreExtensions() == nil {
		return
	}
	nodeScoreList := pluginToNodeScores[pl.Name()]
	status := f.runScoreExtension(ctx, pl, state, pod, nodeScoreList)
	if !status.IsSuccess() {
		err := fmt.Errorf("plugin %q failed with: %w", pl.Name(), status.AsError())
		errCh.SendErrorWithCancel(err, cancel)
		return
	}
}, metrics.Score)
```
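Note that pluginToNodeScores is keyed by plugin name, so both copies of a duplicated plugin operate on the same shared score list. A simplified sketch (hypothetical types and function names, not the scheduler's actual code) of the subsequent weight-application pass shows how processing that shared list twice with weight 3 turns a valid score of 100 into the 300 seen in the error log:

```go
package main

import "fmt"

const (
	minNodeScore int64 = 0
	maxNodeScore int64 = 100
)

type nodeScore struct {
	name  string
	score int64
}

// applyWeights is a simplified, assumed model of the pass that multiplies
// each normalized score by the plugin's weight, validating the range first.
// Because the score map is keyed by plugin NAME, duplicate entries in
// plugins all mutate the same underlying slice.
func applyWeights(plugins []string, weights map[string]int64, pluginToNodeScores map[string][]nodeScore) error {
	for _, pl := range plugins {
		scores := pluginToNodeScores[pl]
		for i := range scores {
			if scores[i].score > maxNodeScore || scores[i].score < minNodeScore {
				return fmt.Errorf("plugin %q returns an invalid score %d, it should in the range of [%d, %d] after normalizing",
					pl, scores[i].score, minNodeScore, maxNodeScore)
			}
			scores[i].score *= weights[pl]
		}
	}
	return nil
}

func main() {
	// PodTopologySpread appears twice because of the reorder bug.
	plugins := []string{"PodTopologySpread", "PodTopologySpread"}
	weights := map[string]int64{"PodTopologySpread": 3}
	scores := map[string][]nodeScore{
		"PodTopologySpread": {{name: "node-a", score: 100}},
	}
	// First pass: 100 * 3 = 300. Second pass sees 300 and fails the check.
	fmt.Println(applyWeights(plugins, weights, scores))
}
```

The first pass legitimately scales 100 by the configured weight of 3; the second pass then sees 300 in the shared list and reports an invalid-score error much like the one in the report.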

What did you expect to happen?

If the scheduler's MultiPoint feature is used (either through the default plugins or through manual configuration) and a plugin's configuration (e.g. its scoring weight) is overridden in a regular extension point, the plugin should be loaded only once for that extension point, and no invalid scores should be returned.

How can we reproduce it (as minimally and precisely as possible)?

An example KubeSchedulerConfiguration:

apiVersion: kubescheduler.config.k8s.io/v1beta3
...
profiles:
- schedulerName: default-scheduler
  plugins:
    score:
      enabled:
      - name: PodTopologySpread
        weight: 3
...

Anything else we need to know?

No response

Kubernetes version

K8s v1.24.15 or higher

Cloud provider

None

OS version

No response

Install tools

No response

Container runtime (CRI) and version (if applicable)

No response

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

@caohe caohe added the kind/bug Categorizes issue or PR as related to a bug. label Nov 27, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 27, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@caohe
Contributor Author

caohe commented Nov 27, 2023

/sig scheduling

@Zeel-Patel

Can I work on this?

@Huang-Wei
Member

> Can I work on this?

this is being worked on. see #122068

@Zeel-Patel

Zeel-Patel commented Dec 1, 2023

> Can I work on this?
>
> this is being worked on. see #122068

I see. Is there any other task you are aware of that I could work on? Thanks.
