Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add score func for NodeResourcesFit plugin #101822

Merged
merged 1 commit into from
Jun 29, 2021

Conversation

yuzhiquan
Copy link
Member

@yuzhiquan yuzhiquan commented May 8, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

A single scoring plugin for node resources

Which issue(s) this PR fixes:

Refs #98746

KEP kubernetes/enhancements#2458 kubernetes/enhancements#2461

Special notes for your reviewer:

Does this PR introduce a user-facing change?

A new score extension for NodeResourcesFit plugin that merges the functionality of NodeResourcesLeastAllocated,NodeResourcesMostAllocated,RequestedToCapacityRatio plugins, which are marked as deprecated as of v1beta2. In v1beta1, the three plugins can still be used in v1beta1 but not at the same time with the score extension of NodeResourcesFit

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/sig scheduling
/triage accepted
/priority important-longterm
/assign @ahg-g

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 8, 2021
@yuzhiquan yuzhiquan force-pushed the NodeResourcesFit-score branch from be1a67a to 4c74db3 Compare May 8, 2021 02:53
@k8s-ci-robot k8s-ci-robot added the area/dependency Issues or PRs related to dependency changes label May 8, 2021
@yuzhiquan yuzhiquan force-pushed the NodeResourcesFit-score branch 3 times, most recently from c04b6f2 to f5368b2 Compare May 8, 2021 07:10
@yuzhiquan
Copy link
Member Author

yuzhiquan commented May 10, 2021

I did make generated_files, this action auto remove k8s.io/kubernetes/pkg/scheduler/apis/config/v1beta1/zz_generated.conversion.go:109: Convert_config_NodeResourcesFitArgs_To_v1beta1_NodeResourcesFitArgs function, and ci failed for this, now i was confused how to fix this? @ahg-g @Huang-Wei

Copy link
Member

@ahg-g ahg-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took a quick look at the API, will take a look at the full PR later today.

As for the error, you need to update the type in staging (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kube-scheduler/config/v1beta1/types.go)

Note that we are working on v1beta2 (#99597), but for now you can update the v1beta1 type just so we make sure that the PR successfully passes the tests.

pkg/scheduler/apis/config/types_pluginargs.go Outdated Show resolved Hide resolved
pkg/scheduler/apis/config/types_pluginargs.go Outdated Show resolved Hide resolved
@yuzhiquan yuzhiquan force-pushed the NodeResourcesFit-score branch from f5368b2 to 2f26149 Compare May 11, 2021 03:42
@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label May 11, 2021
@fejta-bot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@yuzhiquan yuzhiquan force-pushed the NodeResourcesFit-score branch 2 times, most recently from 5abbe00 to 37c4dc3 Compare May 11, 2021 07:57
@yuzhiquan yuzhiquan force-pushed the NodeResourcesFit-score branch from 37c4dc3 to 9d72fdc Compare May 12, 2021 04:22
@yuzhiquan yuzhiquan force-pushed the NodeResourcesFit-score branch 3 times, most recently from 62c5f91 to f5f5342 Compare May 13, 2021 03:23
@yuzhiquan yuzhiquan force-pushed the NodeResourcesFit-score branch from f5f5342 to 220220c Compare May 17, 2021 02:33
@ahg-g ahg-g force-pushed the NodeResourcesFit-score branch from 0e5bd06 to db60f57 Compare June 25, 2021 12:54
@ahg-g
Copy link
Member

ahg-g commented Jun 25, 2021

@liggitt @alculquicondor we address all the comments. We updated the conflict logic to apply to the new score plugin only. Please take a look, I really think this is the safe way to go, it is backward compatible and protects users from surprises. But if you still strongly disagree with this approach then we can change it. Let us know and based on that we will update the release notes on the description.

@ahg-g
Copy link
Member

ahg-g commented Jun 28, 2021

@liggitt friendly ping.

@liggitt
Copy link
Member

liggitt commented Jun 29, 2021

sorry for the delay, was digging into working default 1.21 configs to assure myself this wasn't breaking compatibility

git checkout upstream/release-1.21
make kube-scheduler
kube-scheduler --master=http://localhost:8080 --write-config-to 1.21_default.yaml
kube-scheduler --master=http://localhost:8080 --config 1.21_default.yaml

passing configs with 1.21 scheduler:
- 1.21_default_fixed.yaml (explicitly disabled then enabled plugins to avoid warnings about duplicates)
- 1.21_default_fixed_no_pluginConfig.yaml

failing configs with 1.21 scheduler:
- 1.21_default.yaml
  - `couldn't create scheduler using provider "DefaultProvider": initializing profiles: one queue sort plugin required for profile with scheduler name "default-scheduler"`
- 1.21_default_fixed_score_nodefit.yaml (try to enable NodeResourcesFit as a score plugin)
  - `couldn't create scheduler using provider "DefaultProvider": initializing profiles: creating profile for scheduler name default-scheduler: plugin "NodeResourcesFit" does not extend ScorePlugin plugin`

---

tested with kube-scheduler built from this PR

passing configs with scheduler built from PR:
- 1.21_default_fixed.yaml (explicitly disabled then enabled plugins to avoid warnings about duplicates)
- 1.21_default_fixed_no_pluginConfig.yaml

failing configs with scheduler built from PR:
- 1.21_default.yaml
  - `profiles[0].plugins.queueSort.Enabled: Invalid value: 2: only one queue sort plugin can be enabled`
- 1.21_default_fixed_score_nodefit.yaml (try to enable NodeResourcesFit as a score plugin)
  - `profiles[0].plugins.score.enabled[8]: Invalid value: "NodeResourcesFit": was conflict with ["NodeResourcesLeastAllocated"] in version "kubescheduler.config.k8s.io/v1beta1" (KubeSchedulerConfiguration is version "kubescheduler.config.k8s.io/v1beta1")`

The 1.21 error loading a config that tried to enable NodeResourcesFit as a score plugin was what I was wanting to be sure of.

I do think the plugin conflict logic is hard to follow and makes it easier to break compatibility in the future by registering a conflict that used to be allowed.


1.21_default_fixed_no_pluginConfig.yaml.txt
1.21_default_fixed_score_nodefit.yaml.txt
1.21_default_fixed.yaml.txt
1.21_default.yaml.txt

@liggitt
Copy link
Member

liggitt commented Jun 29, 2021

/approve
for API changes

/hold for scheduler lgtm and approve if needed

@ahg-g I'd like to see these followups:

  • ensure the default config output by --write-config-to is usable without manual fixup
  • add a test to capture the working default config from current and previous releases to make sure that continues to load/validate successfully

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 29, 2021
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2021
@ahg-g ahg-g force-pushed the NodeResourcesFit-score branch from 72599f9 to deb14b9 Compare June 29, 2021 17:17
@ahg-g
Copy link
Member

ahg-g commented Jun 29, 2021

Thanks Jordan for the detailed review and due diligence! I will create issues for the followup items.

@ahg-g
Copy link
Member

ahg-g commented Jun 29, 2021

/release-note "A new score extension for NodeResourcesFit plugin that merges the functionality of NodeResourcesLeastAllocated,NodeResourcesMostAllocated,RequestedToCapacityRatio plugins, which are marked as deprecated as of v1beta2. In v1beta1, the three plugins can still be used in v1beta1 but not at the same time with the score extension of NodeResourcesFit"

@ahg-g
Copy link
Member

ahg-g commented Jun 29, 2021

/hold

until release note is updated.

@ahg-g
Copy link
Member

ahg-g commented Jun 29, 2021

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 29, 2021
@ahg-g
Copy link
Member

ahg-g commented Jun 29, 2021

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 29, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, liggitt, yuzhiquan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuzhiquan
Copy link
Member Author

Thanks for help and coauthor commit from @ahg-g , and detailed review comment by all of the reviewers, especially @liggitt and @alculquicondor .

@ahg-g
Copy link
Member

ahg-g commented Jun 29, 2021

Created #103309 and #103308

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/cloudprovider area/dependency Issues or PRs related to dependency changes area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: API review completed, 1.22
Development

Successfully merging this pull request may close these issues.

8 participants