
Support for resource quota on extended resources #57302

Merged: merged 2 commits into kubernetes:master on Feb 20, 2018
Conversation

@lichuqiang (Contributor, Author) commented Dec 18, 2017

Which issue(s) this PR fixes:
Fixes #46639 #57300 for resource quota support

Special notes for your reviewer:
One thing to be determined is whether it is necessary to explicitly prohibit defining limits for extended resources in quota, as we did for hugepages, since the resource is not allowed to overcommit.

Release note:

Support for resource quota on extended resources

/cc @jiayingz @vishh @derekwaynecarr

@k8s-ci-robot added the release-note, size/M, and cncf-cla: yes labels Dec 18, 2017
@lichuqiang (Contributor, Author):

/area hw-accelerators

@tengqm (Contributor) left a comment

Overall, this looks okay to me.

@@ -244,13 +245,13 @@ func podComputeUsageHelper(requests api.ResourceList, limits api.ResourceList) a
		result[api.ResourceLimitsEphemeralStorage] = limit
	}
	for resource, request := range requests {
-		if quota.ContainsPrefix(requestedResourcePrefixes, resource) {
+		if quota.ContainsPrefix(requestedResourcePrefixes, resource) || helper.IsExtendedResourceName(resource) {
Review comment (Contributor):
This revision is kind of hijacking the hugepages logic for extended resources, although the call to maskResourceWithPrefix below is unnecessary. So I'd suggest we add two separate loops for extended resources with comments.
The current logic unnecessarily couples extended resources to hugepages, which is not good for future maintenance.

Reply (Contributor, Author):
SGTM, done.
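
(For illustration only, the suggested two-loop split might look roughly like the standalone sketch below; the types and helper names are simplified stand-ins, not the actual pkg/quota code.)

package main

import (
	"fmt"
	"strings"
)

// ResourceList is a simplified stand-in for api.ResourceList (name -> quantity).
type ResourceList map[string]int64

// computeRequests sketches the "two separate loops" idea: one pass handles
// resources matched by explicit prefixes (e.g. hugepages-), a second pass
// handles extended resources, both recorded under the quota "requests." key.
func computeRequests(requests ResourceList, prefixes []string) ResourceList {
	result := ResourceList{}
	// Pass 1: resources tracked by configured prefixes, e.g. hugepages-<size>.
	for name, qty := range requests {
		for _, p := range prefixes {
			if strings.HasPrefix(name, p) {
				result["requests."+name] = qty
			}
		}
	}
	// Pass 2: extended resources (crudely approximated here as any namespaced
	// name outside kubernetes.io) are quota'd on requests only.
	for name, qty := range requests {
		if strings.Contains(name, "/") && !strings.HasPrefix(name, "kubernetes.io/") {
			result["requests."+name] = qty
		}
	}
	return result
}

func main() {
	reqs := ResourceList{"hugepages-2Mi": 4, "nvidia.com/gpu": 2, "cpu": 300}
	fmt.Println(computeRequests(reqs, []string{"hugepages-"}))
}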

@lichuqiang (Contributor, Author) commented Dec 18, 2017

@tengqm Not quite sure what you mean by "it is NOT OKAY to have devices reused across all regular containers by default"; I think we never planned to support that anywhere. Also, I fail to see the relationship between resource quota support and that restriction.
In my opinion, we already have basic support for device reuse after #56818 went in, and left #56943 for further discussion, which should not block us on resource quota support :)

@vikaschoudhary16 (Contributor):

@lichuqiang As a practice, and for the convenience of others, may I request that you update the consolidated RMWG PR/issues spreadsheet with this PR, if you have not already.

@vikaschoudhary16 (Contributor):

/sig node

@k8s-ci-robot added the sig/node label Dec 18, 2017
@vikaschoudhary16 (Contributor):

One thing to be determined is if it necessary to Explicitly prohibit defining limits for extended resources in quota, like we did for hugepages, as the resource is not allowed to overcommit.

In the existing code even today, limits cannot be unequal to requests for the ER. If the container spec says otherwise, it will fail at validation.
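
(For illustration only, the rule being described might look like the standalone sketch below; the function and type names here are hypothetical, not the actual pkg/apis/core/validation code.)

package main

import "fmt"

// Quantities is a simplified stand-in for a ResourceList (name -> quantity).
type Quantities map[string]int64

// checkExtendedResource sketches the rule: extended resources are not
// overcommittable, so if a limit is set it must equal the request.
func checkExtendedResource(name string, requests, limits Quantities) error {
	lim, hasLim := limits[name]
	req, hasReq := requests[name]
	if hasLim && (!hasReq || lim != req) {
		return fmt.Errorf("%s: limit must equal request for extended resources", name)
	}
	return nil
}

func main() {
	// limit (4) != request (2): this container spec would be rejected.
	err := checkExtendedResource("nvidia.com/gpu",
		Quantities{"nvidia.com/gpu": 2}, Quantities{"nvidia.com/gpu": 4})
	fmt.Println(err)
}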

@tengqm (Contributor) commented Dec 18, 2017

@vikaschoudhary16 Links have been added: https://docs.google.com/spreadsheets/d/1YBxIy23SY1BkVrGReFRr4e2OsmsLtfV3ETXPNviqDaU/edit#gid=0&range=E5

@lichuqiang (Contributor, Author) commented Dec 18, 2017

The device reuse logic introduced by #56818 will have the Pod requesting only 1 resource. It means a2, r1, r2 will all share the single resource allocated to a1.
However, when computing quota, the current logic is max(sum(r1, r2), a1, a2). The result will be 2.

Nope; maybe you should take another look at #56818. I think we didn't introduce a mechanism to reuse resources between regular containers :)

@lichuqiang (Contributor, Author) commented Dec 18, 2017

In the existing code even today, limits cannot be unequal to requests for the ER. If the container spec says otherwise, it will fail at validation.

Yep. Thus, as @derekwaynecarr suggested, "we can only worry about quota on requests for the moment since the resource is not burstable." So I wonder whether we should prohibit defining limits for extended resources in the validation func IsStandardQuotaResourceName and remove the logic for limits in the quota evaluator, though that would have no real impact on the functionality.

@tengqm (Contributor) commented Dec 18, 2017

@lichuqiang Okay, after checking the code another time, I realized that I had misunderstood it. The situation I described does not happen. I'm deleting my comments in case they confuse other reviewers.

@pineking commented Dec 18, 2017

@lichuqiang How do we use this for GPU quota? Are there any docs? Do we need to enable the device plugin feature gate?

@lichuqiang (Contributor, Author) commented Dec 18, 2017

@pineking As the resource name of GPU (alpha.kubernetes.io/nvidia-gpu) is in the format of an extended resource, you don't need any extra steps for resource quota.
But to enable GPU, you could either manage it through the device plugin or the old way; both require you to enable a certain feature gate.
By the way, it seems the feature gate for the device plugin has been removed and the feature is enabled by default in v1.9.

@pineking:
@lichuqiang thanks, got it, I use the “old way” to enable GPU.

@vikaschoudhary16 (Contributor):

code logic wise LGTM.

@tengqm (Contributor) commented Dec 19, 2017

Oh, sorry. Just realized that we may need to hold this before claiming we support quota for extended resources. I think it is a bug introduced in #56818. Take the following Pod spec as an example:

spec:
  initContainers:
    - name: A1
      resources: 
        requests: {nvidia.com/gpu: 2}
    - name: A2
      resources: 
        requests: {nvidia.com/gpu: 2}

  containers:
    - name: C
      resources:
        requests: {nvidia.com/gpu: 2}

The current device reuse logic will first recognize the 4 GPU requests from init containers, then reuse 2 of them to the regular container C. So the total GPU requests from the pod is 4.

However, the quota check logic today ( https://github.com/kubernetes/kubernetes/blob/master/pkg/quota/evaluator/core/pods.go#L322-L332 ) computes the resource requests differently. It will do
max(max(init-containers), sum(regular-containers)). That means quota checking will see the Pod requesting 2 GPUs in the example above because A2 is run only after A1 has completed.

This problem is being fixed (as a side-effect) in #53698.
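
(For reference, a standalone sketch of that computation for a single resource; this is illustrative only, not the actual pods.go code.)

package main

import "fmt"

// podResourceRequest illustrates the quota-side formula described above for one
// resource: the larger of the sum across regular containers and the maximum
// across init containers (init containers run one at a time).
func podResourceRequest(initContainers, containers []int64) int64 {
	var maxInit, sum int64
	for _, r := range initContainers {
		if r > maxInit {
			maxInit = r
		}
	}
	for _, r := range containers {
		sum += r
	}
	if maxInit > sum {
		return maxInit
	}
	return sum
}

func main() {
	// A1=2 and A2=2 init containers, C=2 regular container (the example above):
	// max(max(2, 2), 2) == 2, so quota sees the Pod requesting 2 GPUs.
	fmt.Println(podResourceRequest([]int64{2, 2}, []int64{2}))
}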

@lichuqiang (Contributor, Author) commented Dec 19, 2017

The current device reuse logic will first recognize the 4 GPU requests from init containers, then reuse 2 of them to the regular container C. So the total GPU requests from the pod is 4.

@tengqm Oh, I think you should take another look at #56818: device reuse between init containers is supported, so we would only recognize 2 GPUs in your case.

@vikaschoudhary16 (Contributor):

@tengqm Agree with @lichuqiang; a code walk-through suggests the current device reuse logic will count only 2 GPUs. For the second init container, devices will first be picked from the already allocated devices.
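
(For illustration only, the reuse idea can be sketched roughly as below; the names are hypothetical, not the actual devicemanager code.)

package main

import "fmt"

// allocate sketches device reuse: devices already granted to earlier (init)
// containers form a reusable pool that is drawn from before any free devices,
// so consecutive init containers can share the same physical devices.
func allocate(needed int, reusable, free []string) (granted []string) {
	for _, d := range reusable {
		if len(granted) == needed {
			return granted
		}
		granted = append(granted, d)
	}
	for _, d := range free {
		if len(granted) == needed {
			return granted
		}
		granted = append(granted, d)
	}
	return granted
}

func main() {
	// The first init container gets gpu0 and gpu1 from the free pool; the second
	// init container reuses them, so the pod still consumes only 2 devices.
	first := allocate(2, nil, []string{"gpu0", "gpu1", "gpu2", "gpu3"})
	second := allocate(2, first, []string{"gpu2", "gpu3"})
	fmt.Println(first, second)
}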

@tengqm (Contributor) commented Dec 19, 2017

Ah, yes. I walked through the code again; I hadn't quite caught the trick of building the device ID unions. Seems it's not a problem then.

InitContainers: []api.Container{{
	Resources: api.ResourceRequirements{
		Requests: api.ResourceList{api.ResourceName("example.com/dongle"): resource.MustParse("3")},
	},
Review comment (Contributor):
Should Limits also be mentioned?

Reply (Contributor):
+1 to add Limits for consistency, even though it may not affect this test.

@wgliang (Contributor) commented Feb 7, 2018

@lichuqiang Perhaps it would be better to merge the two commits into one.

@@ -75,6 +75,10 @@ func validateContainerResourceName(value string, fldPath *field.Path) field.Erro
		if !helper.IsStandardContainerResourceName(value) {
			return append(allErrs, field.Invalid(fldPath, value, "must be a standard resource for containers"))
		}
	} else if !v1helper.IsDefaultNamespaceResource(v1.ResourceName(value)) {
		if !v1helper.IsExtendedResourceName(v1.ResourceName(value)) {
			return append(allErrs, field.Invalid(fldPath, value, "doesn't follow extended resource name standard"))
Review comment (Contributor):
s/name/naming/g

// 1. the resource name is not in the default namespace;
// 2. resource name does not have "requests." prefix,
// to avoid confusion with the convention in quota
// 3. it satisfies the rules in IsQualifiedName() after converted into quota resource name
func IsExtendedResourceName(name core.ResourceName) bool {
Review comment (Contributor):
Are there any unit tests for this method? If so, can you add test cases for each of the scenarios you mention in the comment, including maximum length validation?

Reply (Member):
a follow-on w/ tests would be good for this.
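
(As a purely illustrative aside, the three rules quoted above could be exercised by a small standalone check like the one below; the function is a simplified stand-in, not the real IsExtendedResourceName or its qualified-name validator.)

package main

import (
	"fmt"
	"regexp"
	"strings"
)

// qualifiedName loosely approximates the IsQualifiedName() rules (an optional
// DNS-subdomain prefix, a slash, then a short alphanumeric name).
var qualifiedName = regexp.MustCompile(`^([a-z0-9.-]+/)?[A-Za-z0-9]([A-Za-z0-9._-]{0,61}[A-Za-z0-9])?$`)

// looksLikeExtendedResource restates the three commented rules in simplified form.
func looksLikeExtendedResource(name string) bool {
	// Rule 1: must not be in the default namespace (unprefixed or kubernetes.io/).
	if !strings.Contains(name, "/") || strings.HasPrefix(name, "kubernetes.io/") {
		return false
	}
	// Rule 2: must not already carry the quota "requests." convention.
	if strings.HasPrefix(name, "requests.") {
		return false
	}
	// Rule 3: the quota resource name ("requests." + name) must be a qualified name.
	return qualifiedName.MatchString("requests." + name)
}

func main() {
	for _, name := range []string{"nvidia.com/gpu", "requests.nvidia.com/gpu", "kubernetes.io/foo", "cpu"} {
		fmt.Printf("%-30s %v\n", name, looksLikeExtendedResource(name))
	}
}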

@@ -4159,6 +4159,8 @@ const (
	// HugePages request, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	// As burst is not supported for HugePages, we would only quota its request, and ignore the limit.
	ResourceRequestsHugePagesPrefix = "requests.hugepages-"
	// Default resource requests prefix
	DefaultResourceRequestsPrefix = "requests."
Review comment (Contributor):
@derekwaynecarr Do we still need explicit types for first class resources or can we apply the logic this PR employs for first class resources too?

Reply (Member):
i suspect we could consolidate the logic now to say any compute resource (cpu, memory, etc.) that is overcommittable could support the requests.* or limits.* syntax.

@vishh (Contributor) commented Feb 15, 2018

/approve

@derekwaynecarr (Member) left a comment

thanks for this useful feature.


@derekwaynecarr (Member):

/approve

@jiayingz (Contributor):

/retest pull-kubernetes-verify

@jiayingz (Contributor):

/assign @thockin @dchen1107 for approval

@k8s-ci-robot (Contributor):

@jiayingz: GitHub didn't allow me to assign the following users: for, approval.

Note that only kubernetes members and repo collaborators can be assigned.

In response to this:

/assign @thockin @dchen1107 for approval

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jiayingz (Contributor):

/retest pull-kubernetes-verify

@dchen1107 (Member):

/approve

thanks for the feature!

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, derekwaynecarr, jiayingz, lichuqiang, vishh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Feb 20, 2018
@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-github-robot:

/test all

Tests are more than 96 hours old. Re-running tests.

@k8s-github-robot:

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot:

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 228c991 into kubernetes:master Feb 20, 2018
@pineking:

@lichuqiang Are there any docs explaining how to use this feature for extended resources, e.g. "nvidia.com/gpu"?

@lichuqiang (Contributor, Author) commented Feb 23, 2018

@pineking Not yet; I'll post a PR to update the quota docs soon.
Basically, you can use quota for extended resources the same way you do for CPU/memory.
Note that for extended resources, only quota items with the "requests." prefix are allowed.

Pod example:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  labels:
    name: test-pod-applied
spec:
  containers:
  - name: kubernetes-pause
    image: gcr.io/google-containers/pause:2.0
    resources:
      requests:
        cpu: 300m
        memory: 1300Mi
        nvidia.com/gpu: "4"
      limits:
        cpu: 300m
        memory: 1300Mi
        nvidia.com/gpu: "4"

quota example:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota1
spec:
  hard:
    cpu: 300m
    memory: 3900Mi
    requests.nvidia.com/gpu: 4

@pineking:

@lichuqiang I have tested this feature. It works! Thanks.

@rohitagarwal003 (Member):

@lichuqiang Can you update the docs with how to set this?

@lichuqiang (Contributor, Author):

docs update PR posted: kubernetes/website#7936

Labels
approved, area/hw-accelerators, cncf-cla: yes, kind/api-change, lgtm, release-note, sig/node, size/L

Successfully merging this pull request may close these issues:
Resource quota support for GPU resource