fix GPU resource validation that incorrectly allows zero limits #50218

dixudx · 2017-08-07T02:16:43Z

What this PR does / why we need it:

The validation logic for GPUs is not run if limits is not set for GPUs.
We need to check limits equals requests even if just request is set for GPUs.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #50182

Special notes for your reviewer:
/assign @vishh
/cc @jiayingz

Release note:

fix GPU resource validation that incorrectly allows zero limits

k8s-ci-robot · 2017-08-07T02:16:45Z

@dixudx: GitHub didn't allow me to request PR reviews from the following users: jiayingz.

Note that only kubernetes members can review this PR, and authors cannot review their own PRs.

In response to this:

What this PR does / why we need it:

The validation logic for GPUs is not run if limits is not set for GPUs.
We need to check limits equals requests even if just request is set for GPUs.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #50182

Special notes for your reviewer:
/assign @vishh
/cc @jiayingz

Release note:
fix GPU resource validation that incorrectly allows zero limits

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2017-08-07T02:16:51Z

Hi @dixudx. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

dixudx · 2017-08-07T02:20:45Z

/assign @lavalamp @thockin

vishh · 2017-08-07T02:22:14Z

@dixudx I appreciate posting this PR. Your contributions are much welcome. In the future, can I request you to mention explicitly that you are working on a fix for an issue before you post a patch?
This will help avoid multiple people spending time on the same problem. Co-ordination is a but difficult in an OSS project. Your cooperation is much appreciated :)

vishh · 2017-08-07T02:23:31Z

pkg/api/v1/validation/validation.go

@@ -60,6 +60,15 @@ func ValidateResourceRequirements(requirements *v1.ResourceRequirements, fldPath
 		allErrs = append(allErrs, validateContainerResourceName(string(resourceName), fldPath)...)
 		// Validate resource quantity.
 		allErrs = append(allErrs, ValidateResourceQuantityValue(string(resourceName), quantity, fldPath)...)
+
+		// For GPUs, not only requests can't exceed limits, they also can't be lower, i.e. must be equal.


This comment is not appropriate to the check down below

@vishh Updated. PTAL. Thanks.

Also many thanks for your kind reminder.

xiangpengzhao · 2017-08-07T04:00:55Z

pkg/api/v1/validation/validation.go

@@ -60,6 +60,15 @@ func ValidateResourceRequirements(requirements *v1.ResourceRequirements, fldPath
 		allErrs = append(allErrs, validateContainerResourceName(string(resourceName), fldPath)...)
 		// Validate resource quantity.
 		allErrs = append(allErrs, ValidateResourceQuantityValue(string(resourceName), quantity, fldPath)...)
+
+		// For GPUs, limits has to be set equal to requests.
+		if resourceName == v1.ResourceNvidiaGPU {


Since limits has to be set equal to requests for GPU, would it be better to set limits defaulting to requests if limits not configured? @vishh

Unfortunately, we don't really have any precedent for request/limit defaulting per-resource. I sort of wish we did. What do you think @vishh

IIRC, for resources like CPU/memory, if their requests are not configured, they will be defaulted to the limits.

Every line of code we introduce that's per resource makes the system harder to explain.
GPUs will be kicked out to an extension sometime soon. So I don't think its worth expanding the validation logic just for GPUs.

thockin · 2017-08-07T04:26:20Z

AFAIK, the default for both CPU and RAM if limit is unspecified is "inf". https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-ram-container/ Why can't GPU just ignore limit - it's kind of semantically silly?

…

On Sun, Aug 6, 2017 at 9:18 PM, Peter (XiangPeng) Zhao < ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In pkg/api/v1/validation/validation.go <#50218 (comment)> : > @@ -60,6 +60,15 @@ func ValidateResourceRequirements(requirements *v1.ResourceRequirements, fldPath allErrs = append(allErrs, validateContainerResourceName(string(resourceName), fldPath)...) // Validate resource quantity. allErrs = append(allErrs, ValidateResourceQuantityValue(string(resourceName), quantity, fldPath)...) + + // For GPUs, limits has to be set equal to requests. + if resourceName == v1.ResourceNvidiaGPU { IIRC, for resources like CPU/memory, if their requests are not configured, they will be defaulted to the limits. — You are receiving this because you were assigned. Reply to this email directly, view it on GitHub <#50218 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFVgVPVF-4_5j8MVaC6V6ZJZ_GAwb_mEks5sVpAIgaJpZM4Ou7Ip> .

xiangpengzhao · 2017-08-07T04:44:42Z

AFAIK, the default for both CPU and RAM if limit is unspecified is "inf".

Correct. But what I meant is that if limit is specified but request is unspecified, request will default to limit. @vishh right?

dixudx · 2017-08-07T04:50:32Z

@xiangpengzhao

if limit is specified but request is unspecified, request will default to limit.

Exactly.

vishh · 2017-08-07T19:35:32Z

Why can't GPU just ignore limit - it's kind of semantically silly?

@thockin specifying request for GPUs is silly as well since GPUs (or any non shared resource) cannot exceed their requests.
I think the validation logic could have been limits must be specified for GPUs and request must equal limits. If only limits is specified, validation would always pass due to defaulting.

vishh · 2017-08-07T19:35:57Z

@dixudx We need unit tests for this PR.

vishh · 2017-08-07T19:38:24Z

pkg/api/v1/validation/validation.go

@@ -60,6 +60,15 @@ func ValidateResourceRequirements(requirements *v1.ResourceRequirements, fldPath
 		allErrs = append(allErrs, validateContainerResourceName(string(resourceName), fldPath)...)
 		// Validate resource quantity.
 		allErrs = append(allErrs, ValidateResourceQuantityValue(string(resourceName), quantity, fldPath)...)
+


I feel the check under limits validation can be merged into this logic.

If limits are specified and requests aren't specified, requests will equal limits. The logic here will be hit.

If limits aren't specified and request is, then this logic will be hit as well.

If neither is specified, we do not care.

dixudx · 2017-08-09T12:59:30Z

@vishh @thockin Updated. PTAL. Thanks.

cmluciano · 2017-08-09T14:57:43Z

/ok-to-test

dixudx · 2017-08-10T04:52:28Z

/retest

dixudx · 2017-08-10T05:36:48Z

@vishh @thockin Updated. PTAL. Thanks.

vishh · 2017-08-10T20:05:39Z

/lgtm
/approve

@thockin This PR is ready for your review (again)

thockin · 2017-08-14T19:58:42Z

pkg/api/v1/validation/validation.go

 	}
 	for resourceName, quantity := range requirements.Requests {
 		fldPath := reqPath.Key(string(resourceName))
 		// Validate resource name.
 		allErrs = append(allErrs, validateContainerResourceName(string(resourceName), fldPath)...)
 		// Validate resource quantity.
 		allErrs = append(allErrs, ValidateResourceQuantityValue(string(resourceName), quantity, fldPath)...)
+
+		// Check that request <= limit.
+		limitQuantity, exists := requirements.Limits[resourceName]


Why does this function exist in 2 places? @caesarxuchao your fingerprints are on the commit that copied these functions into versioned. Why? This is a nightmare for updates like this - who would expect this code to be in two places?

+1, there are also other functions which exist in two places. We DO NOT want to maintain things this way.

See one example:
https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/service/util.go
https://github.com/kubernetes/kubernetes/blob/master/pkg/api/service/util.go

They almost have the same functions. We'd better only keep one copy.

Tim, I sent an issue to track this. #50653

As per #50653 (comment) by @liggitt , we should keep them as-is...

Agree with liggitt's comment. I'd like to explore if we can make admission controllers operate on versioned objects, because we want to make the life easier for developers of thirdparty admission webhooks.

thockin · 2017-08-17T04:58:27Z

/approve

dixudx · 2017-08-17T08:33:24Z

/test pull-kubernetes-e2e-gce-etcd3

dixudx · 2017-08-18T01:01:10Z

ping @vishh @thockin @caesarxuchao

Rebased. PTAL. Need your /lgtm again. Thanks.

vishh · 2017-08-18T01:43:46Z

/lgtm

k8s-github-robot · 2017-08-18T01:44:42Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dixudx, thockin, vishh

Associated issue: 50182

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

~~pkg/api/OWNERS~~ [thockin]
~~pkg/quota/OWNERS~~ [thockin,vishh]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

k8s-github-robot · 2017-08-18T02:57:40Z

Automatic merge from submit-queue

k8s-ci-robot assigned vishh Aug 7, 2017

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 7, 2017

k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 7, 2017

k8s-ci-robot assigned lavalamp and thockin Aug 7, 2017

vishh reviewed Aug 7, 2017

View reviewed changes

dixudx force-pushed the fix_GPU_resource_zero_limit_validation branch from 3a39bbd to e150ea5 Compare August 7, 2017 02:34

dixudx mentioned this pull request Aug 7, 2017

GPU resource validation incorrectly allows zero limits #50182

Closed

xiangpengzhao reviewed Aug 7, 2017

View reviewed changes

vishh reviewed Aug 7, 2017

View reviewed changes

dixudx force-pushed the fix_GPU_resource_zero_limit_validation branch from e150ea5 to a7deba8 Compare August 9, 2017 12:53

k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 9, 2017

k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 9, 2017

vishh unassigned lavalamp Aug 10, 2017

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 10, 2017

thockin reviewed Aug 14, 2017

View reviewed changes

k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 17, 2017

dixudx added 3 commits August 17, 2017 15:41

fix GPU resource validation incorrectly allows zero limits

a8742a0

auto-gen

fc0bdb5

update testcase err msg

75258b2

dixudx force-pushed the fix_GPU_resource_zero_limit_validation branch from 6a177a8 to 75258b2 Compare August 17, 2017 07:43

k8s-github-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 17, 2017

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 18, 2017

vishh added this to the v1.8 milestone Aug 18, 2017

k8s-github-robot merged commit 075d209 into kubernetes:master Aug 18, 2017

dixudx deleted the fix_GPU_resource_zero_limit_validation branch August 18, 2017 02:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix GPU resource validation that incorrectly allows zero limits #50218

fix GPU resource validation that incorrectly allows zero limits #50218

dixudx commented Aug 7, 2017

k8s-ci-robot commented Aug 7, 2017

k8s-ci-robot commented Aug 7, 2017

dixudx commented Aug 7, 2017

vishh commented Aug 7, 2017

vishh Aug 7, 2017

dixudx Aug 7, 2017

xiangpengzhao Aug 7, 2017

thockin Aug 7, 2017

xiangpengzhao Aug 7, 2017

vishh Aug 7, 2017

thockin commented Aug 7, 2017 via email

xiangpengzhao commented Aug 7, 2017

dixudx commented Aug 7, 2017

vishh commented Aug 7, 2017

vishh commented Aug 7, 2017

vishh Aug 7, 2017

dixudx commented Aug 9, 2017

cmluciano commented Aug 9, 2017

dixudx commented Aug 10, 2017

dixudx commented Aug 10, 2017

vishh commented Aug 10, 2017

thockin Aug 14, 2017

xiangpengzhao Aug 15, 2017

xiangpengzhao Aug 15, 2017

xiangpengzhao Aug 15, 2017

caesarxuchao Aug 17, 2017

thockin commented Aug 17, 2017

dixudx commented Aug 17, 2017

dixudx commented Aug 18, 2017 •

edited

Loading

vishh commented Aug 18, 2017

k8s-github-robot commented Aug 18, 2017

k8s-github-robot commented Aug 18, 2017

fix GPU resource validation that incorrectly allows zero limits #50218

fix GPU resource validation that incorrectly allows zero limits #50218

Conversation

dixudx commented Aug 7, 2017

k8s-ci-robot commented Aug 7, 2017

k8s-ci-robot commented Aug 7, 2017

dixudx commented Aug 7, 2017

vishh commented Aug 7, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thockin commented Aug 7, 2017 via email

xiangpengzhao commented Aug 7, 2017

dixudx commented Aug 7, 2017

vishh commented Aug 7, 2017

vishh commented Aug 7, 2017

Choose a reason for hiding this comment

dixudx commented Aug 9, 2017

cmluciano commented Aug 9, 2017

dixudx commented Aug 10, 2017

dixudx commented Aug 10, 2017

vishh commented Aug 10, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thockin commented Aug 17, 2017

dixudx commented Aug 17, 2017

dixudx commented Aug 18, 2017 • edited Loading

vishh commented Aug 18, 2017

k8s-github-robot commented Aug 18, 2017

k8s-github-robot commented Aug 18, 2017

dixudx commented Aug 18, 2017 •

edited

Loading