-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix GPU resource validation that incorrectly allows zero limits #50218
fix GPU resource validation that incorrectly allows zero limits #50218
Conversation
@dixudx: GitHub didn't allow me to request PR reviews from the following users: jiayingz. Note that only kubernetes members can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Hi @dixudx. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@dixudx I appreciate posting this PR. Your contributions are much welcome. In the future, can I request you to mention explicitly that you are working on a fix for an issue before you post a patch? |
pkg/api/v1/validation/validation.go
Outdated
@@ -60,6 +60,15 @@ func ValidateResourceRequirements(requirements *v1.ResourceRequirements, fldPath | |||
allErrs = append(allErrs, validateContainerResourceName(string(resourceName), fldPath)...) | |||
// Validate resource quantity. | |||
allErrs = append(allErrs, ValidateResourceQuantityValue(string(resourceName), quantity, fldPath)...) | |||
|
|||
// For GPUs, not only requests can't exceed limits, they also can't be lower, i.e. must be equal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not appropriate to the check down below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vishh Updated. PTAL. Thanks.
Also many thanks for your kind reminder.
3a39bbd
to
e150ea5
Compare
pkg/api/v1/validation/validation.go
Outdated
@@ -60,6 +60,15 @@ func ValidateResourceRequirements(requirements *v1.ResourceRequirements, fldPath | |||
allErrs = append(allErrs, validateContainerResourceName(string(resourceName), fldPath)...) | |||
// Validate resource quantity. | |||
allErrs = append(allErrs, ValidateResourceQuantityValue(string(resourceName), quantity, fldPath)...) | |||
|
|||
// For GPUs, limits has to be set equal to requests. | |||
if resourceName == v1.ResourceNvidiaGPU { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since limits has to be set equal to requests for GPU, would it be better to set limits defaulting to requests if limits not configured? @vishh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we don't really have any precedent for request/limit defaulting per-resource. I sort of wish we did. What do you think @vishh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, for resources like CPU/memory, if their requests are not configured, they will be defaulted to the limits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every line of code we introduce that's per resource makes the system harder to explain.
GPUs will be kicked out to an extension sometime soon. So I don't think its worth expanding the validation logic just for GPUs.
AFAIK, the default for both CPU and RAM if limit is unspecified is "inf".
https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-ram-container/
Why can't GPU just ignore limit - it's kind of semantically silly?
…On Sun, Aug 6, 2017 at 9:18 PM, Peter (XiangPeng) Zhao < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pkg/api/v1/validation/validation.go
<#50218 (comment)>
:
> @@ -60,6 +60,15 @@ func ValidateResourceRequirements(requirements *v1.ResourceRequirements, fldPath
allErrs = append(allErrs, validateContainerResourceName(string(resourceName), fldPath)...)
// Validate resource quantity.
allErrs = append(allErrs, ValidateResourceQuantityValue(string(resourceName), quantity, fldPath)...)
+
+ // For GPUs, limits has to be set equal to requests.
+ if resourceName == v1.ResourceNvidiaGPU {
IIRC, for resources like CPU/memory, if their requests are not configured,
they will be defaulted to the limits.
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#50218 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AFVgVPVF-4_5j8MVaC6V6ZJZ_GAwb_mEks5sVpAIgaJpZM4Ou7Ip>
.
|
Correct. But what I meant is that if limit is specified but request is unspecified, request will default to limit. @vishh right? |
Exactly. |
@thockin specifying request for GPUs is silly as well since GPUs (or any non shared resource) cannot exceed their requests. |
@dixudx We need unit tests for this PR. |
@@ -60,6 +60,15 @@ func ValidateResourceRequirements(requirements *v1.ResourceRequirements, fldPath | |||
allErrs = append(allErrs, validateContainerResourceName(string(resourceName), fldPath)...) | |||
// Validate resource quantity. | |||
allErrs = append(allErrs, ValidateResourceQuantityValue(string(resourceName), quantity, fldPath)...) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel the check under limits
validation can be merged into this logic.
- If limits are specified and requests aren't specified, requests will equal limits. The logic here will be hit.
- If limits aren't specified and request is, then this logic will be hit as well.
- If neither is specified, we do not care.
e150ea5
to
a7deba8
Compare
/ok-to-test |
/retest |
/lgtm @thockin This PR is ready for your review (again) |
} | ||
for resourceName, quantity := range requirements.Requests { | ||
fldPath := reqPath.Key(string(resourceName)) | ||
// Validate resource name. | ||
allErrs = append(allErrs, validateContainerResourceName(string(resourceName), fldPath)...) | ||
// Validate resource quantity. | ||
allErrs = append(allErrs, ValidateResourceQuantityValue(string(resourceName), quantity, fldPath)...) | ||
|
||
// Check that request <= limit. | ||
limitQuantity, exists := requirements.Limits[resourceName] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this function exist in 2 places? @caesarxuchao your fingerprints are on the commit that copied these functions into versioned. Why? This is a nightmare for updates like this - who would expect this code to be in two places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, there are also other functions which exist in two places. We DO NOT want to maintain things this way.
See one example:
https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/service/util.go
https://github.com/kubernetes/kubernetes/blob/master/pkg/api/service/util.go
They almost have the same functions. We'd better only keep one copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tim, I sent an issue to track this. #50653
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per #50653 (comment) by @liggitt , we should keep them as-is...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with liggitt's comment. I'd like to explore if we can make admission controllers operate on versioned objects, because we want to make the life easier for developers of thirdparty admission webhooks.
/approve |
6a177a8
to
75258b2
Compare
/test pull-kubernetes-e2e-gce-etcd3 |
ping @vishh @thockin @caesarxuchao Rebased. PTAL. Need your |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dixudx, thockin, vishh Associated issue: 50182 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue |
What this PR does / why we need it:
The validation logic for GPUs is not run if limits is not set for GPUs.
We need to check limits equals requests even if just request is set for GPUs.
Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #50182Special notes for your reviewer:
/assign @vishh
/cc @jiayingz
Release note: