-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to be committed: #21446
Changes to be committed: #21446
Conversation
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") If this message is too spammy, please complain to ixdy. |
1 similar comment
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") If this message is too spammy, please complain to ixdy. |
/cc @derekwaynecarr to track |
ok to test |
Will take a look after 1.2 push. |
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") If this message is too spammy, please complain to ixdy. |
Could you let me know when 1.2 push could be done? @davidopp |
Realistically, I will be able to take a look at this towards the middle/end of this week. Very sorry for the delay. |
cc/ @erictune |
Can one of the admins verify that this patch is reasonable to test in $JOB_NAME? (reply "ok to test", or if you trust the user, reply "add to whitelist") If this message is too spammy, please complain to ixdy. |
cc/ @dchen1107 |
@@ -0,0 +1,34 @@ | |||
# K8s support "/sys/fs/cgroup/devices" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
device cgroups is one of the access control mechanisms. I'd like to keep that a kubelet implementation detail. This PR and this proposal should center around GPU support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you said make sense, but I think different devices could have difference implementation in kubelet, absolutely, and this time I only want to enable the NVIDIA GPU. But regarding the changes in scheduler, should have an unified interface for all the devices. It kind of like enable all the devices by scheduler's perspective. So considering all changes in kubelet and scheduler to make GPU enabled, what should be the name for the changes? Do you have any ideas about it?
Could you add a section that explains what things are the responsibility of:
Specifically, classify these tasks in the above categories, or state if it is not in scope initially for this design.
I think probably 1 and 4 are the responsibility of the person who sets up the node. |
@erictune Sure, I'll add it later. |
I'm wondering if instead of GPUs and devices, we should just bite the bullet and have a generic resource, like @davidopp suggested in #19049 (comment). For example, I might want to have a VPN tunnel set up. Or a special iptables rule to intercept traffic to 169.254.169.254 and forward it to my container, which is listening on a hostport. These are all resources whose creation needs root-equivalent capabilities, which hopefully a container will lack, because the administrator has tightened things down. Then a resource plugin, after authentication and authorization, would mediate creation and mapping of the resources on behalf of the user. Administrators could define and implement helpers for all the custom resources they want. In addition to all that @davidopp identified for the GPU case, it'd be great to be able to pass user-defined data to the setup command. In my examples above, that would be the VPN tunnel configuration or the address that the iptables rule needs to intercept. GPUs would just be the first real-world use of the feature and would go a long way in proving that it works. |
@therc @davidopp I think in the final version of this proposal, the scheduler should have a generic resource, and these resource descriptions should be written in https://github.com/kubernetes/kubernetes/blob/master/docs/design/resources.md. But so far we just need to find a way to enable NVIDIA GPU in Kubernetes, and verify it to ensure it could work, also the changes should be approved by community. I think most part of my code are fine. BTW, treat GPU as a first-class integer resource is what I did in the code. I would like to implement this just use scheduler and kubelet "--device" ability to enable GPU directly, and the code is already there, but need to be refined. |
I am now going to look at both this PR and #24836 |
requests: | ||
cpu: 100m | ||
memory: 100Mi | ||
nvidiagpu: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer how it is alpha.kubernetes.io/nvidia-gpu
, in #24836.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now. Makes it clear feature is alpha. Later, maybe we rename it to alpha.kubernetes.io/nvidia-gpu
, or maybe we find a way to abstract GPU vendors and can call it "gpu"
This has some good stuff in it. However #24836, while very similar, is slightly more complete. So I am inclined to take that one. Your comments on that PR wanted. |
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
This change is