AWS Feature: GPU support #23587

Closed
justinsb opened this issue Mar 29, 2016 · 19 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node.
Comments

@justinsb
Member

GPU support: both scheduling GPU pods only onto nodes that have GPUs and ensuring that we don't overcommit GPUs.

@justinsb justinsb added area/platform/aws kind/feature Categorizes issue or PR as related to a new feature. labels Mar 29, 2016
@justinsb justinsb added this to the v1.3 milestone Mar 29, 2016
@justinsb
Member Author

@therc you said you would work on this :-)

@zhouhaibing089
Contributor

cc @chengyli, for awareness.

@a-robinson
Contributor

Cross-reference: #19049, #17035.

@a-robinson a-robinson added sig/node Categorizes an issue or PR as relevant to SIG Node. team/control-plane labels Mar 31, 2016
@brandoncole

Watching this closely; this will definitely help us! Kudos for taking this on for v1.3!

@therc
Member

therc commented Apr 5, 2016

I'm trying to implement this through Docker volume plugins, since that's already been requested in #16405 and would reduce the amount of custom code (we still need to pass --device somehow... docs coming).

@therc
Member

therc commented Apr 5, 2016

@brandoncole what instance types do you use? For v0 I am targeting g2.2xl.

@gopinatht

@therc Answering on behalf of @brandoncole (I work with him). We are using g2.2xl so that works perfectly for us.

@brandoncole

@therc We would be pretty happy with g2.2xl to start with. We're also running Kubernetes at a pretty large scale internally, on machines with multiple NVIDIA GPUs, which we'd eventually love to take advantage of and schedule accordingly as well. @gopinatht is our main guy for this! 👍

@Hui-Zhi
Contributor

Hui-Zhi commented Apr 6, 2016

@therc Docker already supports GPUs, but K8s doesn't have the "--device" capability. Could you share the details of "implement this through Docker volume plugins"? I have enabled "--device" in my repo, but the code still needs refining. So far it works with NVIDIA GPUs; I have tested that.

@therc
Member

therc commented Apr 7, 2016

@Hui-Zhi I'm almost done with the design doc. I'll send it out tomorrow. You're right that --device is needed. That's the most invasive set of changes. I refer to my work as v0, as it involves few changes to the Kubernetes daemons and keeps most of the new code in external ones. Your work is v1, where the scheduling and runtime configuration are more sophisticated and robust, plus they live inside kube-scheduler and kubelet.
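
For readers following the `--device` discussion, here is a minimal sketch of what passing GPU devices to a container could look like at the Docker level. It only builds the `--device` arguments for `docker run`; the helper name `nvidiaDeviceFlags` is hypothetical, and the device node paths (`/dev/nvidiactl`, `/dev/nvidia-uvm`, `/dev/nvidiaN`) are an assumption based on the usual NVIDIA driver layout, not something taken from this thread or from the design doc mentioned above.

```go
package main

import (
	"fmt"
	"strings"
)

// nvidiaDeviceFlags returns the --device arguments a `docker run` invocation
// would need in order to expose gpuCount NVIDIA GPUs to a container.
// Assumption: the host uses the usual NVIDIA driver device nodes
// (/dev/nvidiactl, /dev/nvidia-uvm, and one /dev/nvidiaN per GPU).
func nvidiaDeviceFlags(gpuCount int) []string {
	// Control devices shared by every GPU on the host.
	paths := []string{"/dev/nvidiactl", "/dev/nvidia-uvm"}
	for i := 0; i < gpuCount; i++ {
		paths = append(paths, fmt.Sprintf("/dev/nvidia%d", i))
	}

	flags := make([]string, 0, len(paths))
	for _, p := range paths {
		// Format: host path, container path, cgroup permissions
		// (r = read, w = write, m = mknod).
		flags = append(flags, fmt.Sprintf("--device=%s:%s:rwm", p, p))
	}
	return flags
}

func main() {
	// Print the command line for a single-GPU host; <image> is a placeholder.
	fmt.Println("docker run " + strings.Join(nvidiaDeviceFlags(1), " ") + " <image>")
}
```

The point of the sketch is that the container runtime needs an explicit list of device nodes per container, which is why the kubelet has to be able to pass this information through.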

jessfraz pushed a commit to jessfraz/kubernetes that referenced this issue May 12, 2016
Automatic merge from submit-queue

WIP v0 NVIDIA GPU support

```release-note
* Alpha support for scheduling pods on machines with NVIDIA GPUs whose kubelets use the `--experimental-nvidia-gpus` flag, using the alpha.kubernetes.io/nvidia-gpu resource 
```
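
As an illustration of the two goals stated in the issue (place GPU pods only on GPU nodes, and never overcommit GPUs), the following is a minimal sketch of a fit check keyed on the `alpha.kubernetes.io/nvidia-gpu` resource name from the release note. The `node` type and the `fitsGPU` function are hypothetical simplifications for illustration; they are not the actual scheduler code or Kubernetes API types.

```go
package main

import "fmt"

// Resource name taken from the release note above.
const nvidiaGPU = "alpha.kubernetes.io/nvidia-gpu"

// node is a hypothetical, simplified view of a cluster node.
// It is not the real Kubernetes API type.
type node struct {
	name      string
	capacity  map[string]int64 // total resources advertised by the node
	allocated map[string]int64 // resources already claimed by scheduled pods
}

// fitsGPU reports whether a pod requesting `requested` GPUs can be placed on n
// without exceeding the node's GPU capacity.
func fitsGPU(n node, requested int64) bool {
	if requested == 0 {
		return true // pods that need no GPU fit on any node
	}
	// Reading a missing key (or a nil map) yields 0, so nodes without GPUs
	// report zero free GPUs and are rejected.
	free := n.capacity[nvidiaGPU] - n.allocated[nvidiaGPU]
	return requested <= free
}

func main() {
	nodes := []node{
		{name: "cpu-only", capacity: map[string]int64{}, allocated: map[string]int64{}},
		{name: "gpu-free", capacity: map[string]int64{nvidiaGPU: 1}, allocated: map[string]int64{}},
		{name: "gpu-busy", capacity: map[string]int64{nvidiaGPU: 1}, allocated: map[string]int64{nvidiaGPU: 1}},
	}
	for _, n := range nodes {
		fmt.Printf("%s fits a 1-GPU pod: %v\n", n.name, fitsGPU(n, 1))
	}
}
```

Treating the GPU as an integer-counted resource means the same capacity-minus-allocated comparison covers both requirements: a node without GPUs has zero capacity, and a fully used GPU node has zero free.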

Implements part of kubernetes#24071 for kubernetes#23587

I am not familiar enough with the scheduler to know what to do with the scores. Mostly punting for now.

Missing items from the implementation plan: limitranger, rkt support, kubectl support, and docs

cc @erictune @davidopp @dchen1107 @vishh @Hui-Zhi @gopinatht
@davidopp
Member

@justinsb Now that our v0 GPU support has merged, are there any AWS-specific bits needed to enable it on AWS?

@Hui-Zhi
Contributor

Hui-Zhi commented May 27, 2016

I filed a new issue, #25557, to collect the requirements and risks for NVIDIA GPU workloads, along with some of the technical changes needed in the kubelet. @davidopp Could you give some feedback on it?

@dchen1107 dchen1107 added the priority/backlog Higher priority than priority/awaiting-more-evidence. label May 27, 2016
@justinsb justinsb modified the milestones: v1.4, v1.3 Jun 3, 2016
@justinsb
Member Author

justinsb commented Jun 3, 2016

Moving to 1.4

@justinsb
Member Author

justinsb commented Jun 3, 2016

Actually @therc perhaps you can comment on the GPU support that is merged and what (if anything) we need to do additionally for AWS?

@goltermann goltermann modified the milestones: v1.5, v1.4 Sep 6, 2016
@justinsb justinsb self-assigned this Nov 15, 2016
@dims
Member

dims commented Nov 17, 2016

This needs to be triaged as release-blocking or not for 1.5.

@dims
Member

dims commented Nov 18, 2016

@justinsb all issues must be labeled either release blocker or non release blocking by end of day 18 November 2016 PST. (or please move it to 1.6) cc @kubernetes/sig-aws

@calebamiles
Contributor

@justinsb, how do you feel about closing this issue or moving it out of the 1.5 milestone, since there have been no updates to it in quite some time?

cc: @saad-ali, @dims

@cmluciano

Can this issue be closed, since multi-GPU support was merged in #42116? I have a follow-up proposal for some other features: kubernetes/community#414

@dchen1107
Member

Close this one, and please address the remaining enhancements separately. Thanks!
