- Configure your nodes to use GPUs. This can be done as follows:
  - Install nvidia-docker2.
  - Connect to your master node and set nvidia as the default runtime in /etc/docker/daemon.json:
      {
        "default-runtime": "nvidia",
        "runtimes": {
          "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
          }
        }
      }
    Restart the Docker daemon afterwards so the new default runtime takes effect.
  - After that, deploy the NVIDIA device plugin DaemonSet to Kubernetes:
      kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
- NVIDIA GPUs can now be consumed via container-level resource requirements using the resource name nvidia.com/gpu:
    resources:
      limits:
        nvidia.com/gpu: 2 # requesting 2 GPUs
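For context, the resources fragment above belongs under a container entry in a pod spec. The following is a minimal sketch of a complete Pod manifest that could serve as a smoke test. The pod name, the container name, and the image placeholder are illustrative assumptions, not part of the examples; substitute any CUDA-enabled image you have access to, and note that the command assumes nvidia-smi is available inside the container.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test              # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda                    # hypothetical container name
      image: "<your-cuda-image>"    # placeholder: replace with a CUDA-enabled image
      command: ["nvidia-smi"]       # assumes the image and runtime expose nvidia-smi
      resources:
        limits:
          nvidia.com/gpu: 1         # GPUs are requested as whole units under limits
```

GPUs are specified under limits only; Kubernetes uses the limit as the request when no request is given. If the pod stays in Pending, a likely cause is that no node advertises the nvidia.com/gpu resource yet, which you can check with kubectl describe node.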
- Building the image. Each example has prebuilt images stored in Google Container Registry (GCR). If you want to create your own image, we recommend using Docker Hub. Each example has its own Dockerfile, which we strongly advise you to use. To build your custom image, follow the instructions on TechRepublic.
- To deploy your job, we recommend following the official Kubeflow documentation. Each example includes sample YAML files for two API versions. Feel free to modify them, e.g. to change the image or the number of GPUs.
  Note: By default, a PyTorch job does not work in a user namespace because of Istio's automatic sidecar injection. To get it running, add the annotation sidecar.istio.io/inject: "false" to disable injection for either the PyTorch pods or the namespace. For example:
    template:
      metadata:
        annotations:
          sidecar.istio.io/inject: "false"
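To show where the template fragment above sits in a full job definition, here is a minimal sketch of a PyTorchJob with the annotation applied to both the master and worker pod templates. The API version kubeflow.org/v1, the job name, the namespace, and the image placeholder are assumptions made for illustration; use the API version and image from the example's own YAML files.

```yaml
apiVersion: kubeflow.org/v1         # assumption: match the API version your cluster serves
kind: PyTorchJob
metadata:
  name: pytorch-example             # hypothetical name
  namespace: my-namespace           # hypothetical user namespace
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"   # disable the Istio sidecar for the master pod
        spec:
          containers:
            - name: pytorch                    # the training container is conventionally named pytorch
              image: "<your-image>"            # placeholder: prebuilt or custom example image
              resources:
                limits:
                  nvidia.com/gpu: 1
    Worker:
      replicas: 1
      restartPolicy: OnFailure
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"   # workers need the annotation as well
        spec:
          containers:
            - name: pytorch
              image: "<your-image>"
              resources:
                limits:
                  nvidia.com/gpu: 1
```

The annotation is repeated in each replica template because it is evaluated per pod; setting it on the master only would still leave the worker pods with an injected sidecar.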