A Helm chart for gpu-provisioner
To install the chart with the release name gpu-provisioner:
export CHART_VERSION=0.3.0
export CLUSTER_NAME=my-cluster
export AZURE_RESOURCE_GROUP=my-rg
export AZURE_SUBSCRIPTION_ID=my-subscription-id
export MSI_NAME=gpuIdentity
az identity create --name $MSI_NAME --resource-group $AZURE_RESOURCE_GROUP
./hack/deploy/configure-helm-values.sh $CLUSTER_NAME $AZURE_RESOURCE_GROUP $MSI_NAME
helm install gpu-provisioner \
https://github.com/Azure/gpu-provisioner/raw/gh-pages/charts/gpu-provisioner-$CHART_VERSION.tgz \
--values gpu-provisioner-values.yaml --namespace gpu-provisioner --create-namespace --wait
make az-federated-credential
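
The commands above depend on every exported variable being set; an empty `$AZURE_RESOURCE_GROUP` or `$MSI_NAME` would only surface later as an `az` or `helm` error. A minimal pre-flight sketch follows, assuming a POSIX-compatible shell. The helper name `require_vars` is hypothetical and is not part of the chart or its scripts.

```shell
# Hypothetical helper (not shipped with the chart): report every variable
# in the argument list that is unset or empty, and return non-zero if any is.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One possible use is to call `require_vars CHART_VERSION CLUSTER_NAME AZURE_RESOURCE_GROUP AZURE_SUBSCRIPTION_ID MSI_NAME || exit 1` right after the `export` lines and before `az identity create`.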

| Key | Type | Default | Description |
|---|---|---|---|
| additionalAnnotations | object | `{}` | Additional annotations to add into metadata. |
| additionalLabels | object | `{}` | Additional labels to add into metadata. |
| affinity | object | `{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"karpenter.sh/provisioner-name","operator":"DoesNotExist"}]}]}}}` | Affinity rules for scheduling the pod. |
| controller.env | list | `[]` | Additional environment variables for the controller pod. |
| controller.errorOutputPaths | list | `["stderr"]` | Controller errorOutputPaths - default to stderr only |
| controller.extraVolumeMounts | list | `[]` | Additional volumeMounts for the controller pod. |
| controller.image.repository | string | `mcr.microsoft.com/aks/kaito/gpu-provisioner` |  |
| controller.image.tag | string | `0.2.0` |  |
| controller.logEncoding | string | `""` | Controller log encoding, defaults to the global log encoding |
| controller.logLevel | string | `""` | Controller log level, defaults to the global log level |
| controller.outputPaths | list | `["stdout"]` | Controller outputPaths - default to stdout only |
| controller.resources | object | `{"limits":{"cpu":1,"memory":"1Gi"},"requests":{"cpu":1,"memory":"1Gi"}}` | Resources for the controller pod. |
| controller.securityContext | object | `{}` | SecurityContext for the controller container. |
| controller.sidecarContainer | object | `{}` | Additional sideCarContainer config - this will also inherit volume mounts from deployment |
| dnsConfig | object | `{}` | Configure DNS Config for the pod |
| dnsPolicy | string | `"Default"` | Configure the DNS Policy for the pod |
| extraVolumes | list | `[]` | Additional volumes for the pod. |
| fullnameOverride | string | `""` | Overrides the chart's computed fullname. |
| hostNetwork | bool | `false` | Bind the pod to the host network. This is required when using a custom CNI. |
| imagePullPolicy | string | `"IfNotPresent"` | Image pull policy for Docker images. |
| imagePullSecrets | list | `[]` | Image pull secrets for Docker images. |
| logEncoding | string | `"console"` | Global log encoding |
| logLevel | string | `"debug"` | Global log level |
| nameOverride | string | `""` | Overrides the chart's name. |
| nodeSelector | object | `{"kubernetes.io/os":"linux"}` | Node selectors to schedule the pod to nodes with labels. |
| podAnnotations | object | `{}` | Additional annotations for the pod. |
| podDisruptionBudget.maxUnavailable | int | `1` |  |
| podDisruptionBudget.name | string | `"karpenter"` |  |
| podLabels | object | `{}` | Additional labels for the pod. |
| podSecurityContext | object | `{"fsGroup":1000}` | SecurityContext for the pod. |
| priorityClassName | string | `"system-cluster-critical"` | PriorityClass name for the pod. |
| replicas | int | `1` | Number of replicas. |
| revisionHistoryLimit | int | `10` | The number of old ReplicaSets to retain to allow rollback. |
| serviceAccount.annotations | object | `{}` | Additional annotations for the ServiceAccount. |
| serviceAccount.create | bool | `true` | Specifies if a ServiceAccount should be created. |
| serviceAccount.name | string | `""` | The name of the ServiceAccount to use. If not set and create is true, a name is generated using the fullname template. |
| serviceMonitor.additionalLabels | object | `{}` | Additional labels for the ServiceMonitor. |
| serviceMonitor.enabled | bool | `false` | Specifies whether a ServiceMonitor should be created. |
| serviceMonitor.endpointConfig | object | `{}` | Endpoint configuration for the ServiceMonitor. |
| settings | object | `{"azure":{"clusterName":"","tags":null}}` | Global Settings to configure Karpenter |
| settings.azure | object | `{"clusterName":"","tags":null}` | Azure-specific configuration values |
| settings.azure.clusterName | string | `""` | Cluster name. |
| settings.azure.tags | string | `nil` | The global tags to use on all Azure infrastructure resources (launch templates, instances, SQS queue, etc.) |
| strategy | object | `{"rollingUpdate":{"maxUnavailable":1}}` | Strategy for updating the pod. |
| terminationGracePeriodSeconds | string | `nil` | Override the default termination grace period for the pod. |
| tolerations | list | `[{"key":"CriticalAddonsOnly","operator":"Exists"}]` | Tolerations to allow the pod to be scheduled to nodes with taints. |
| topologySpreadConstraints | list | `[{"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway"}]` | topologySpreadConstraints to increase the controller resilience |
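
Any of the keys listed above can be overridden without editing the generated `gpu-provisioner-values.yaml`, because Helm accepts several `--values` files and lets later files take precedence. The sketch below writes a second values file and shows, as comments, how it might be applied. The file name `custom-values.yaml` and the chosen values are illustrative assumptions, not recommendations from the chart authors; in particular, `info` is assumed to be an accepted `logLevel`.

```shell
# Write a hypothetical override file. The keys come from the values list
# above; the values themselves are illustrative only.
cat > custom-values.yaml <<'EOF'
logLevel: info
controller:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 1
      memory: 1Gi
EOF

# To apply it on top of the generated values, run the following manually
# (requires helm and access to the cluster):
# helm upgrade gpu-provisioner \
#   https://github.com/Azure/gpu-provisioner/raw/gh-pages/charts/gpu-provisioner-$CHART_VERSION.tgz \
#   --values gpu-provisioner-values.yaml --values custom-values.yaml \
#   --namespace gpu-provisioner --wait
```

Keeping overrides in a separate file means rerunning `configure-helm-values.sh` does not discard them.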