Karpenter Azure provider gpu-provisioner

Version: 0.2.0 Type: application AppVersion: 0.2.0

A Helm chart for gpu-provisioner

Installing the Chart

To install the chart with the release name gpu-provisioner:

export CHART_VERSION=0.2.0
export CLUSTER_NAME=my-cluster
export AZURE_RESOURCE_GROUP=my-rg
export AZURE_SUBSCRIPTION_ID=my-subscription-id
export MSI_NAME=gpuIdentity

az identity create --name $MSI_NAME --resource-group $AZURE_RESOURCE_GROUP

./hack/deploy/configure-helm-values.sh $CLUSTER_NAME $AZURE_RESOURCE_GROUP $MSI_NAME

helm install gpu-provisioner \
      https://github.com/Azure/gpu-provisioner/raw/gh-pages/charts/gpu-provisioner-$CHART_VERSION.tgz \
      --values gpu-provisioner-values.yaml --namespace gpu-provisioner --create-namespace --wait
make az-federated-credential
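
The commands above depend on several exported variables, and an unset one only surfaces as a confusing `az` or `helm` error. The following is a minimal pre-flight sketch, not part of the chart or the repository: the helper names `require_vars` and `chart_url` are hypothetical and exist only for illustration.

```shell
# Hypothetical helpers for illustration; they are not shipped with gpu-provisioner.

# Print every named variable that is unset or empty and return non-zero if any is.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Build the chart archive URL used by `helm install` for a given chart version.
chart_url() {
  printf 'https://github.com/Azure/gpu-provisioner/raw/gh-pages/charts/gpu-provisioner-%s.tgz\n' "$1"
}
```

One way to use it is to run `require_vars CHART_VERSION CLUSTER_NAME AZURE_RESOURCE_GROUP AZURE_SUBSCRIPTION_ID MSI_NAME || exit 1` before `az identity create`, and to pass `"$(chart_url "$CHART_VERSION")"` as the chart argument to `helm install`.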

Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| additionalAnnotations | object | `{}` | Additional annotations to add into metadata. |
| additionalLabels | object | `{}` | Additional labels to add into metadata. |
| affinity | object | `{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"karpenter.sh/provisioner-name","operator":"DoesNotExist"}]}]}}}` | Affinity rules for scheduling the pod. |
| controller.env | list | `[]` | Additional environment variables for the controller pod. |
| controller.errorOutputPaths | list | `["stderr"]` | Controller errorOutputPaths; defaults to stderr only. |
| controller.extraVolumeMounts | list | `[]` | Additional volumeMounts for the controller pod. |
| controller.image.repository | string | `mcr.microsoft.com/aks/kaito/gpu-provisioner` | |
| controller.image.tag | string | `0.2.0` | |
| controller.logEncoding | string | `""` | Controller log encoding; defaults to the global log encoding. |
| controller.logLevel | string | `""` | Controller log level; defaults to the global log level. |
| controller.outputPaths | list | `["stdout"]` | Controller outputPaths; defaults to stdout only. |
| controller.resources | object | `{"limits":{"cpu":1,"memory":"1Gi"},"requests":{"cpu":1,"memory":"1Gi"}}` | Resources for the controller pod. |
| controller.securityContext | object | `{}` | SecurityContext for the controller container. |
| controller.sidecarContainer | object | `{}` | Additional sidecar container config; the sidecar also inherits volume mounts from the deployment. |
| dnsConfig | object | `{}` | DNS config for the pod. |
| dnsPolicy | string | `"Default"` | DNS policy for the pod. |
| extraVolumes | list | `[]` | Additional volumes for the pod. |
| fullnameOverride | string | `""` | Overrides the chart's computed fullname. |
| hostNetwork | bool | `false` | Bind the pod to the host network. This is required when using a custom CNI. |
| imagePullPolicy | string | `"IfNotPresent"` | Image pull policy for Docker images. |
| imagePullSecrets | list | `[]` | Image pull secrets for Docker images. |
| logEncoding | string | `"console"` | Global log encoding. |
| logLevel | string | `"debug"` | Global log level. |
| nameOverride | string | `""` | Overrides the chart's name. |
| nodeSelector | object | `{"kubernetes.io/os":"linux"}` | Node selectors to schedule the pod to nodes with labels. |
| podAnnotations | object | `{}` | Additional annotations for the pod. |
| podDisruptionBudget.maxUnavailable | int | `1` | |
| podDisruptionBudget.name | string | `"karpenter"` | |
| podLabels | object | `{}` | Additional labels for the pod. |
| podSecurityContext | object | `{"fsGroup":1000}` | SecurityContext for the pod. |
| priorityClassName | string | `"system-cluster-critical"` | PriorityClass name for the pod. |
| replicas | int | `1` | Number of replicas. |
| revisionHistoryLimit | int | `10` | The number of old ReplicaSets to retain to allow rollback. |
| serviceAccount.annotations | object | `{}` | Additional annotations for the ServiceAccount. |
| serviceAccount.create | bool | `true` | Specifies if a ServiceAccount should be created. |
| serviceAccount.name | string | `""` | The name of the ServiceAccount to use. If not set and create is true, a name is generated using the fullname template. |
| serviceMonitor.additionalLabels | object | `{}` | Additional labels for the ServiceMonitor. |
| serviceMonitor.enabled | bool | `false` | Specifies whether a ServiceMonitor should be created. |
| serviceMonitor.endpointConfig | object | `{}` | Endpoint configuration for the ServiceMonitor. |
| settings | object | `{"azure":{"clusterName":"","tags":null}}` | Global settings to configure Karpenter. |
| settings.azure | object | `{"clusterName":"","tags":null}` | Azure-specific configuration values. |
| settings.azure.clusterName | string | `""` | Cluster name. |
| settings.azure.tags | string | `nil` | The global tags to use on all Azure infrastructure resources. |
| strategy | object | `{"rollingUpdate":{"maxUnavailable":1}}` | Strategy for updating the pod. |
| terminationGracePeriodSeconds | string | `nil` | Override the default termination grace period for the pod. |
| tolerations | list | `[{"key":"CriticalAddonsOnly","operator":"Exists"}]` | Tolerations to allow the pod to be scheduled to nodes with taints. |
| topologySpreadConstraints | list | `[{"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway"}]` | topologySpreadConstraints to increase the controller resilience. |
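
As a sketch of how these values might be overridden, the snippet below writes a small override file and defines a function that layers it on top of the generated `gpu-provisioner-values.yaml`. The keys come from the list above; the file name `gpu-provisioner-overrides.yaml`, the function name `apply_overrides`, and the chosen values are assumptions for illustration only.

```shell
# Write an illustrative override file. Keys match the chart's documented values;
# the specific settings here are examples, not recommendations.
cat > gpu-provisioner-overrides.yaml <<'EOF'
logLevel: info
controller:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 1
      memory: 1Gi
EOF

# Hypothetical wrapper: upgrade the existing release with both values files.
# When several --values files are given, Helm gives precedence to the last one.
apply_overrides() {
  helm upgrade gpu-provisioner \
    "https://github.com/Azure/gpu-provisioner/raw/gh-pages/charts/gpu-provisioner-$CHART_VERSION.tgz" \
    --values gpu-provisioner-values.yaml \
    --values gpu-provisioner-overrides.yaml \
    --namespace gpu-provisioner --wait
}
```

Calling `apply_overrides` assumes the release is already installed and that `CHART_VERSION` is exported as in the installation steps. Single values can also be set inline with `--set`, for example `--set logLevel=info`.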