Enable e2e test against universal operator (kubeflow#1336)
* Fix make test failures

Since the crd folder has moved, the relative paths need to be updated accordingly.

* Fix openAPIV3Schema validation issue

See kubeflow#1324 for more details.

* Clean up kubebuilder manifest

1. Remove unused webhook and cainjection patches in crd
2. role.yaml was missing for an unknown reason; add it back manually
3. Comment out unused components

* Add e2e test against universal training operator

1. Copy original ksonnet workflow with minor changes to build/install operator
2. Add scripts to set up the universal operator using kustomization files
3. Add new test entry in prow_config.yaml

* Fix invalid_job test case

The original test case no longer works because the custom resource is now validated. The test now catches the error and checks whether the validation exception is part of its message.

* Remove useless files to reduce commit size

* Move main.go to cmd folder

* Remove workflow_v2 and change original test instead

Since v1.1-branch already exists, the two test pipelines do not need to work at the same time. v1.1-branch will keep running the legacy pipeline, and the new test for the universal training operator will run against master.

* Add Optional tag in RunPolicy

This ensures the generated CRD no longer marks RunPolicy as required.
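A minimal sketch of what such a tag looks like on a Go API type. The type names, the pointer field, and the `backoffLimit` field are hypothetical stand-ins, not the repository's definitions. The `+optional` comment is a marker read by the CRD generator; the `omitempty` JSON tag is shown alongside it because the two are commonly paired.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RunPolicy is a hypothetical stand-in for the real run policy type.
type RunPolicy struct {
	BackoffLimit *int32 `json:"backoffLimit,omitempty"`
}

// JobSpec is a hypothetical job spec used only for illustration.
type JobSpec struct {
	// RunPolicy holds the policies that govern the job's lifecycle.
	// The marker below tells the CRD generator not to list the field as required.
	// +optional
	RunPolicy *RunPolicy `json:"runPolicy,omitempty"`
}

// encode serializes a spec to JSON, panicking on the (unexpected) error path.
func encode(spec JobSpec) string {
	out, err := json.Marshal(spec)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// A spec without a run policy serializes without the field at all.
	fmt.Println(encode(JobSpec{})) // prints {}

	limit := int32(3)
	fmt.Println(encode(JobSpec{RunPolicy: &RunPolicy{BackoffLimit: &limit}}))
}
```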

* Fix test failures

1. Change PULL_BASE_SHA to PULL_PULL_SHA
2. Explicitly set defaults for jobs. For some reason, scheme registration does not work as expected
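Setting defaults explicitly, as in item 2, could look roughly like the sketch below: the caller invokes a defaulting function directly instead of relying on defaulting functions registered with the scheme. The `ReplicaSpec` type, the `setDefaults` name, and the default values (one replica, restart policy `Never`) are assumptions for illustration and are not taken from the commit.

```go
package main

import "fmt"

// ReplicaSpec is a hypothetical, simplified replica specification.
type ReplicaSpec struct {
	Replicas      *int32
	RestartPolicy string
}

// setDefaults fills unset fields in place and leaves set fields untouched.
// The default values used here are assumptions.
func setDefaults(spec *ReplicaSpec) {
	if spec.Replicas == nil {
		one := int32(1)
		spec.Replicas = &one
	}
	if spec.RestartPolicy == "" {
		spec.RestartPolicy = "Never"
	}
}

func main() {
	spec := &ReplicaSpec{}
	// Called explicitly before the job is processed.
	setDefaults(spec)
	fmt.Println(*spec.Replicas, spec.RestartPolicy)
}
```

Because the function only fills fields that are unset, calling it more than once is harmless.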
Jeffwan authored Aug 6, 2021
1 parent 00da827 commit 3320f7f
Showing 42 changed files with 229 additions and 295 deletions.
8 changes: 4 additions & 4 deletions Makefile
@@ -1,6 +1,6 @@

# Image URL to use all building/pushing image targets
IMG ?= controller:latest
IMG ?= kubeflow/training-operator:latest
# Produce CRDs that work back to Kubernetes 1.11 (no version conversion)
CRD_OPTIONS ?= "crd:trivialVersions=true,preserveUnknownFields=false"

@@ -58,13 +58,13 @@ test: manifests generate fmt vet ## Run tests.
##@ Build

build: generate fmt vet ## Build manager binary.
go build -o bin/manager main.go
go build -o bin/manager cmd/training-operator.v1/main.go

run: manifests generate fmt vet ## Run a controller from your host.
go run ./main.go
go run ./cmd/training-operator.v1/main.go

docker-build: test ## Build docker image with the manager.
docker build -t ${IMG} .
docker build -t ${IMG} -f build/images/training-operator/Dockerfile .

docker-push: ## Push docker image with the manager.
docker push ${IMG}
23 changes: 23 additions & 0 deletions build/images/training-operator/Dockerfile
@@ -0,0 +1,23 @@
# Build the manager binary
FROM golang:1.14.9 as builder

WORKDIR /workspace
# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
# cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
RUN go mod download

# Copy the go source
COPY . .

# Build
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 GO111MODULE=on go build -a -o manager cmd/training-operator.v1/main.go

# Use distroless as minimal base image to package the manager binary
# Refer to https://github.com/GoogleContainerTools/distroless for more details
FROM gcr.io/distroless/static:latest
WORKDIR /
COPY --from=builder /workspace/manager .
ENTRYPOINT ["/manager"]
File renamed without changes.
1 change: 0 additions & 1 deletion config/crd/bases/kubeflow.org_pytorchjobs.yaml
@@ -32,7 +32,6 @@ spec:
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
description: Standard Kubernetes object's metadata.
type: object
spec:
description: Specification of the desired state of the PyTorchJob.
1 change: 0 additions & 1 deletion config/crd/bases/kubeflow.org_tfjobs.yaml
@@ -32,7 +32,6 @@ spec:
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
description: Standard Kubernetes object's metadata.
type: object
spec:
description: Specification of the desired state of the TFJob.
17 changes: 0 additions & 17 deletions config/crd/kustomization.yaml
@@ -8,23 +8,6 @@ resources:
- bases/kubeflow.org_mxjobs.yaml
#+kubebuilder:scaffold:crdkustomizeresource

patchesStrategicMerge:
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix.
# patches here are for enabling the conversion webhook for each CRD
#- patches/webhook_in_xgboostjobs.yaml
#- patches/webhook_in_pytorchjobs.yaml
#- patches/webhook_in_tfjobs.yaml
#- patches/webhook_in_mxjobs.yaml
#+kubebuilder:scaffold:crdkustomizewebhookpatch

# [CERTMANAGER] To enable webhook, uncomment all the sections with [CERTMANAGER] prefix.
# patches here are for enabling the CA injection for each CRD
#- patches/cainjection_in_xgboostjobs.yaml
#- patches/cainjection_in_pytorchjobs.yaml
#- patches/cainjection_in_tfjobs.yaml
#- patches/cainjection_in_mxjobs.yaml
#+kubebuilder:scaffold:crdkustomizecainjectionpatch

# the following config is for teaching kustomize how to do kustomization for CRDs.
configurations:
- kustomizeconfig.yaml
24 changes: 12 additions & 12 deletions config/crd/kustomizeconfig.yaml
@@ -1,19 +1,19 @@
# This file is for teaching kustomize how to substitute name and namespace reference in CRD
nameReference:
- kind: Service
version: v1
fieldSpecs:
- kind: CustomResourceDefinition
- kind: Service
version: v1
group: apiextensions.k8s.io
path: spec/conversion/webhook/clientConfig/service/name
fieldSpecs:
- kind: CustomResourceDefinition
version: v1
group: apiextensions.k8s.io
path: spec/conversion/webhook/clientConfig/service/name

namespace:
- kind: CustomResourceDefinition
version: v1
group: apiextensions.k8s.io
path: spec/conversion/webhook/clientConfig/service/namespace
create: false
- kind: CustomResourceDefinition
version: v1
group: apiextensions.k8s.io
path: spec/conversion/webhook/clientConfig/service/namespace
create: false

varReference:
- path: metadata/annotations
- path: metadata/annotations
7 changes: 0 additions & 7 deletions config/crd/patches/cainjection_in_mxjobs.yaml

This file was deleted.

7 changes: 0 additions & 7 deletions config/crd/patches/cainjection_in_pytorchjobs.yaml

This file was deleted.

7 changes: 0 additions & 7 deletions config/crd/patches/cainjection_in_tfjobs.yaml

This file was deleted.

7 changes: 0 additions & 7 deletions config/crd/patches/cainjection_in_xgboostjobs.yaml

This file was deleted.

14 changes: 0 additions & 14 deletions config/crd/patches/webhook_in_mxjobs.yaml

This file was deleted.

16 changes: 0 additions & 16 deletions config/crd/patches/webhook_in_pytorchjobs.yaml

This file was deleted.

14 changes: 0 additions & 14 deletions config/crd/patches/webhook_in_tfjobs.yaml

This file was deleted.

14 changes: 0 additions & 14 deletions config/crd/patches/webhook_in_xgboostjobs.yaml

This file was deleted.

67 changes: 2 additions & 65 deletions config/default/kustomization.yaml
@@ -1,74 +1,11 @@
# Adds namespace to all resources.
namespace: tf-operator-system

# Value of this field is prepended to the
# names of all resources, e.g. a deployment named
# "wordpress" becomes "alices-wordpress".
# Note that it should also match with the prefix (text before '-') of the namespace
# field above.
namePrefix: tf-operator-
namespace: kubeflow

# Labels to add to all resources and selectors.
#commonLabels:
# someName: someValue

bases:
resources:
- ../crd
- ../rbac
- ../manager
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
#- ../webhook
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required.
#- ../certmanager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus

patchesStrategicMerge:
# Protect the /metrics endpoint by putting it behind auth.
# If you want your controller-manager to expose the /metrics
# endpoint w/o any authn/z, please comment the following line.
- manager_auth_proxy_patch.yaml

# Mount the controller config file for loading manager configurations
# through a ComponentConfig type
#- manager_config_patch.yaml

# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
#- manager_webhook_patch.yaml

# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'.
# Uncomment 'CERTMANAGER' sections in crd/kustomization.yaml to enable the CA injection in the admission webhooks.
# 'CERTMANAGER' needs to be enabled to use ca injection
#- webhookcainjection_patch.yaml

# the following config is for teaching kustomize how to do var substitution
vars:
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix.
#- name: CERTIFICATE_NAMESPACE # namespace of the certificate CR
# objref:
# kind: Certificate
# group: cert-manager.io
# version: v1
# name: serving-cert # this name should match the one in certificate.yaml
# fieldref:
# fieldpath: metadata.namespace
#- name: CERTIFICATE_NAME
# objref:
# kind: Certificate
# group: cert-manager.io
# version: v1
# name: serving-cert # this name should match the one in certificate.yaml
#- name: SERVICE_NAMESPACE # namespace of the service
# objref:
# kind: Service
# version: v1
# name: webhook-service
# fieldref:
# fieldpath: metadata.namespace
#- name: SERVICE_NAME
# objref:
# kind: Service
# version: v1
# name: webhook-service
26 changes: 0 additions & 26 deletions config/default/manager_auth_proxy_patch.yaml

This file was deleted.

20 changes: 13 additions & 7 deletions config/manager/kustomization.yaml
@@ -1,10 +1,16 @@
resources:
- manager.yaml

generatorOptions:
disableNameSuffixHash: true

configMapGenerator:
- name: manager-config
files:
- controller_manager_config.yaml
#generatorOptions:
# disableNameSuffixHash: true
#
#configMapGenerator:
#- files:
# - controller_manager_config.yaml
# name: manager-config
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
- name: kubeflow/training-operator
newName: kubeflow/training-operator
newTag: latest
25 changes: 13 additions & 12 deletions config/manager/manager.yaml
@@ -2,34 +2,35 @@ apiVersion: v1
kind: Namespace
metadata:
labels:
control-plane: controller-manager
name: system
control-plane: kubeflow-training-operator
name: kubeflow
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: controller-manager
name: training-operator
namespace: system
labels:
control-plane: controller-manager
control-plane: kubeflow-training-operator
spec:
selector:
matchLabels:
control-plane: controller-manager
control-plane: kubeflow-training-operator
replicas: 1
template:
metadata:
labels:
control-plane: controller-manager
control-plane: kubeflow-training-operator
spec:
securityContext:
runAsNonRoot: true
# securityContext:
# runAsNonRoot: true
containers:
- command:
- /manager
args:
- --leader-elect
image: controller:latest
# disable leader-elect now
# args:
# - --leader-elect
image: kubeflow/training-operator:v1.0.0
name: manager
securityContext:
allowPrivilegeEscalation: false
@@ -52,5 +53,5 @@ spec:
requests:
cpu: 100m
memory: 20Mi
serviceAccountName: controller-manager
serviceAccountName: training-operator-service-account
terminationGracePeriodSeconds: 10