/cc @jlewi
Reviewable status: 0 of 18 files reviewed, 4 unresolved discussions (waiting on @everpeace, @disktnk, @gaocegege, and @jlewi)
scripts/create-cluster.sh, line 17 at r1 (raw file):
```shell
# limitations under the License.

# This shell script is used to build a cluster and create a namespace from our
```
Rather than creating a custom script can we reuse the existing scripts to deploy Kubeflow?
I think the preferred solution would be to wait for kubeflow/kubeflow#1308 to be submitted. That will create the script kfctl.sh. You can use that to deploy Kubeflow, creating a GKE cluster. You can then install chainer and deploy it.
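A rough sketch of what a kfctl.sh-based deployment could look like. The subcommands and flags below are assumptions modeled on the early kfctl.sh interface, not the confirmed script; the dry-run wrapper echoes each command instead of executing it so the sketch runs without GCP access.

```shell
# Hypothetical sketch of deploying via kfctl.sh instead of a custom script.
# The subcommands and flags are assumptions, not the confirmed interface.
DRY_RUN=${DRY_RUN:-true}

run() {
  # In dry-run mode, print the command instead of executing it.
  if [ "${DRY_RUN}" = "true" ]; then echo "+ $*"; else "$@"; fi
}

run kfctl.sh init chainer-test --platform gcp --project "${PROJECT:-my-project}"
run kfctl.sh generate platform
run kfctl.sh apply platform
run kfctl.sh generate k8s
run kfctl.sh apply k8s
```

Setting DRY_RUN=false would execute the commands for real, assuming kfctl.sh is on the PATH.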
scripts/create-cluster.sh, line 35 at r1 (raw file):
```shell
gcloud --project ${PROJECT} beta container clusters create ${CLUSTER_NAME} \
  --zone ${ZONE} \
  --accelerator type=nvidia-tesla-k80,count=2 \
```
Are you actually testing GPUs? And do you really need two of them?
scripts/run-test.sh, line 36 at r1 (raw file):
```shell
echo "Activating service-account"
gcloud auth activate-service-account --key-file=${GOOGLE_APPLICATION_CREDENTIALS}
```
We should use kfctl.sh because this is effectively duplicating the logic in kfctl.sh
scripts/run-test.sh, line 67 at r1 (raw file):
```shell
ks generate chainer-job-simple ${MNIST_TEST} \
  --image=${REGISTRY}/${MNIST_TEST}:${VERSION} \
  --gpus=0 \
```
Looks like you aren't using GPUs, so let's not create GPU clusters.
I opened kubeflow/kubeflow#1325 to make it easier to share code across E2E tests.
Reviewable status: 0 of 18 files reviewed, 4 unresolved discussions (waiting on @everpeace, @disktnk, and @gaocegege)
scripts/create-cluster.sh, line 17 at r1 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Rather than creating a custom script can we reuse the existing scripts to deploy Kubeflow?
I think the preferred solution would be to wait for kubeflow/kubeflow#1308 to be submitted. That will create the script kfctl.sh. You can use that to deploy Kubeflow, creating a GKE cluster. You can then install chainer and deploy it.
Thanks. I changed this to use kfctl.sh.
scripts/create-cluster.sh, line 35 at r1 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Are you actually testing GPUs? And do you really need two of them?
No, I will fix it.
Thanks. Please let me know when this is ready for another look.
@jlewi I changed the script to use kfctl.sh.
Reviewed 4 of 5 files at r2.
Reviewable status: 4 of 19 files reviewed, 5 unresolved discussions (waiting on @jlewi, @everpeace, @disktnk, and @gaocegege)
scripts/create-cluster.sh, line 35 at r2 (raw file):
kfctl
install?
scripts/kfctl-util.sh, line 24 at r2 (raw file):
```shell
mkdir -p ${kfctl_dir}
cd ${kfctl_dir}
curl https://raw.githubusercontent.com/kubeflow/kubeflow/${kubeflow_version}/scripts/download.sh | sh
```
You shouldn't need to do this.
If you look at our other workflows the first step is checkout which checks out the code
https://github.com/kubeflow/kubeflow/blob/master/testing/workflows/components/workflows.libsonnet#L301
As part of that we can specify other repositories to check out and what version (e.g. HEAD). The code will get checked out to a shared NFS directory and you can then refer to it.
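In other words, once the checkout step has run, a test script can reference kfctl.sh from the shared checkout directory rather than downloading it. A minimal sketch; the SRC_DIR path here is a hypothetical example of the shared-volume layout, not the actual directory structure.

```shell
# Sketch: use the kubeflow/kubeflow checkout on the shared test volume instead
# of curl-ing download.sh. SRC_DIR is an assumed path, not the actual layout.
SRC_DIR=${SRC_DIR:-/mnt/test-data-volume/src/kubeflow/kubeflow}
KFCTL="${SRC_DIR}/scripts/kfctl.sh"

echo "using kfctl from checkout: ${KFCTL}"
```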
It doesn't look like prow is triggering any tests. If you set up prow to run the presubmits, then you will be able to verify as part of this PR that the tests are working correctly before you check them in.
Reviewable status: 1 of 19 files reviewed, 5 unresolved discussions (waiting on @jlewi, @everpeace, @disktnk, and @gaocegege)
scripts/create-cluster.sh, line 17 at r1 (raw file):
Previously, everpeace (Shingo Omura) wrote…
Thanks. I changed this to use kfctl.sh.
Done.
scripts/create-cluster.sh, line 35 at r2 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
kfctl
install?
thanks for the catch 🙇♂️
scripts/kfctl-util.sh, line 24 at r2 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
You shouldn't need to do this.
If you look at our other workflows the first step is checkout which checks out the code
https://github.com/kubeflow/kubeflow/blob/master/testing/workflows/components/workflows.libsonnet#L301
As part of that we can specify other repositories to check out and what version (e.g. HEAD). The code will get checked out to a shared NFS directory and you can then refer to it.
Nice 👍 I'll switch to using checkout.sh.
scripts/run-test.sh, line 36 at r1 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
We should use kfctl.sh because this is effectively duplicating the logic in kfctl.sh
Done.
scripts/run-test.sh, line 67 at r1 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Looks like you aren't using GPUs so lets not create GPU clusters.
Done.
Reviewable status: 1 of 19 files reviewed, 5 unresolved discussions (waiting on @jlewi, @disktnk, and @gaocegege)
scripts/create-cluster.sh, line 35 at r2 (raw file):
Previously, everpeace (Shingo Omura) wrote…
thanks for the catch 🙇♂️
Done.
scripts/kfctl-util.sh, line 24 at r2 (raw file):
Previously, everpeace (Shingo Omura) wrote…
Nice 👍 I'll switch to using checkout.sh.
Done.
@jlewi I changed it to use checkout.sh instead of downloading.
I approved the other PR. As soon as the test is running and passing, I think we can submit this PR.
@jlewi Hi, I'm now struggling with a permission error. Would you mind giving me some advice?
You're getting a permission error because you are relying on the VM service account, which doesn't have sufficient scopes. You need to do two things:
- Your argo workflow needs to define a volume corresponding to the secret in our kubeflow-ci cluster containing credentials for a GCP service account; see here.
- Your argo step template needs to add this secret as a volume mount, like this.
Then you need to set the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the secret, like so.
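Putting the last step together as a sketch: the mount path /secret/gcp-credentials/key.json is an assumption and depends on how the volume mount is declared in the workflow; the gcloud call is left commented so the sketch runs without GCP access.

```shell
# Sketch: point GOOGLE_APPLICATION_CREDENTIALS at the mounted secret, then
# activate the service account. The mount path is an assumed example.
export GOOGLE_APPLICATION_CREDENTIALS=${GOOGLE_APPLICATION_CREDENTIALS:-/secret/gcp-credentials/key.json}
echo "credentials file: ${GOOGLE_APPLICATION_CREDENTIALS}"

# Requires gcloud and the mounted secret, so left commented in this sketch:
# gcloud auth activate-service-account --key-file="${GOOGLE_APPLICATION_CREDENTIALS}"
```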
Force-pushed from 9a03d7b to 807524b.
@jlewi Thank you very much for helping me! I'm now trying to get the e2e test working. However, I found that I had run my e2e test many times without realizing it, and I finally hit a quota error in my e2e test. Sorry 🙇
So, would you mind deleting the old existing clusters? 🙇 Thank you in advance. (I will squash my small commits once the e2e test passes.)
@jlewi Thanks again for cleaning up the garbage 🙇 Finally, this PR is ready for review! But there is one concern with the clean-up job: it seems that deleting service accounts fails due to a permission issue. Could you give advice on how to delete the job's service accounts correctly?
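For reference, a dry-run sketch of the service-account cleanup in question. The account email pattern and project name are assumptions, and the command is printed rather than executed so the sketch has no GCP dependency.

```shell
# Hypothetical cleanup step; the SA_EMAIL naming pattern and PROJECT default
# are assumptions, not the actual values used by the test infrastructure.
PROJECT=${PROJECT:-kubeflow-ci}
SA_EMAIL="chainer-e2e-test@${PROJECT}.iam.gserviceaccount.com"

# Print the command instead of running it; actually deleting service accounts
# requires delete permission on the project, which the test's credentials may lack.
echo "gcloud iam service-accounts delete ${SA_EMAIL} --project=${PROJECT} --quiet"
```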
Force-pushed from 261e390 to 1a7115a.
@jlewi I fixed my branch per your comments. Please take a look again. Thanks in advance 🙇
Reviewable status: 0 of 20 files reviewed, 7 unresolved discussions (waiting on @jlewi, @disktnk, and @gaocegege)
prow_config.yaml, line 6 at r3 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
chainer--e2e
This will be used as the prefix of the argo workflow name, so it's better to pick something more descriptive.
Done. However, 'chainer-e2e' exceeds the Kubernetes label key length limit, so I changed it to just 'chainer'.
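The limit in question: Kubernetes caps label name segments and label values at 63 characters, and the workflow-name prefix plus the generated suffix must fit within it. A quick shell check; the 40-character suffix budget here is an assumed example, not the actual suffix length prow generates.

```shell
# Check that a workflow name prefix leaves room for a generated suffix within
# Kubernetes' 63-character label limit. SUFFIX_BUDGET is an assumed example.
LIMIT=63
SUFFIX_BUDGET=40

check_prefix() {
  prefix="$1"
  if [ $(( ${#prefix} + SUFFIX_BUDGET )) -le ${LIMIT} ]; then
    echo "${prefix}: ok"
  else
    echo "${prefix}: too long"
  fi
}

check_prefix "chainer"
check_prefix "chainer-operator-e2e-test-workflow"
```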
test/workflows/components/workflows.jsonnet, line 7 at r3 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Delete the todo.
Done.
test/workflows/components/workflows.jsonnet, line 13 at r3 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
nit: I would suggest moving the logic inside workflows.libsonnet into workflows.jsonnet and renaming it to something more descriptive e.g.
e2e_test.jsonnet.
Done.
/lgtm
@jlewi Thanks! I'm going to approve it myself this time. I'll add you to the OWNERS file later 👍
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: everpeace, jlewi
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Reviewable status: 0 of 20 files reviewed, all discussions resolved (waiting on @jlewi, @disktnk, and @gaocegege)
Reviewed 11 of 18 files at r1, 6 of 7 files at r3, 3 of 4 files at r4.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @disktnk and @gaocegege)
fixes #3
@jlewi @gaocegege Would you mind reviewing this PR? I added an e2e test with prow, referring to the pytorch repo.