Create a python script to deploy Kubeflow on GCP via deployment manager. #866
Conversation
/retest
testing/deploy_kubeflow_gcp.py (outdated)

    test_suite.run()

    if __name__ == "__main__":
      logging.basicConfig(
test_helper.init() takes care of logging initialization
https://github.com/kubeflow/testing/blob/master/py/kubeflow/testing/test_helper.py#L159
@@ -0,0 +1,4 @@
INFO|2018-05-23T18:14:50|deploy_kubeflow_gcp.py:118| Creating deployment jlewi-kubeflow-test3 in project cloud-ml-dev
Checked in by accident?
Most recent failure was a flake in gke_e2e trying to contact github.com
Test flake looks unrelated
So I think this is ready for review.
/assign @kunmingg
/retest
More random test failures
/retest
Create a python script to deploy Kubeflow on GCP via deployment manager.

* The script replaces our bash commands.
* For teardown we add retries to better handle INTERNAL_ERRORS with deployment manager that are causing the test to be flaky.

Related to kubeflow#836 verify Kubeflow deployed correctly with deployment manager.

* Fix resource_not_found errors in delete (kubeflow#833)
* The not found error was due to the type providers for K8s resources being deleted before the corresponding K8s resources, so the subsequent delete of the K8s resource would fail because the type provider no longer existed.
* We fix this by using a $ref to refer to the type provider in the type field of K8s resources.
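The teardown retries mentioned in the commit message could look roughly like the sketch below. delete_with_retries and the INTERNAL_ERROR substring match are illustrative assumptions, not the PR's actual implementation.

```python
import time

def delete_with_retries(delete_fn, retries=3, backoff_seconds=5):
  """Call delete_fn, retrying when deployment manager reports an internal error.

  delete_fn is a hypothetical stand-in for the actual deployment-manager
  delete call; any exception whose message contains INTERNAL_ERROR is
  treated as transient and retried, everything else is re-raised.
  """
  for attempt in range(1, retries + 1):
    try:
      return delete_fn()
    except Exception as e:  # pylint: disable=broad-except
      if "INTERNAL_ERROR" not in str(e) or attempt == retries:
        raise
      # Back off linearly before retrying the flaky delete.
      time.sleep(backoff_seconds * attempt)
```

A non-retryable failure (for example a permission error) still surfaces immediately, so retries only mask the flaky INTERNAL_ERROR case.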
/lgtm
/retest
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jlewi. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/retest
Timed out waiting for TfJob; everything else passed. The workflow logs indicate the TFJob started and entered the running state pretty quickly, but then it appears to have gotten stuck. I looked at the Kubernetes events related to this namespace, and it looks like most of the pods start, but the master is stuck waiting for one of them. The pod logs for that pod, however, indicated it started just fine.
/retest
/retest
/retest
/retest
/test all
/lgtm
Create a python script to deploy Kubeflow on GCP via deployment manager. (kubeflow#866)

* Create python scripts for deploying Kubeflow on GCP via deployment manager.
* The script replaces our bash commands.
* For teardown we add retries to better handle INTERNAL_ERRORS with deployment manager that are causing the test to be flaky.

Related to kubeflow#836 verify Kubeflow deployed correctly with deployment manager.

* Fix resource_not_found errors in delete (kubeflow#833)
* The not found error was due to the type providers for K8s resources being deleted before the corresponding K8s resources, so the subsequent delete of the K8s resource would fail because the type provider no longer existed.
* We fix this by using a $ref to refer to the type provider in the type field of K8s resources.
* deletePolicy can't be set per resource.
* Autoformat jsonnet.
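The $ref fix described in the commit message can be sketched as a Deployment Manager config fragment. The resource names, collection path, and properties below are illustrative assumptions, not the PR's actual templates.

```yaml
resources:
# A type provider that teaches Deployment Manager how to talk to the
# cluster's Kubernetes API (credentials and endpoint details elided).
- name: kubeflow-type-provider
  type: deploymentmanager.v2beta.typeProvider
  properties:
    descriptorUrl: https://CLUSTER_ENDPOINT/openapi/v2

# A K8s resource whose type field uses $(ref...) to point at the type
# provider. The explicit reference gives Deployment Manager a dependency
# edge, so on teardown it deletes this resource before the type provider,
# avoiding the resource_not_found errors described above.
- name: example-configmap
  type: MY_PROJECT/$(ref.kubeflow-type-provider.name):/api/v1/namespaces/{namespace}/configmaps
  properties:
    apiVersion: v1
    kind: ConfigMap
    namespace: kubeflow
    metadata:
      name: example-configmap
```

Without the $ref, nothing orders the deletes, which is why the type provider could disappear before the K8s resources that still needed it.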