-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start deploying the bootstrapper via deployment manager. #823
Conversation
Test failed because of an IAM permission issue
The cloudservices account in the test project needs to have IAM set policy permissions. I added |
# 2. Create two separate deployments and launch the boot strapper | ||
# after the cluster is created. | ||
# | ||
# Two separate deployments doesn't make much sense; we could just use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having initialNodeCount
and cpu-pool-initialNodeCount
can be confusing. Can initialNodeCount
property be removed from here and set to 1 by default in cluster.jinja?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
docs/gke/configs/cluster.jinja
Outdated
{% set K8S_ENDPOINTS = {'': 'api/v1', '-v1beta1-extensions': 'apis/extensions/v1beta1'} %} | ||
{% set RBAC_TYPE_NAME = TYPE_NAME + '-rbac-v1' %} | ||
|
||
{% set K8S_ENDPOINTS = {'': 'api/v1', '-v1beta1-extensions': 'apis/extensions/v1beta1', '-rbac-v1': 'apis/rbac.authorization.k8s.io/v1'} %} | ||
{% set CPU_POOL = 'cpu-pool-' + properties['pool-version'] %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use {{ CPU_POOL }}
on line 68 instead of duplicating this
docs/gke/configs/cluster.jinja
Outdated
# Wait for the type provider to be created. | ||
- {{ TYPE_NAME }} | ||
|
||
{# StatefulSet needs a service. #} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct.
I was able to create a statefulset without a service
kubectl apply -f- <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: website
spec:
serviceName: "nginx"
replicas: 1
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: k8s.gcr.io/nginx-slim:0.8
ports:
- containerPort: 80
name: web
EOF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I thought. Thanks for trying it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
/test all |
/lgtm /hold Feel free to run |
Test failed with an internal error trying to tear it down.
|
/test all |
@kunmingg @ankushagarwal Can you PTAL? I had to fix a bug causing the tests to fail. So I also updated it to actually run the bootstrapper. |
/lgtm |
/hold cancel |
/test all |
/lgtm |
/hold |
* This config creates the K8s resources needed to run the bootstrapper * Enable the ResourceManager API; this is used to get IAM policies * Add IAM roles to the cloudservices account. This is needed so that the deployment manager has sufficient RBAC permissions to do what it needs to. * Delete initialNodeCount and just make the default node pool a 1 CPU node pool. * The bootstrapper isn't running successfully; it looks like its trying to create a pytorch component but its using an older version of the registry which doesn't include the pytorch operator. * Set delete policy on K8s resources to ABANDON otherwise we get internal errors. * We can use actions to enable APIs and then we won't try to delete the API when the deployment is deleted which causes errors. fix kubeflow#833
/test all |
Another internal error deleting the deployment
|
/test all |
@ankushagarwal Could you take another look I had to update the tests again and I switched to using actions to enable the APIs so we wouldn't need a separate deployment. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ankushagarwal The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
/test all |
* This config creates the K8s resources needed to run the bootstrapper * Enable the ResourceManager API; this is used to get IAM policies * Add IAM roles to the cloudservices account. This is needed so that the deployment manager has sufficient RBAC permissions to do what it needs to. * Delete initialNodeCount and just make the default node pool a 1 CPU node pool. * The bootstrapper isn't running successfully; it looks like its trying to create a pytorch component but its using an older version of the registry which doesn't include the pytorch operator. * Set delete policy on K8s resources to ABANDON otherwise we get internal errors. * We can use actions to enable APIs and then we won't try to delete the API when the deployment is deleted which causes errors. fix kubeflow#833
…ubeflow#823) * image gcr.io/kubeflow-images-public/jupyter-web-app:vmaster-gc80316d3 * Image built from kubeflow/kubeflow@c80316d3
This config creates the K8s resources needed to run the bootstrapper
Enable the ResourceManager API; this is used to get IAM policies
Add IAM roles to the cloudservices account. This is needed so that
the deployment manager has sufficient RBAC permissions to do what it needs
to.
Delete initialNodeCount and just make the default node pool a 1 CPU node pool.
The bootstrapper isn't running successfully; it looks like its trying
to create a pytorch component but its using an older version of the registry
which doesn't include the pytorch operator.
Related to: #802 Use deployment manager to run bootstrapper
Related to #802 On GCP trigger bootstrapper with deployment manager
Related to #757 Install Kubeflow via deployment manager
Related to #833 Internal errors
/assign @kunmingg
This change is