GCE: support Cloud TPU API in cloud provider #58029
Given that the new k8s APIs do not implement cloud provider but do end up calling objects with Contexts, can they take a Context object as their first parameter? Also, can you file an issue on the tpu package for not adhering to Go best practices on Context? ("Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named ctx:" from https://golang.org/pkg/context/)
defer mc.Observe(err)

var op *tpuapi.Operation
name = fmt.Sprintf("projects/%s/locations/%s/nodes/%s", gce.projectID, zone, name)
Can we abstract fmt.Sprintf("projects/%s/locations/%s/nodes/%s" and fmt.Sprintf("projects/%s/locations/%s" into helper methods? That way we don't have to worry about things like typos.
Done.
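A sketch of what such helpers might look like follows. The names `tpuParentName` and `tpuNodeName` are assumptions for illustration; the thread does not show the helpers that were actually added.

```go
package main

import "fmt"

// tpuParentName returns the parent resource path for a project and zone.
// Hypothetical helper name, not taken from the PR.
func tpuParentName(projectID, zone string) string {
	return fmt.Sprintf("projects/%s/locations/%s", projectID, zone)
}

// tpuNodeName returns the full resource path of a TPU node. It builds on
// tpuParentName so the path format is defined in exactly one place.
// Hypothetical helper name, not taken from the PR.
func tpuNodeName(projectID, zone, name string) string {
	return fmt.Sprintf("%s/nodes/%s", tpuParentName(projectID, zone), name)
}

func main() {
	fmt.Println(tpuParentName("my-project", "us-central1-b"))
	fmt.Println(tpuNodeName("my-project", "us-central1-b", "tpu-1"))
}
```

Deriving the node path from the parent path means a typo in the format string can only occur in one place.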
// occurs.
func (gce *GCECloud) waitForTPUOp(interval, timeout time.Duration, op *tpuapi.Operation) (*tpuapi.Operation, error) {
	if err := wait.PollImmediate(interval, timeout, func() (bool, error) {
		glog.V(2).Infof("Waiting for operation %q to complete...", op.Name)
This seems like it might be a bit spammy for log level 2
Done.
			return true, err
		}
		if op.Done {
			glog.V(2).Infof("Operation %q has completed", op.Name)
Again, this seems a bit spammy for log level 2.
Done.
The APIs (including the TPU APIs) in https://github.com/google/google-api-go-client are generated in a legacy way, so they have the Context issue you described. The new-style APIs are in https://github.com/GoogleCloudPlatform/google-cloud-go, but they do not yet include the TPU APIs. I think that when the API owners migrate them to the new style, the Context issue should be fixed automatically.
/test pull-kubernetes-e2e-kops-aws
@vishh, please take a look at this PR.
// newTPUMetricContext returns a new metricContext used for recording metrics
// of Cloud TPU API calls.
func newTPUMetricContext(request, zone string) *metricContext {
	return newGenericMetricContext("tpus", request, unusedMetricLabel, zone, computeAlphaVersion)
What's computeAlphaVersion? Are we using alpha APIs?
It's a string constant, "alpha". This is just a label passed into the metrics. Since we are using the alpha TPU API, I set this to "alpha" explicitly.
Godeps/Godeps.json
Outdated
@@ -3043,6 +3043,10 @@
			"ImportPath": "google.golang.org/api/pubsub/v1",
			"Rev": "c0dae069ee96c9261a04c81efd9e0f1e55f565ac"
		},
		{
			"ImportPath": "google.golang.org/api/tpu/v1alpha1",
Let's switch to v1beta1
We don't have generated clients for v1beta1 at this point. I asked the TPU team to generate and publish them on https://github.com/google/google-api-go-client.
Switch to |
/retest
/lgtm
/assign @thockin
/retest
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: thockin, vishh, yguo0905 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.
What this PR does / why we need it:
This PR adds support for the Cloud TPU API in the GCE cloud provider.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #
Special notes for your reviewer:
Release note:
/assign @vishh
/assign @cheftako