GCE: support Cloud TPU API in cloud provider #58029

Merged
merged 2 commits into from
Feb 27, 2018

Conversation

yguo0905
Contributor

What this PR does / why we need it:

This PR adds support for the Cloud TPU API to the GCE cloud provider.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

GCE: support Cloud TPU API in cloud provider

/assign @vishh
/assign @cheftako

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 10, 2018
Member

@cheftako cheftako left a comment

Given that the new k8s APIs do not implement the cloud provider but do end up calling objects with Contexts, can they take a Context as their first parameter? Also, can you file an issue against the tpu package for not adhering to Go's best practices for Context? ("Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named ctx:" from https://golang.org/pkg/context/)
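
For illustration, the convention quoted above can be sketched as follows. This is a minimal example that uses only the standard library; `nodeGetter`, `getNode`, `fakeGet` and `demoContext` are hypothetical names invented here and are not part of the TPU client or of this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// nodeGetter is a hypothetical stand-in for a generated API call.
// It is NOT the real TPU client.
type nodeGetter func(ctx context.Context, name string) (string, error)

// getNode takes the Context as its first parameter instead of reading one
// stored on a struct, so every caller controls cancellation and deadlines.
func getNode(ctx context.Context, get nodeGetter, name string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	return get(ctx, name)
}

// fakeGet is an assumed backend that only honors cancellation.
func fakeGet(ctx context.Context, name string) (string, error) {
	select {
	case <-ctx.Done():
		return "", ctx.Err()
	default:
		return "node:" + name, nil
	}
}

// demoContext calls getNode once with a live Context and once with a
// cancelled one, and reports whether the second call was rejected.
func demoContext() (string, bool) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	node, err := getNode(ctx, fakeGet, "tpu-1")
	if err != nil {
		return "", false
	}

	cancelled, stop := context.WithCancel(context.Background())
	stop() // cancel before the call is made
	_, err = getNode(cancelled, fakeGet, "tpu-1")
	return node, errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println(demoContext())
}
```

Because the Context travels with each call, a caller can cancel one request without affecting any other user of the same client object.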

defer mc.Observe(err)

var op *tpuapi.Operation
name = fmt.Sprintf("projects/%s/locations/%s/nodes/%s", gce.projectID, zone, name)
Member

Can we abstract fmt.Sprintf("projects/%s/locations/%s/nodes/%s" and fmt.Sprintf("projects/%s/locations/%s" into helper methods? That way we don't have to worry about things like typos.

Contributor Author

Done.
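
A sketch of what such helpers might look like, assuming the two format strings quoted in the review comment. The function names `tpuParentName` and `tpuNodeName` are hypothetical and may differ from what was merged.

```go
package main

import "fmt"

// tpuParentName builds the parent resource name for a project and zone.
// The name of this helper is an assumption made for illustration.
func tpuParentName(projectID, zone string) string {
	return fmt.Sprintf("projects/%s/locations/%s", projectID, zone)
}

// tpuNodeName builds the full node resource name by reusing the parent
// helper, so the path layout is spelled out in exactly one place.
func tpuNodeName(projectID, zone, name string) string {
	return fmt.Sprintf("%s/nodes/%s", tpuParentName(projectID, zone), name)
}

func main() {
	fmt.Println(tpuParentName("my-project", "us-central1-b"))
	fmt.Println(tpuNodeName("my-project", "us-central1-b", "tpu-1"))
}
```

Deriving the node name from the parent name means a typo in the path can only be made, and fixed, once.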

// occurs.
func (gce *GCECloud) waitForTPUOp(interval, timeout time.Duration, op *tpuapi.Operation) (*tpuapi.Operation, error) {
	if err := wait.PollImmediate(interval, timeout, func() (bool, error) {
		glog.V(2).Infof("Waiting for operation %q to complete...", op.Name)
Member

This seems like it might be a bit spammy for log level 2

Contributor Author

Done.

			return true, err
		}
		if op.Done {
			glog.V(2).Infof("Operation %q has completed", op.Name)
Member

Again, this seems a bit spammy for log level 2.

Contributor Author

Done.
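
The polling pattern discussed in the two comments above can be sketched with the standard library alone. Everything here is an assumption made for illustration: `pollImmediate` is a simplified stand-in for `wait.PollImmediate`, `logV` stands in for `glog.V(n).Infof`, the `operation` type replaces `tpuapi.Operation`, and level 5 is an arbitrary choice, since the thread does not say which level the messages were moved to.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// operation is a hypothetical stand-in for tpuapi.Operation.
type operation struct {
	Name string
	Done bool
}

var errTimeout = errors.New("timed out waiting for operation")

// verbosity is an assumed stand-in for glog's -v flag.
var verbosity = 2

// logV prints only when the configured verbosity is at least level.
func logV(level int, format string, args ...interface{}) {
	if verbosity >= level {
		fmt.Printf(format+"\n", args...)
	}
}

// pollImmediate calls cond right away and then once per interval until it
// returns true, returns an error, or the timeout elapses.
func pollImmediate(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if !time.Now().Before(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

// waitForOp refreshes op until it is done. The per-iteration messages are
// logged at a high verbosity level so they stay quiet at level 2.
func waitForOp(interval, timeout time.Duration, op *operation, refresh func(*operation) error) error {
	return pollImmediate(interval, timeout, func() (bool, error) {
		logV(5, "Waiting for operation %q to complete...", op.Name)
		if err := refresh(op); err != nil {
			return true, err
		}
		if op.Done {
			logV(5, "Operation %q has completed", op.Name)
			return true, nil
		}
		return false, nil
	})
}

// demoWait simulates an operation that completes on the third refresh and
// returns how many refresh calls were made.
func demoWait() (int, error) {
	calls := 0
	op := &operation{Name: "op-1"}
	refresh := func(o *operation) error {
		calls++
		o.Done = calls >= 3
		return nil
	}
	err := waitForOp(time.Millisecond, time.Second, op, refresh)
	return calls, err
}

func main() {
	fmt.Println(demoWait())
}
```

A message emitted on every poll iteration scales with the number of pending operations and the poll interval, which is why it is usually kept out of the default verbosity.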

@yguo0905
Contributor Author

Given that the new k8s APIs do not implement the cloud provider but do end up calling objects with Contexts, can they take a Context as their first parameter? Also, can you file an issue against the tpu package for not adhering to Go's best practices for Context? ("Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named ctx:" from https://golang.org/pkg/context/)

The APIs (including the TPU APIs) in https://github.com/google/google-api-go-client are generated in a legacy style and have the Context issue you described. The new-style APIs are in https://github.com/GoogleCloudPlatform/google-cloud-go, but they do not yet include the TPU APIs. When the API owners migrate them to the new style, I expect the Context issue to be fixed automatically.

@yguo0905
Contributor Author

/test pull-kubernetes-e2e-kops-aws

@yguo0905
Contributor Author

@vishh, please take a look at this PR.

// newTPUMetricContext returns a new metricContext used for recording metrics
// of Cloud TPU API calls.
func newTPUMetricContext(request, zone string) *metricContext {
	return newGenericMetricContext("tpus", request, unusedMetricLabel, zone, computeAlphaVersion)
Contributor

What's computeAlphaVersion? Are we using alpha APIs?

Contributor Author

It's a string constant whose value is "alpha". It is only a label passed into the metrics. Since we are using the alpha TPU API, I set it to "alpha" explicitly.

@@ -3043,6 +3043,10 @@
"ImportPath": "google.golang.org/api/pubsub/v1",
"Rev": "c0dae069ee96c9261a04c81efd9e0f1e55f565ac"
},
{
"ImportPath": "google.golang.org/api/tpu/v1alpha1",
Contributor

Let's switch to v1beta1

Contributor Author

We don't have generated clients for v1beta1 at this point. I asked the TPU team to generate and publish them on https://github.com/google/google-api-go-client.

@vishh
Contributor

vishh commented Feb 14, 2018

Switch to v1beta1 and then this PR is good to go.

@yguo0905
Contributor Author

/retest

@vishh
Contributor

vishh commented Feb 27, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 27, 2018
@vishh vishh added this to the v1.10 milestone Feb 27, 2018
@vishh vishh added status/approved-for-milestone priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/hw-accelerators labels Feb 27, 2018
@yguo0905
Contributor Author

/assign @thockin

@ixdy
Member

ixdy commented Feb 27, 2018

/retest

@thockin
Member

thockin commented Feb 27, 2018

/approve
/approve no-issue

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 27, 2018
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 27, 2018
@vishh
Contributor

vishh commented Feb 27, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 27, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thockin, vishh, yguo0905

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit e30554b into kubernetes:master Feb 27, 2018
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/hw-accelerators cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
7 participants