Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCE provider: Limit Filter calls to regexps rather than large blobs #27741

Merged
merged 1 commit into from
Jun 21, 2016

Conversation

zmerlynn
Copy link
Member

@zmerlynn zmerlynn commented Jun 21, 2016

Filters can't exceed 4k, and GET requests against the GCE API are also limited, so these break down in different ways at different cluster counts. Fix it by introducing an advisory node-instance-prefix configuration in the GCE provider that can hint the EnsureLoadBalancer/UpdateLoadBalancer code (and the firewall creation/update code). If it's not there, or wrong (a hostname that's registered violates it), just ignore it and grab the whole project.

Fixes #27731
Analytics

@zmerlynn zmerlynn added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/network Categorizes an issue or PR as relevant to SIG Network. area/controller-manager cherrypick-candidate labels Jun 21, 2016
@zmerlynn zmerlynn added this to the v1.3 milestone Jun 21, 2016
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Jun 21, 2016
@zmerlynn zmerlynn added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Jun 21, 2016
@zmerlynn
Copy link
Member Author

cc @roberthbailey: This needs a v1.2 cherrypick as well, once we get it in master.

@zmerlynn
Copy link
Member Author

After testing this on a 1000 node cluster, I found another nasty surprise lurking in the same APIs, so I pushed a new iteration that killed the Field() limiters after filing an internal bug.

@@ -2467,6 +2492,13 @@ func (gce *GCECloud) getInstancesByNames(names []string) ([]*gceInstance, error)
pageToken = res.NextPageToken
for _, i := range res.Items {
name := i.Name
if inst, ok := instances[name]; !ok || inst != nil {
if inst != nil {
glog.Warningf("GCE returned duplicate result for instance %q", name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this really happen? It seems to me a bit paranoid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it when was debugging that internal bug. I was thinking there was instability across pages (I.e. with the wrong options it returned a,b,c then c,d,f) in some race condition, which would then show up as a missing instance a few lines below (so the idea is that the pair of lines would show the instability). That ended up not being the case, but it doesn't seem harmful to leave in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and took it out. If the other case ever comes up, hopefully it'll be obvious (or non-harmful, e.g. duplicate results but all the results).

@wojtek-t
Copy link
Member

I added a minor comment. Other than that LGTM.
Feel free to self-apply label.

@wojtek-t wojtek-t self-assigned this Jun 21, 2016
Filters can't exceed 4k, and GET requests against the GCE API are also
limited, so these break down in different ways at different cluster
counts. Fix it by introducing an advisory node-instance-prefix
configuration in the GCE provider that can hint the
EnsureLoadBalancer/UpdateLoadBalancer code (and the firewall
creation/update code). If it's not there, or wrong (a hostname that's
registered violates it), just ignore it and grab the whole project.
@zmerlynn zmerlynn added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 21, 2016
@k8s-bot
Copy link

k8s-bot commented Jun 21, 2016

GCE e2e build/test passed for commit dd4dae4.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

if cfg.Global.Multizone {
managedZones = nil // Use all zones in region
}
}

return CreateGCECloud(projectID, region, zone, managedZones, networkURL, nodeTags, tokenSource, true /* useMetadataServer */)
return CreateGCECloud(projectID, region, zone, managedZones, networkURL, nodeTags, nodeInstancePrefix, tokenSource, true /* useMetadataServer */)
}

// Creates a GCECloud object using the specified parameters.
// If no networkUrl is specified, loads networkName via rest call.
// If no tokenSource is specified, uses oauth2.DefaultTokenSource.
// If managedZones is nil / empty all zones in the region will be managed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment describing the exact use of nodeInstancePrefix and impact of passing in an empty string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't see this before it merged. To clarify it here (and I'll have a PR later): It's an advisory prefix to help GCE searches for nodes in the cluster. If nodes happen to join that violate the nodeInstancePrefix, the nodeInstancePrefix is ignored and the entire GCE project is enumerated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and if it's empty, we enumerate the whole project.

@k8s-bot
Copy link

k8s-bot commented Jun 21, 2016

GCE e2e build/test passed for commit dd4dae4.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 6cccb87 into kubernetes:master Jun 21, 2016
@zmerlynn zmerlynn deleted the fix-gce-filters branch June 21, 2016 16:56
@zmerlynn zmerlynn modified the milestones: v1.2, v1.3 Jun 21, 2016
@roberthbailey roberthbailey added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jun 21, 2016
roberthbailey added a commit that referenced this pull request Jun 21, 2016
…41-upstream-release-1.2

Automated cherry pick of #27741
zmerlynn added a commit to zmerlynn/contrib that referenced this pull request Jun 22, 2016
Fix to kubernetes/kubernetes#27821. Borrows
the necessary piece of
kubernetes/kubernetes#27741 to parse the
gce.conf.

Also opened kubernetes-retired#1251 to fix Godeps correctly
@roberthbailey roberthbailey changed the title GCE provider: Limit Filter calls to regexps rather than insane blobs GCE provider: Limit Filter calls to regexps rather than large blobs Jun 24, 2016
shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
…ck-of-#27741-upstream-release-1.2

Automated cherry pick of kubernetes#27741
shouhong pushed a commit to shouhong/kubernetes that referenced this pull request Feb 14, 2017
…ck-of-#27741-upstream-release-1.2

Automated cherry pick of kubernetes#27741
mwielgus pushed a commit to kubernetes/autoscaler that referenced this pull request Apr 14, 2017
Fix to kubernetes/kubernetes#27821. Borrows
the necessary piece of
kubernetes/kubernetes#27741 to parse the
gce.conf.

Also opened kubernetes-retired/contrib#1251 to fix Godeps correctly
mwielgus pushed a commit to kubernetes/autoscaler that referenced this pull request Apr 18, 2017
Fix to kubernetes/kubernetes#27821. Borrows
the necessary piece of
kubernetes/kubernetes#27741 to parse the
gce.conf.

Also opened kubernetes-retired/contrib#1251 to fix Godeps correctly
mwielgus pushed a commit to kubernetes/autoscaler that referenced this pull request Apr 18, 2017
Fix to kubernetes/kubernetes#27821. Borrows
the necessary piece of
kubernetes/kubernetes#27741 to parse the
gce.conf.

Also opened kubernetes-retired/contrib#1251 to fix Godeps correctly
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller-manager cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't create LoadBalancer on medium->large GCE clusters
8 participants