Can't create LoadBalancer on medium->large GCE clusters #27731

Closed
zmerlynn opened this issue Jun 20, 2016 · 1 comment · Fixed by #27741
Labels: area/controller-manager, priority/critical-urgent

Comments

@zmerlynn
Member

Customer impact: Attempts to create services of type LoadBalancer never get an external IP, and no GCE objects are created, on any cluster with >=124 nodes (the cutoff varies and can be as low as 72 nodes, depending on the cluster name). I believe, but haven't tested, that L7LB will also fail to create firewalls on clusters of this size (based on code inspection).

Analysis: There's an issue in `func (gce *GCECloud) getInstancesByNames(names []string) ([]*gceInstance, error)`: if you call it with more than roughly 100 node names (the exact cutoff depends on the cluster name, which affects the length of the node names in the filter), the GCE API rejects the request because the Filter() parameter is too large.

At larger cluster sizes, the API endpoint rejects the GET request outright instead of clearly indicating that the Filter() is the problem.
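To illustrate why the cutoff depends on both node count and name length, here is a minimal sketch, not the actual provider code. It assumes the filter is built by joining every instance name into one alternation, and it takes the 4096-byte limit from the "4k" figure quoted in the fix description. `buildNameFilter`, `filterFits`, `makeNames`, and the generated node names are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// maxFilterLen is an assumed limit, based on the "4k" figure in the fix description.
const maxFilterLen = 4096

// buildNameFilter joins every instance name into a single alternation.
// The exact filter syntax is an assumption made for illustration.
func buildNameFilter(names []string) string {
	return "name eq (" + strings.Join(names, "|") + ")"
}

// filterFits reports whether the filter for these names stays within the assumed limit.
func filterFits(names []string) bool {
	return len(buildNameFilter(names)) <= maxFilterLen
}

// makeNames generates n hypothetical node names that share a cluster-derived prefix.
func makeNames(prefix string, n int) []string {
	names := make([]string, n)
	for i := range names {
		names[i] = fmt.Sprintf("%s-%04d", prefix, i)
	}
	return names
}
```

Under these assumptions the filter grows linearly with the number of nodes, and a longer cluster-derived prefix lowers the node count at which it stops fitting.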

This affects all releases from 1.2.x onward.

I have a PR incoming for base k8s, then we'll probably need to rev L7LB as well.

@zmerlynn added the priority/critical-urgent, area/controller-manager, and cherrypick-candidate labels Jun 20, 2016
@zmerlynn added this to the v1.2 milestone Jun 20, 2016
@zmerlynn
Member Author

cc @kubernetes/sig-network @cjcullen

@zmerlynn self-assigned this Jun 20, 2016
k8s-github-robot pushed a commit that referenced this issue Jun 21, 2016
Automatic merge from submit-queue

GCE provider: Limit Filter calls to regexps rather than insane blobs

Filters can't exceed 4k, and GET requests against the GCE API are also limited, so these calls break down in different ways at different cluster sizes. Fix it by introducing an advisory `node-instance-prefix` configuration in the GCE provider that can hint the `EnsureLoadBalancer`/`UpdateLoadBalancer` code (and the firewall creation/update code). If it's not there, or wrong (a registered hostname violates it), just ignore it and grab the whole project.

Fixes #27731 
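
The advisory-prefix fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: `instanceFilter` is a hypothetical helper, and the `name eq <prefix>.*` filter syntax is assumed. An empty return value stands for "no filter", meaning the caller lists every instance in the project.

```go
package main

import "strings"

// instanceFilter returns a short prefix-based filter when the configured
// prefix is present and matches every hostname. Otherwise it returns "",
// which the caller treats as "list the whole project".
func instanceFilter(prefix string, hostnames []string) string {
	if prefix == "" {
		// Hint not configured: fall back to an unfiltered listing.
		return ""
	}
	for _, h := range hostnames {
		if !strings.HasPrefix(h, prefix) {
			// A registered hostname violates the hint, so ignore the hint.
			return ""
		}
	}
	// Assumes the prefix contains no regexp metacharacters
	// (instance names here are letters, digits, and hyphens).
	return "name eq " + prefix + ".*"
}
```

The point of the design is that the filter length no longer depends on the number of nodes, and a wrong or missing hint costs only a larger listing rather than a failed request.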