Not enough ips in kubernetes-jenkins-pull #25629

Closed
bprashanth opened this issue May 15, 2016 · 12 comments
Labels: area/test-infra, kind/flake

Comments

@bprashanth
Contributor

bprashanth commented May 15, 2016

The master didn't come up because there were no IPs available:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/23567/kubernetes-pull-build-test-e2e-gce/40061/artifacts/

Created [https://www.googleapis.com/compute/v1/projects/kubernetes-jenkins-pull/regions/us-central1/addresses/e2e-gce-builder-2-0-master-ip].
Generating certs for alternate-names: IP:146.148.108.105,IP:10.0.0.1,DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc,DNS:kubernetes.default.svc.cluster.local,DNS:e2e-gce-builder-2-0-master
+++ Logging using Fluentd to elasticsearch
ERROR: (gcloud.compute.instances.create) Some requests did not succeed:
 - Quota 'IN_USE_ADDRESSES' exceeded.  Limit: 120.0

We're at:

In-use IP addresses (us-central1): 97% (117 of 120)

So we're either running too much in that project or leaking IPs.

@kubernetes/goog-testing

@bprashanth added the area/test-infra and kind/flake labels May 15, 2016
@ixdy
Member

ixdy commented May 15, 2016

Now we have a different problem. Someone or something deleted all of the Jenkins GCE VMs in that project.

@bprashanth
Contributor Author

Sorry, my mistake. I did what I always do when there's a leak: free resources. I thought I was deleting disposable VMs, but I clearly messed that up.

@ixdy
Member

ixdy commented May 15, 2016

Well, I will take this opportunity to fully clean up the project.

The node e2e tests occasionally leak VMs. We need to improve our infrastructure to garbage-collect these.

@bprashanth
Contributor Author

Things should be running now, so I'm closing this. More than 60% of the IPs are unused.

@bprashanth
Contributor Author

So it looks like we go from 0 to 100 real quick. We're back at 90-something percent IP usage.
I think we're just naturally saturating and need a quota bump; until that gets approved, some runs might flake. Leaving this open in the interim so people don't keep opening issues for the same thing.

@ixdy
Member

ixdy commented May 16, 2016

Cleaned up leaked node e2e VMs:

gcloud compute instances list --project=kubernetes-jenkins-pull \
  --filter="NOT tags.items:do-not-delete AND creationTimestamp<'$(date +%Y-%m-%dT%H:%M:%S%z --date='1 hour ago')' AND NOT name~'.*jenkins.*'" \
  --format='value(name)' \
  | xargs gcloud compute instances delete --project=kubernetes-jenkins-pull --zone=us-central1-f
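
Given the accidental deletion earlier in this thread, a dry-run pass before the real delete might be worth having. The following is only a sketch, not what was actually run: `print_delete_cmds` is a hypothetical helper, and it assumes the instance list arrives as "NAME ZONE" pairs (for instance from something like `--format='value(name,zone.basename())'`, which is an assumption and not taken from the command above). It prints the delete commands rather than executing them, and skips any name containing "jenkins" as a second safety net.

```shell
#!/bin/sh
# Hypothetical helper: reads "NAME ZONE" pairs on stdin and prints the
# delete command for each one instead of executing it (dry run).
print_delete_cmds() {
  project="$1"
  while read -r name zone; do
    # Skip blank lines.
    [ -n "$name" ] || continue
    # Skip anything that looks like a Jenkins VM.
    case "$name" in
      *jenkins*) continue ;;
    esac
    printf 'gcloud compute instances delete %s --project=%s --zone=%s\n' \
      "$name" "$project" "$zone"
  done
}

# Example with made-up instance names; only the first one is printed.
printf '%s\n' 'tmp-node-e2e-1 us-central1-f' 'jenkins-master us-central1-f' |
  print_delete_cmds kubernetes-jenkins-pull
```

Carrying the zone per instance, instead of hard-coding one zone for the whole batch, would also let the printed commands cover VMs that were leaked outside us-central1-f, if there are any.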

@bprashanth
Contributor Author

Hmm, looks like we're maxing out 150 IPs now (#25171 (comment))

In-use IP addresses (us-central1): 99% (148 of 150)

Once bitten, twice shy. @ixdy, can you kick off your cleanup script? (Or GC in a cron job until we fix the leak?)

@ixdy
Member

ixdy commented May 16, 2016

Yeah, cleaning up now. I'll make it part of our daily cleanup, but we need to figure out why so many of the node e2e tests are getting stuck, which is the root cause. @pwittrock

@spiffxp
Member

spiffxp commented May 17, 2016

/cc @kubernetes/sig-testing

@ixdy self-assigned this May 17, 2016
@ixdy
Member

ixdy commented May 17, 2016

Root cause (leaking VMs) fixed. Should be in the clear now.

@ixdy closed this as completed May 17, 2016
@bprashanth
Contributor Author

@ixdy it occurred to me that quota usage above 75% indicates an imminent flake, so an alert at that threshold would help. I'd subscribe to that if we had it.

@ixdy
Member

ixdy commented May 17, 2016

I'm working on setting up monitoring and alerting of our GCP quotas. It's unnecessarily difficult to integrate, unfortunately.
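
The 75% threshold suggested above can be sketched as a plain arithmetic check. This is only an illustration under stated assumptions: `quota_pct` and `quota_over` are hypothetical helpers, the usage and limit values are passed in by hand, and how they would be fetched (possibly from the per-region quota listing that gcloud reports) is left out because the thread does not say how the monitoring was built.

```shell
#!/bin/sh
# Hypothetical helpers; nothing here talks to GCP. awk is used because the
# quota limit is reported as a float (e.g. "Limit: 120.0").

# Print usage as a percentage of the limit, to one decimal place.
quota_pct() {
  awk -v u="$1" -v l="$2" 'BEGIN { printf "%.1f\n", u * 100 / l }'
}

# Succeed (exit 0) when usage is strictly above THRESHOLD percent of limit.
quota_over() {
  awk -v u="$1" -v l="$2" -v t="$3" 'BEGIN { exit !(u * 100 > l * t) }'
}

# Figures from this thread: 117 of 120 in-use addresses, 75% threshold.
if quota_over 117 120.0 75; then
  echo "IN_USE_ADDRESSES at $(quota_pct 117 120.0)%: flakes likely"
fi
```

A check like this could run from the same daily job as the cleanup, so a warning shows up before the quota is exhausted instead of after a PR build fails.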

3 participants