Quota 'SUBNETWORKS' exceeded in e2e tests #46713

Closed
crassirostris opened this issue May 31, 2017 · 27 comments
Labels
kind/failing-test: Categorizes issue or PR as related to a consistently or frequently failing test.
priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.
sig/network: Categorizes an issue or PR as relevant to SIG Network.

Comments

@crassirostris

E.g.

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/46700/pull-kubernetes-e2e-gce-etcd3/33214/
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/46700/pull-kubernetes-kubemark-e2e-gce/32814/
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/46696/pull-kubernetes-kubemark-e2e-gce/32813/

W0531 10:20:34.781] ERROR: (gcloud.compute.networks.create) Could not fetch resource:
W0531 10:20:34.782]  - Quota 'SUBNETWORKS' exceeded.  Limit: 100.0
@crassirostris
Author

/cc @fejta Could you please assign it to an appropriate person?

crassirostris added the sig/testing label May 31, 2017
@krzyzacy
Member

/assign

@krzyzacy
Member

The kubemark project looks normal, but the etcd project is flooded. Is the etcd suite using excessive resources?

@fejta
Contributor

fejta commented May 31, 2017

/remove-sig testing
/unassign @krzyzacy
/assign @bowei
@kubernetes/sig-network-test-failures

Looks like we are leaking SUBNETWORKS, see http://prow.k8s.io/?type=presubmit&job=pull-kubernetes-e2e-gce-etcd3

k8s-ci-robot added the sig/network label and removed the sig/testing label May 31, 2017
k8s-ci-robot assigned bowei and unassigned krzyzacy May 31, 2017
@krzyzacy
Member

I don't think we are leaking. Each network creates 8 subnets (one per region), plus the default network's 8, and we run 12 instances, so the total is 13 * 8 = 104 subnets > 100; the 12th concurrent job will always fail.

We probably just need to bump the quota a bit.

gcloud compute networks subnets list --project=k8s-jkns-pr-gce-etcd3
NAME                    REGION           NETWORK                 RANGE
default                 asia-northeast1  default                 10.146.0.0/20
e2e-gce-agent-pr-102-0  asia-northeast1  e2e-gce-agent-pr-102-0  10.146.0.0/20
e2e-gce-agent-pr-105-0  asia-northeast1  e2e-gce-agent-pr-105-0  10.146.0.0/20
e2e-gce-agent-pr-15-0   asia-northeast1  e2e-gce-agent-pr-15-0   10.146.0.0/20
e2e-gce-agent-pr-2-0    asia-northeast1  e2e-gce-agent-pr-2-0    10.146.0.0/20
e2e-gce-agent-pr-21-0   asia-northeast1  e2e-gce-agent-pr-21-0   10.146.0.0/20
e2e-gce-agent-pr-34-0   asia-northeast1  e2e-gce-agent-pr-34-0   10.146.0.0/20
e2e-gce-agent-pr-36-0   asia-northeast1  e2e-gce-agent-pr-36-0   10.146.0.0/20
e2e-gce-agent-pr-47-0   asia-northeast1  e2e-gce-agent-pr-47-0   10.146.0.0/20
e2e-gce-agent-pr-49-0   asia-northeast1  e2e-gce-agent-pr-49-0   10.146.0.0/20
e2e-gce-agent-pr-57-0   asia-northeast1  e2e-gce-agent-pr-57-0   10.146.0.0/20
e2e-gce-agent-pr-98-0   asia-northeast1  e2e-gce-agent-pr-98-0   10.146.0.0/20
default                 us-west1         default                 10.138.0.0/20
e2e-gce-agent-pr-102-0  us-west1         e2e-gce-agent-pr-102-0  10.138.0.0/20
e2e-gce-agent-pr-105-0  us-west1         e2e-gce-agent-pr-105-0  10.138.0.0/20
e2e-gce-agent-pr-15-0   us-west1         e2e-gce-agent-pr-15-0   10.138.0.0/20
e2e-gce-agent-pr-2-0    us-west1         e2e-gce-agent-pr-2-0    10.138.0.0/20
e2e-gce-agent-pr-21-0   us-west1         e2e-gce-agent-pr-21-0   10.138.0.0/20
e2e-gce-agent-pr-34-0   us-west1         e2e-gce-agent-pr-34-0   10.138.0.0/20
e2e-gce-agent-pr-36-0   us-west1         e2e-gce-agent-pr-36-0   10.138.0.0/20
e2e-gce-agent-pr-47-0   us-west1         e2e-gce-agent-pr-47-0   10.138.0.0/20
e2e-gce-agent-pr-49-0   us-west1         e2e-gce-agent-pr-49-0   10.138.0.0/20
e2e-gce-agent-pr-57-0   us-west1         e2e-gce-agent-pr-57-0   10.138.0.0/20
e2e-gce-agent-pr-98-0   us-west1         e2e-gce-agent-pr-98-0   10.138.0.0/20
default                 asia-east1       default                 10.140.0.0/20
e2e-gce-agent-pr-102-0  asia-east1       e2e-gce-agent-pr-102-0  10.140.0.0/20
e2e-gce-agent-pr-105-0  asia-east1       e2e-gce-agent-pr-105-0  10.140.0.0/20
e2e-gce-agent-pr-15-0   asia-east1       e2e-gce-agent-pr-15-0   10.140.0.0/20
e2e-gce-agent-pr-2-0    asia-east1       e2e-gce-agent-pr-2-0    10.140.0.0/20
e2e-gce-agent-pr-21-0   asia-east1       e2e-gce-agent-pr-21-0   10.140.0.0/20
e2e-gce-agent-pr-34-0   asia-east1       e2e-gce-agent-pr-34-0   10.140.0.0/20
e2e-gce-agent-pr-36-0   asia-east1       e2e-gce-agent-pr-36-0   10.140.0.0/20
e2e-gce-agent-pr-47-0   asia-east1       e2e-gce-agent-pr-47-0   10.140.0.0/20
e2e-gce-agent-pr-49-0   asia-east1       e2e-gce-agent-pr-49-0   10.140.0.0/20
e2e-gce-agent-pr-57-0   asia-east1       e2e-gce-agent-pr-57-0   10.140.0.0/20
e2e-gce-agent-pr-98-0   asia-east1       e2e-gce-agent-pr-98-0   10.140.0.0/20
default                 asia-southeast1  default                 10.148.0.0/20
e2e-gce-agent-pr-102-0  asia-southeast1  e2e-gce-agent-pr-102-0  10.148.0.0/20
e2e-gce-agent-pr-105-0  asia-southeast1  e2e-gce-agent-pr-105-0  10.148.0.0/20
e2e-gce-agent-pr-15-0   asia-southeast1  e2e-gce-agent-pr-15-0   10.148.0.0/20
e2e-gce-agent-pr-2-0    asia-southeast1  e2e-gce-agent-pr-2-0    10.148.0.0/20
e2e-gce-agent-pr-21-0   asia-southeast1  e2e-gce-agent-pr-21-0   10.148.0.0/20
e2e-gce-agent-pr-34-0   asia-southeast1  e2e-gce-agent-pr-34-0   10.148.0.0/20
e2e-gce-agent-pr-36-0   asia-southeast1  e2e-gce-agent-pr-36-0   10.148.0.0/20
e2e-gce-agent-pr-47-0   asia-southeast1  e2e-gce-agent-pr-47-0   10.148.0.0/20
e2e-gce-agent-pr-49-0   asia-southeast1  e2e-gce-agent-pr-49-0   10.148.0.0/20
e2e-gce-agent-pr-57-0   asia-southeast1  e2e-gce-agent-pr-57-0   10.148.0.0/20
e2e-gce-agent-pr-98-0   asia-southeast1  e2e-gce-agent-pr-98-0   10.148.0.0/20
default                 us-east4         default                 10.150.0.0/20
e2e-gce-agent-pr-102-0  us-east4         e2e-gce-agent-pr-102-0  10.150.0.0/20
e2e-gce-agent-pr-105-0  us-east4         e2e-gce-agent-pr-105-0  10.150.0.0/20
e2e-gce-agent-pr-15-0   us-east4         e2e-gce-agent-pr-15-0   10.150.0.0/20
e2e-gce-agent-pr-2-0    us-east4         e2e-gce-agent-pr-2-0    10.150.0.0/20
e2e-gce-agent-pr-21-0   us-east4         e2e-gce-agent-pr-21-0   10.150.0.0/20
e2e-gce-agent-pr-34-0   us-east4         e2e-gce-agent-pr-34-0   10.150.0.0/20
e2e-gce-agent-pr-36-0   us-east4         e2e-gce-agent-pr-36-0   10.150.0.0/20
e2e-gce-agent-pr-47-0   us-east4         e2e-gce-agent-pr-47-0   10.150.0.0/20
e2e-gce-agent-pr-49-0   us-east4         e2e-gce-agent-pr-49-0   10.150.0.0/20
e2e-gce-agent-pr-57-0   us-east4         e2e-gce-agent-pr-57-0   10.150.0.0/20
e2e-gce-agent-pr-98-0   us-east4         e2e-gce-agent-pr-98-0   10.150.0.0/20
default                 europe-west1     default                 10.132.0.0/20
e2e-gce-agent-pr-102-0  europe-west1     e2e-gce-agent-pr-102-0  10.132.0.0/20
e2e-gce-agent-pr-105-0  europe-west1     e2e-gce-agent-pr-105-0  10.132.0.0/20
e2e-gce-agent-pr-15-0   europe-west1     e2e-gce-agent-pr-15-0   10.132.0.0/20
e2e-gce-agent-pr-2-0    europe-west1     e2e-gce-agent-pr-2-0    10.132.0.0/20
e2e-gce-agent-pr-21-0   europe-west1     e2e-gce-agent-pr-21-0   10.132.0.0/20
e2e-gce-agent-pr-34-0   europe-west1     e2e-gce-agent-pr-34-0   10.132.0.0/20
e2e-gce-agent-pr-36-0   europe-west1     e2e-gce-agent-pr-36-0   10.132.0.0/20
e2e-gce-agent-pr-47-0   europe-west1     e2e-gce-agent-pr-47-0   10.132.0.0/20
e2e-gce-agent-pr-49-0   europe-west1     e2e-gce-agent-pr-49-0   10.132.0.0/20
e2e-gce-agent-pr-57-0   europe-west1     e2e-gce-agent-pr-57-0   10.132.0.0/20
e2e-gce-agent-pr-98-0   europe-west1     e2e-gce-agent-pr-98-0   10.132.0.0/20
default                 us-east1         default                 10.142.0.0/20
e2e-gce-agent-pr-102-0  us-east1         e2e-gce-agent-pr-102-0  10.142.0.0/20
e2e-gce-agent-pr-105-0  us-east1         e2e-gce-agent-pr-105-0  10.142.0.0/20
e2e-gce-agent-pr-15-0   us-east1         e2e-gce-agent-pr-15-0   10.142.0.0/20
e2e-gce-agent-pr-2-0    us-east1         e2e-gce-agent-pr-2-0    10.142.0.0/20
e2e-gce-agent-pr-21-0   us-east1         e2e-gce-agent-pr-21-0   10.142.0.0/20
e2e-gce-agent-pr-34-0   us-east1         e2e-gce-agent-pr-34-0   10.142.0.0/20
e2e-gce-agent-pr-36-0   us-east1         e2e-gce-agent-pr-36-0   10.142.0.0/20
e2e-gce-agent-pr-47-0   us-east1         e2e-gce-agent-pr-47-0   10.142.0.0/20
e2e-gce-agent-pr-49-0   us-east1         e2e-gce-agent-pr-49-0   10.142.0.0/20
e2e-gce-agent-pr-57-0   us-east1         e2e-gce-agent-pr-57-0   10.142.0.0/20
e2e-gce-agent-pr-98-0   us-east1         e2e-gce-agent-pr-98-0   10.142.0.0/20
default                 us-central1      default                 10.128.0.0/20
e2e-gce-agent-pr-102-0  us-central1      e2e-gce-agent-pr-102-0  10.128.0.0/20
e2e-gce-agent-pr-105-0  us-central1      e2e-gce-agent-pr-105-0  10.128.0.0/20
e2e-gce-agent-pr-15-0   us-central1      e2e-gce-agent-pr-15-0   10.128.0.0/20
e2e-gce-agent-pr-2-0    us-central1      e2e-gce-agent-pr-2-0    10.128.0.0/20
e2e-gce-agent-pr-21-0   us-central1      e2e-gce-agent-pr-21-0   10.128.0.0/20
e2e-gce-agent-pr-34-0   us-central1      e2e-gce-agent-pr-34-0   10.128.0.0/20
e2e-gce-agent-pr-36-0   us-central1      e2e-gce-agent-pr-36-0   10.128.0.0/20
e2e-gce-agent-pr-47-0   us-central1      e2e-gce-agent-pr-47-0   10.128.0.0/20
e2e-gce-agent-pr-49-0   us-central1      e2e-gce-agent-pr-49-0   10.128.0.0/20
e2e-gce-agent-pr-57-0   us-central1      e2e-gce-agent-pr-57-0   10.128.0.0/20
e2e-gce-agent-pr-98-0   us-central1      e2e-gce-agent-pr-98-0   10.128.0.0/20
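
A quick sanity check of the count against the quota could look like this (a sketch; the --format/grep slicing is just one convenient way to read the output):

# count subnets across all regions
gcloud compute networks subnets list --project=k8s-jkns-pr-gce-etcd3 --format='value(name)' | wc -l
# show the SUBNETWORKS quota limit and current usage
gcloud compute project-info describe --project=k8s-jkns-pr-gce-etcd3 | grep -B1 -A1 SUBNETWORKS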

@krzyzacy
Member

Hmm, I take it back; it should not fail every single PR there.

@bowei
Member

bowei commented May 31, 2017

Is there a project with extra subnets? From the list above, it looks like 1/project. Also, I disabled the only CI job that creates its own subnets directly.

@krzyzacy
Member

it's 1 subnet per PR run per region

@fejta
Contributor

fejta commented May 31, 2017

Why are we creating subnets in all regions?

@krzyzacy
Member

@bowei
when trying to clean up some old subnets, I got:

ERROR: (gcloud.compute.networks.subnets.delete) Some requests did not succeed:
 - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.

any idea?
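
For reference, auto-created subnets can only be removed together with their parent auto-mode network, so the cleanup has to delete the network itself. A sketch, using a network name from the listing above:

# deleting the parent network also removes its auto-mode subnets
# (this fails while firewall rules or instances still reference the network)
gcloud compute networks delete e2e-gce-agent-pr-102-0 --project=k8s-jkns-pr-gce-etcd3 --quiet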

@krzyzacy
Member

and the ones not affected by the subnet issue are also failing -
http://prow.k8s.io/log?pod=pull-kubernetes-e2e-gce-etcd3-33199

seems like some node timeout issue?

/assign @pwittrock
as you are the build-cop now, maybe you can help here :-)

@cblecker
Member

cblecker commented May 31, 2017

I'm going to guess that the new us-west1-c zone that went live yesterday may have pushed this over the edge.

@krzyzacy
Member

Requested a quota bump. I'm more worried that the runs not hit by the subnet issue also failed; that seems like a separate issue.

@j3ffml
Contributor

j3ffml commented May 31, 2017

Why are we creating subnets in all regions?

I also would like to know this. Why does this project need so many subnets?

@cblecker
Member

So it looks like it's auto-creating subnetworks in all regions because of the --mode=auto flag here:
https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/util.sh#L717

If we changed this to custom, would that break anything? It looks like there are also functions in the script to create subnetworks manually.

Either way, a quota increase would also fix this, it seems.
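
For comparison, the custom-mode alternative would look roughly like this (a sketch with hypothetical names; current gcloud spells the flag --subnet-mode, while util.sh uses the older --mode spelling):

# custom mode: no subnets are auto-created in every region
gcloud compute networks create my-e2e-net --subnet-mode=custom
# create only the subnet the test cluster actually needs
gcloud compute networks subnets create my-e2e-subnet \
  --network=my-e2e-net --region=us-central1 --range=10.128.0.0/20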

@krzyzacy
Member

quota is bumped, let's see if it fixes things

@cblecker
Member

kicking off a test to try it: #46711

@krzyzacy
Member

/assign @MrHohn

Some network resources are still leaking; I'm manually running the janitor to clean them up. PRs seem to be piling up, though.
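
A rough sketch of the kind of manual cleanup involved (the real janitor lives in kubernetes/test-infra; the filter patterns here are assumptions based on the naming above):

# orphaned firewall rules go first, since they keep networks and subnets in use
for fw in $(gcloud compute firewall-rules list --project=k8s-jkns-pr-gce-etcd3 \
    --filter='network ~ e2e-gce-agent-pr' --format='value(name)'); do
  gcloud compute firewall-rules delete "$fw" --project=k8s-jkns-pr-gce-etcd3 --quiet
done
# then the leaked networks, which take their auto-mode subnets with them
for net in $(gcloud compute networks list --project=k8s-jkns-pr-gce-etcd3 \
    --filter='name ~ ^e2e-gce-agent-pr' --format='value(name)'); do
  gcloud compute networks delete "$net" --project=k8s-jkns-pr-gce-etcd3 --quiet
done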


lavalamp added the priority/critical-urgent and kind/failing-test labels May 31, 2017
@krzyzacy
Member

kubernetes/test-infra#2902 should fix it, but we need to wait for a couple of runs.

@krzyzacy
Member

krzyzacy commented Jun 1, 2017

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/46648/pull-kubernetes-e2e-gce-etcd3/33358

Things are starting to pass; now we wait for the backlog to drain.

@MrHohn
Member

MrHohn commented Jun 1, 2017

Also, to clarify what was going on:

  1. Gather test suite metrics for e2e-gce-etcd3 test-infra#2874 mistakenly overwrote GINKGO_TEST_ARGS for k8s-jkns-pr-gce-etcd3, which should have been --ginkgo.skip=\[Slow\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\] (see the sketch after this list).
  2. As a consequence, pull-kubernetes-e2e-gce-etcd3 ran all the e2e tests, including Slow, Serial, and Disruptive ones, without skipping, so all the worst cases could happen: forwarding rule, target pool, health check, and firewall resources could be created and orphaned in every PR job.
  3. Because of the orphaned GCE resources, especially the firewall rules, Jenkins failed to delete the corresponding subnets, which were still in use by those orphaned firewall rules. This gradually ate up all the quota in the k8s-jkns-pr-gce-etcd3 project.
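
A minimal sketch of the intended setting, for reference (the skip regex is the one quoted in item 1; exactly where the variable is set lives in the kubernetes/test-infra job config):

# skip Slow/Serial/Disruptive/Flaky/Feature-gated tests in the PR job
GINKGO_TEST_ARGS='--ginkgo.skip=\[Slow\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]'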

mikedanese reopened this Jun 1, 2017
@krzyzacy
Member

krzyzacy commented Jun 1, 2017

Whoops, I was running the cleanup script from a different branch... Now I'd expect the old subnets to all be gone from the project, and subsequent runs should be fine.

@krzyzacy
Member

krzyzacy commented Jun 1, 2017

seems stable now.
/close

@k8s-ci-robot
Contributor

@krzyzacy: you can't close an issue unless you authored it or you are assigned to it.

In response to this:

seems stable now.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@krzyzacy
Member

krzyzacy commented Jun 1, 2017

/assign

@krzyzacy
Member

krzyzacy commented Jun 1, 2017

/close
