pull-kubernetes-e2e-gce-etcd3 failing: Quota 'SUBNETWORKS' exceeded #47362

Closed
yujuhong opened this issue Jun 12, 2017 · 19 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

Comments

@yujuhong
Contributor

The ~5 most recent builds failed because of this issue.
https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-gce-etcd3

W0612 12:52:57.395] ERROR: (gcloud.compute.networks.create) Could not fetch resource:
W0612 12:52:57.396]  - Quota 'SUBNETWORKS' exceeded.  Limit: 150.0

/cc @kubernetes/sig-testing-bugs @kubernetes/sig-network-bugs based on the similar issue from ~2 weeks ago #46713

@yujuhong yujuhong added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Jun 12, 2017
@k8s-ci-robot k8s-ci-robot added sig/testing Categorizes an issue or PR as relevant to SIG Testing. kind/bug Categorizes issue or PR as related to a bug. labels Jun 12, 2017
@k8s-github-robot

@yujuhong There are no sig labels on this issue. Please add a sig label by:
(1) mentioning a sig: @kubernetes/sig-<team-name>-misc
(2) specifying the label manually: /sig <label>

Note: method (1) will trigger a notification to the team. You can find the team list here and label list here

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. kind/bug Categorizes issue or PR as related to a bug. labels Jun 12, 2017
@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 12, 2017
@yujuhong yujuhong removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 12, 2017
@yujuhong
Contributor Author

/cc @dchen1107 @krzyzacy

@krzyzacy
Member

that again?
cc @MrHohn

@MrHohn
Member

MrHohn commented Jun 12, 2017

@krzyzacy This time we leaked an ingress firewall, so cleaning up the network resource failed.

W0612 13:20:54.309] ERROR: (gcloud.compute.networks.delete) Some requests did not succeed:
W0612 13:20:54.309]  - The network resource 'projects/k8s-jkns-pr-gce-etcd3/global/networks/e2e-35526' is already being used by 'projects/k8s-jkns-pr-gce-etcd3/global/firewalls/ingress-80-443-e2e-tests-ingress-3qzl2'
W0612 13:20:54.309] 
I0612 13:20:54.410] Failed to delete network 'e2e-35526'. Listing firewall-rules:
I0612 13:20:55.064] NAME                                    NETWORK    SRC_RANGES  RULES           SRC_TAGS  TARGET_TAGS
I0612 13:20:55.065] ingress-80-443-e2e-tests-ingress-3qzl2  e2e-35526  0.0.0.0/0   tcp:80,tcp:443

@nicksardo I thought the ingress tests only run in the slow suite?
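
The log above shows why ordering matters: the network cannot be deleted while a firewall rule still references it. As a rough sketch (not part of the test infrastructure), the listing printed after the failed delete can be grouped by network to see which rules have to go first. The function name `firewalls_by_network` is hypothetical, and the parser assumes the first two columns are NAME and NETWORK, as in the output shown.

```python
from collections import defaultdict


def firewalls_by_network(listing):
    """Group firewall rule names by network from a firewall-rules table listing."""
    blocking = defaultdict(list)
    for raw in listing.splitlines():
        # Drop a leading CI log prefix such as "I0612 13:20:55.065] " if present.
        line = raw.split("] ", 1)[-1].strip()
        fields = line.split()
        if len(fields) < 2 or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        name, network = fields[0], fields[1]
        blocking[network].append(name)
    return dict(blocking)


# Table rows copied from the log excerpt above.
sample = (
    "I0612 13:20:55.064] NAME                                    NETWORK    SRC_RANGES  RULES           SRC_TAGS  TARGET_TAGS\n"
    "I0612 13:20:55.065] ingress-80-443-e2e-tests-ingress-3qzl2  e2e-35526  0.0.0.0/0   tcp:80,tcp:443\n"
)
result = firewalls_by_network(sample)
print(result)
```

Every rule listed under a network would need to be deleted before the network delete is retried.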

@krzyzacy
Member

krzyzacy commented Jun 12, 2017

The leaked firewall is created here:

I0612 12:14:24.949] STEP: Initializing nginx controller
I0612 12:14:24.949] Jun 12 12:06:08.560: INFO: Creating firewall-rules in project k8s-jkns-pr-gce-etcd3: ingress-80-443-e2e-tests-ingress-g9d4j
I0612 12:14:24.949] Jun 12 12:06:08.560: INFO: Running command: gcloud compute firewall-rules create ingress-80-443-e2e-tests-ingress-g9d4j --project=k8s-jkns-pr-gce-etcd3 --allow tcp:80,tcp:443 --network e2e-35517

and then the Ingress cannot acquire an IP:

I0612 12:14:24.955] Jun 12 12:14:12.173: INFO: Waiting for Ingress echomap to acquire IP, error <nil>
I0612 12:14:24.955] Jun 12 12:14:22.197: INFO: Waiting for Ingress echomap to acquire IP, error <nil>
I0612 12:14:24.955] 
I0612 12:14:24.955] ---------------------------------------------------------
I0612 12:14:24.956] Received interrupt.  Running AfterSuite...
I0612 12:14:24.956] ^C again to terminate immediately

which results in:

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/ingress.go:194
Jun 12 12:05:02.109: Ingress failed to acquire an IP address within 15m0s
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/ingress_utils.go:900

Seems that the ingress test is pretty much screwed?
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/47274/pull-kubernetes-e2e-gce-etcd3/35517
Once it times out, it does not delete the resources it created.

@dchen1107 dchen1107 added this to the v1.7 milestone Jun 12, 2017
@MrHohn
Member

MrHohn commented Jun 12, 2017

The ingress test is supposed to clean up the firewall resource in AfterEach(), but it received SIGTERM signals before reaching that point, so it stopped:

I0612 13:15:04.080] Jun 12 13:14:59.681: INFO: Waiting for Ingress echomap to acquire IP, error <nil>
I0612 13:15:04.080] 
I0612 13:15:04.080] ---------------------------------------------------------
I0612 13:15:04.080] Received interrupt.  Running AfterSuite...
I0612 13:15:04.080] ^C again to terminate immediately
I0612 13:15:04.081] Jun 12 13:15:03.970: INFO: Running AfterSuite actions on all node
I0612 13:15:04.081] 
I0612 13:15:33.752] 
I0612 13:15:33.753] Jun 12 13:06:01.025: INFO: Running AfterSuite actions on all node
I0612 13:15:33.753] 
I0612 13:15:33.753] ---------------------------------------------------------
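
The failure mode described above (cleanup skipped because the run was interrupted first) can be illustrated with a minimal sketch. This is Python for illustration only, not the Go e2e framework's code; `run_with_cleanup` and `raise_on_sigterm` are hypothetical names. The idea is to turn the signal into an exception so that a `finally` block still runs the registered cleanups.

```python
import signal


def raise_on_sigterm(signum, frame):
    # Convert SIGTERM into an exception so that `finally` blocks still run.
    raise SystemExit(128 + signum)


def run_with_cleanup(test_body, cleanups):
    """Run test_body, then every cleanup (last registered first), even on interrupt."""
    try:
        test_body()
    finally:
        for cleanup in reversed(cleanups):
            try:
                cleanup()
            except Exception as err:
                # Keep going so one failed cleanup does not leak the rest.
                print(f"cleanup failed: {err}")
```

A caller would register the handler once with `signal.signal(signal.SIGTERM, raise_on_sigterm)` and pass resource deletions (for example, the firewall rule delete) as `cleanups`. This does not help if the process is killed outright, so an external sweep of leaked resources is still useful.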

@nicksardo
Contributor

@aledbf Looks like this is the nginx ingress test. We're going to move this to the slow suite.

@dchen1107
Member

SGTM.

@nicksardo
Contributor

Will clean up the project resources.

@dchen1107
Member

@krzyzacy Can we remove those leaked network resources to recover the build? Thanks!

@krzyzacy
Member

just wiped some firewalls. @nicksardo I'll leave the rest to you :-)

@nicksardo
Contributor

All ingress firewalls are gone. Will delete the old networks.

@nicksardo
Contributor

Old networks have been cleared. Everything should be good-to-go.

@nicksardo
Contributor

I'll clean up resources again after my PR gets merged.

@nicksardo
Contributor

/assign

@ericchiang
Contributor

@nicksardo this test was reassigned to the slow suite but is still failing there. https://k8s-testgrid.appspot.com/release-master-blocking#gci-gke-slow

@nicksardo
Contributor

@aledbf Mind looking at this?

@aledbf
Member

aledbf commented Jun 13, 2017

@nicksardo sure

@aledbf
Member

aledbf commented Jun 13, 2017

@nicksardo Reading one of the logs, https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gke-slow/7064/nodelog?junit=junit_08.xml&wrap=on
I see this:

E0613 00:10:33.950066    1552 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin kubenet failed to set up pod "nginx-ingress-controller-6xd68_e2e-tests-ingress-l50hv" network: cannot open hostport 80 for pod nginx-ingress-controller-6xd68_e2e-tests-ingress-l50hv: listen tcp :80: bind: address already in use
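
The `bind: address already in use` part of that error means something on the node was already holding port 80 when the pod's hostport was opened. A minimal sketch of the same condition, using an OS-assigned port on localhost rather than port 80 (the helper name `try_listen` is hypothetical):

```python
import errno
import socket


def try_listen(port, host="127.0.0.1"):
    """Return a listening socket on (host, port), or None if the port is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError as err:
        sock.close()
        if err.errno == errno.EADDRINUSE:
            return None
        raise


first = try_listen(0)  # port 0 asks the OS for any free port
port = first.getsockname()[1]
second = try_listen(port)  # same port while `first` still holds it
print("second bind succeeded" if second else f"port {port} already in use")
first.close()
```

The second call fails for the same reason the sandbox setup did: only one listener can hold the port until the first holder releases it.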
