e2e flake: Services [It] should be able to up and down services #22187

Closed
derekwaynecarr opened this issue Feb 29, 2016 · 10 comments
Labels: kind/flake, priority/critical-urgent

Comments

@derekwaynecarr
Member

• Failure [361.251 seconds]
Services
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/service.go:902
  should be able to up and down services [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/service.go:276

  Expected error:
      <*errors.errorString | 0xc2085d8920>: {
          s: "service verification failed for: 10.0.166.191\nexpected [service1-4f12t service1-o67lu service1-yo7vn]\nreceived [service1-4f12t service1-o67lu service1-yo7vn wget: download timed out]",
      }
      service verification failed for: 10.0.166.191
      expected [service1-4f12t service1-o67lu service1-yo7vn]
      received [service1-4f12t service1-o67lu service1-yo7vn wget: download timed out]
  not to have occurred

  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/service.go:248

#21400 (comment)

@bgrant0607 added the priority/critical-urgent, team/cluster, and kind/flake labels Feb 29, 2016
@bgrant0607
Member

cc @bprashanth

@freehan
Contributor

freehan commented Feb 29, 2016

      expected [service1-4f12t service1-o67lu service1-yo7vn]
      received [service1-4f12t service1-o67lu service1-yo7vn wget: download timed out]

Well, the fix should be easy.

@bprashanth
Contributor

We should be running wget with -q.
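A minimal sketch of why the stray line breaks verification, assuming the test collects one hostname per line of wget output and compares the unique set against the expected pod names. The helper names `uniqueSorted` and `sameHosts` are hypothetical and are not taken from test/e2e/service.go; the sketch does not model the -q flag itself, only the effect of a diagnostic message landing in the collected output.

```go
package main

import (
	"sort"
	"strings"
)

// uniqueSorted splits collected output into lines and returns the sorted
// set of distinct non-empty lines. Any diagnostic text mixed into the
// output (for example "wget: download timed out") becomes an entry too.
func uniqueSorted(output string) []string {
	seen := map[string]bool{}
	var names []string
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || seen[line] {
			continue
		}
		seen[line] = true
		names = append(names, line)
	}
	sort.Strings(names)
	return names
}

// sameHosts reports whether received (already sorted) contains exactly
// the expected names, ignoring the order of expected.
func sameHosts(expected, received []string) bool {
	want := append([]string(nil), expected...)
	sort.Strings(want)
	if len(want) != len(received) {
		return false
	}
	for i := range want {
		if want[i] != received[i] {
			return false
		}
	}
	return true
}
```

With the three service1 pod names from the failure above as `expected`, output that includes the timeout message yields a four-element set and `sameHosts` returns false, even though every backend answered at least once.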

@bprashanth
Contributor

And it's weird that a wget would time out randomly after the endpoint was supposed to be ready.

@bprashanth
Contributor

Looked through the kubelet/docker/proxy logs; nothing stands out. I have #22203, and we should do a better job of tracking network flakes, but we don't currently have the setup to do so in the 1.2 timeframe.

@lavalamp
Member

lavalamp commented Mar 1, 2016

@thockin to delegate

@bprashanth assigned bprashanth and unassigned thockin Mar 1, 2016
@bprashanth
Contributor

There's already an LGTM'd PR.

@bprashanth
Contributor

That's different; it looks like 1 of the 3 pods was stuck in pending:
#25161 (comment), https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/25161/kubernetes-pull-build-test-e2e-gce/38983/artifacts/

STEP: creating replication controller service3 in namespace e2e-tests-services-jti4b
May  8 04:21:28.718: INFO: Created replication controller with name: service3, namespace: e2e-tests-services-jti4b, replica count: 3
May  8 04:21:31.718: INFO: service3 Pods: 3 out of 3 created, 2 running, 1 pending, 0 waiting, 0 inactive

And I suspect it got stuck pulling the image, because the RC in question is service3, and 2 of its pods have the event:

May  8 04:23:32.061: INFO: At {2016-05-08 04:21:29 -0700 PDT} - event for service3-yntb3: {kubelet e2e-gce-builder-3-1-minion-h2cj} Pulled: Container image "gcr.io/google_containers/serve_hostname:v1.4" already present on machine
May  8 04:23:32.061: INFO: At {2016-05-08 04:21:29 -0700 PDT} - event for service3-1hlw6: {kubelet e2e-gce-builder-3-1-minion-l4z8} Pulled: Container image "gcr.io/google_containers/serve_hostname:v1.4" already present on machine

But the pod in question (service3-ku92c) only has "pulling":

May  8 04:23:32.061: INFO: At {2016-05-08 04:21:29 -0700 PDT} - event for service3-ku92c: {kubelet e2e-gce-builder-3-1-minion-dc3e} Pulling: pulling image "gcr.io/google_containers/serve_hostname:v1.4"

So I'm closing as a dupe of #25277
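The event comparison above can be sketched as a small log scan, assuming the e2e event dump keeps the `event for <pod>: {<source>} <Reason>:` shape seen in the quoted lines. The function name `stuckPulling` and the regular expression are illustrative assumptions, not part of the e2e framework.

```go
package main

import (
	"regexp"
	"sort"
	"strings"
)

// eventRe captures the pod name and the event reason from lines shaped
// like the ones quoted in this thread.
var eventRe = regexp.MustCompile(`event for (\S+): \{[^}]*\} (\w+):`)

// stuckPulling returns, in sorted order, the pods that logged a "Pulling"
// event but never a matching "Pulled" event.
func stuckPulling(log string) []string {
	pulling := map[string]bool{}
	pulled := map[string]bool{}
	for _, line := range strings.Split(log, "\n") {
		m := eventRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		switch m[2] {
		case "Pulling":
			pulling[m[1]] = true
		case "Pulled":
			pulled[m[1]] = true
		}
	}
	var stuck []string
	for pod := range pulling {
		if !pulled[pod] {
			stuck = append(stuck, pod)
		}
	}
	sort.Strings(stuck)
	return stuck
}
```

Run over the three event lines quoted above, this reports only service3-ku92c, since the other two pods logged "Pulled" because the image was already present on their nodes.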
