e2e flake: deployment should support rollover [minUnavailable/maxSurge violated] #22719

Closed
bgrant0607 opened this issue Mar 8, 2016 · 16 comments
Labels: area/app-lifecycle, kind/flake, priority/important-soon

Comments

@bgrant0607
Member

kubernetes-jenkins/logs/kubernetes-e2e-gce/12967

• Failure [314.132 seconds]
Deployment
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:72
  deployment should support rollover [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:59

  Expected error:
      <*errors.errorString | 0xc2084d9e90>: {
          s: "failed to wait for pods running: [gave up waiting for pod 'test-rollover-controller-0pi42' to be 'running' after 5m0s]",
      }
      failed to wait for pods running: [gave up waiting for pod 'test-rollover-controller-0pi42' to be 'running' after 5m0s]
  not to have occurred

  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:461

cc @janetkuo

@bgrant0607 added the priority/important-soon, team/ux, and kind/flake labels Mar 8, 2016
@bgrant0607
Member Author

kubernetes-jenkins/logs/kubernetes-e2e-gce/13020

• Failure [188.857 seconds]
Deployment
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:72
  deployment should support rollover [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:59

  Expected error:
      <*errors.errorString | 0xc20835e7e0>: {
          s: "error waiting for deployment test-rollover-deployment status to match expectation: total pods created: 6, more than the max allowed: 5",
      }
      error waiting for deployment test-rollover-deployment status to match expectation: total pods created: 6, more than the max allowed: 5
  not to have occurred

  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:511

@bgrant0607 added this to the v1.2 milestone Mar 9, 2016
@bgrant0607 added the priority/critical-urgent label and removed the priority/important-soon label Mar 9, 2016
@bgrant0607
Member Author

See also #21810

@janetkuo
Member

Looking at the controller-manager log, this is similar to #21810: the old RS is scaled up (when it should be scaled down) unexpectedly.

@janetkuo changed the title from "e2e flake: deployment should support rollover" to "e2e flake: deployment should support rollover [minUnavailable/maxSurge violated]" Mar 11, 2016
@bgrant0607
Member Author

Removing from v1.2 unless/until this recurs

@bgrant0607 modified the milestones: next-candidate, v1.2 Mar 11, 2016
@bgrant0607 added the priority/important-soon label and removed the priority/critical-urgent label Mar 11, 2016
@janetkuo
Member

Watching if #22828 fixes this failure mode.

@nikhiljindal
Contributor

Failed again: #22872 (comment)

Deployment
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:79
  deployment should support rollover [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:66

  Expected error:
      <*errors.errorString | 0xc208551dd0>: {
          s: "error waiting for deployment test-rollover-deployment status to match expectation: total pods created: 7, more than the max allowed: 5",
      }
      error waiting for deployment test-rollover-deployment status to match expectation: total pods created: 7, more than the max allowed: 5
  not to have occurred

  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:518

@janetkuo
Member

The scaling events look right, but the status.replicas and spec.replicas of one of the old ReplicaSets don't look right.

test-rollover-controller has been scaled down 4 -> 3 -> 1 -> 0, but its spec.replicas and status.replicas are both 3 (should be 0 and 1), and it managed only 1 (available) pod in the system when the test failed.

@janetkuo
Member

test-rollover-controller replicas seem to be updated correctly.

I0311 20:56:14.457855       6 replica_set_utils.go:45] Updating replica count for ReplicaSet: test-rollover-controller, 3->3 (need 1), sequence No: 4->5
I0311 20:56:14.502351       6 replica_set_utils.go:45] Updating replica count for ReplicaSet: test-rollover-controller, 3->1 (need 1), sequence No: 5->5
I0311 20:56:15.684891       6 replica_set_utils.go:45] Updating replica count for ReplicaSet: test-rollover-controller, 1->0 (need 0), sequence No: 6->6
I0311 20:56:15.693625       6 replica_set_utils.go:45] Updating replica count for ReplicaSet: test-rollover-controller, 0->0 (need 0), sequence No: 6->6

@janetkuo
Member

In the e2e test log, test-rollover-controller's Generation and ObservedGeneration are 4.

@janetkuo
Member

1. I0311 20:55:46.742636       6 replica_set_utils.go:45] Updating replica count for ReplicaSet: test-rollover-controller, 4->3 (need 3), sequence No: 4->4
2. I0311 20:55:48.416437       6 replica_set_utils.go:45] Updating replica count for ReplicaSet: test-rollover-deployment-2012901151, 2->0 (need 0), sequence No: 3->3
3. I0311 20:56:14.457855       6 replica_set_utils.go:45] Updating replica count for ReplicaSet: test-rollover-controller, 3->3 (need 1), sequence No: 4->5
4. I0311 20:56:14.500772       6 replica_set_utils.go:45] Updating replica count for ReplicaSet: test-rollover-deployment-2446224220, 2->2 (need 4), sequence No: 2->3

Looks like the RSes are scaled up/down correctly and the replica count is updated correctly, but the e2e test's waitForDeploymentStatus listed the RSes without noticing the test-rollover-controller replica count update in step 3.

@janetkuo
Member

I think it's because in waitForDeploymentStatus we didn't list old RSes and new RSes atomically.

@bgrant0607
Member Author

Good catch!

FWIW, there is no such thing as "atomic", except for individual keys in etcd.

The functions being called were mutating, not just polling.

@Random-Liu
Member

Another two occurrences:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/23506/kubernetes-pull-build-test-e2e-gce/34204/artifacts/
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/23763/kubernetes-pull-build-test-e2e-gce/34198/

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:69
Expected error:
    <*errors.errorString | 0xc2083cef60>: {
        s: "error waiting for deployment test-rollover-deployment status to match expectation: total pods available: 2, less than the min required: 3",
    }
    error waiting for deployment test-rollover-deployment status to match expectation: total pods available: 2, less than the min required: 3
not to have occurred
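
For readers following the error strings, the failures quoted in this thread correspond to two bounds: an upper bound on pods created (maxSurge) and a lower bound on pods available (minUnavailable in the title). The Go sketch below shows one way such a check could look. `checkBounds` and its parameters are hypothetical names, not the e2e test's actual function; only the message wording is taken from the failures above.

```go
package main

import "fmt"

// checkBounds is a hypothetical helper. It reports an error when the number
// of created pods exceeds maxCreated or the number of available pods falls
// below minAvailable. The message text mirrors the failures quoted above.
func checkBounds(totalCreated, maxCreated, totalAvailable, minAvailable int) error {
	if totalCreated > maxCreated {
		return fmt.Errorf("total pods created: %d, more than the max allowed: %d",
			totalCreated, maxCreated)
	}
	if totalAvailable < minAvailable {
		return fmt.Errorf("total pods available: %d, less than the min required: %d",
			totalAvailable, minAvailable)
	}
	return nil
}

func main() {
	fmt.Println(checkBounds(6, 5, 3, 3)) // upper bound violated
	fmt.Println(checkBounds(5, 5, 2, 3)) // lower bound violated
	fmt.Println(checkBounds(5, 5, 3, 3)) // both bounds hold
}
```

Either bound can trip spuriously if the counts fed into it come from reads taken at different times, which is the failure mode discussed above.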

@janetkuo
Member

cc @pwittrock

@janetkuo
Member

Closing in favor of #26509
