Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Deployment with Recreate strategy not to wait on Pods in terminal phase #60301

Merged
merged 2 commits into from
Feb 26, 2018

Conversation

tnozicka
Copy link
Contributor

@tnozicka tnozicka commented Feb 23, 2018

What this PR does / why we need it:
Fixes Deployment with Recreate strategy not to wait on Pods in terminal phase. It can happen after eviction or failing to match selector and RS leaves such pod around right now. (Hopefully RC gets fixed separately.)

Which issue(s) this PR fixes *:
Fixes #60162

Special notes for your reviewer:

Release note:

Fixes a case when Deployment with recreate strategy could get stuck on old failed Pod.

/cc @janetkuo

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 23, 2018
@tnozicka
Copy link
Contributor Author

tnozicka commented Feb 23, 2018

Per @jsafrane: "volumes are unmounted for failed or succeeded pods" - so we should be safe in that regard

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 23, 2018
@tnozicka tnozicka changed the title [WIP] - Fix Deployment with Recreate strategy not to wait on Pods in terminal phase Fix Deployment with Recreate strategy not to wait on Pods in terminal phase Feb 23, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 23, 2018
@tnozicka
Copy link
Contributor Author

/kind bug
/sig apps
/priority P1

/cc @kow3ns @kubernetes/sig-apps-bugs

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. kind/bug Categorizes issue or PR as related to a bug. priority/P1 labels Feb 23, 2018
@tnozicka
Copy link
Contributor Author

/test pull-kubernetes-unit

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 23, 2018
@tnozicka
Copy link
Contributor Author

tests are green :)

},
{
name: "old RSs with zero status replicas but pods in terminal state are present",
oldRSs: []*extensions.ReplicaSet{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use the same format as above test cases: []*extensions.ReplicaSet{newRSWithStatus("rs-blabla", 0, 0, nil)},

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to state the tested value more verbosely in contrast to argument to a function that let's you look somewhere else for what the value means, but I have fixed it for consistency and your request.

return true
for _, pod := range podList.Items {
switch pod.Status.Phase {
case v1.PodFailed, v1.PodSucceeded:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be impossible for RS to have succeeded pods, but there's no harm to add this check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be impossible, but a prefer to handle even such case explicitly.

case v1.PodFailed, v1.PodSucceeded:
continue
case v1.PodUnknown:
glog.Warning("Encountered Pod %s/%s in unknown state", pod.Namespace, pod.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO I don't think we need to check unknown pods and print a warning here. Just return true in default case should be enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. That doesn't conform to the logging standards for controller_manager and could get spammy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right - fixed, thx

@janetkuo
Copy link
Member

janetkuo commented Feb 23, 2018

Because ReplicaSets (and Deployments) only allow restartPolicy:Always, the only case their pods would end up in a terminated state is when 1) kubelet evicts a pod (causing the pod in a Failed state), or 2) kubelet rejects a pod when scheduling (also Failed pod). I confirmed with node team and it should be safe to assume that containers have terminated and resources have been released in both cases.

@janetkuo
Copy link
Member

We should cherrypick this to 1.9.

@kow3ns kow3ns modified the milestones: v1.9, v1.10 Feb 24, 2018
@kow3ns
Copy link
Member

kow3ns commented Feb 24, 2018

@tnozicka @janetkuo I'm assuming we want to target 1.10.0 with this and not cherry pick it back into 1.10.1 and 1.9.x ?

@k8s-cherrypick-bot
Copy link

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@tnozicka
Copy link
Contributor Author

Comments addressed.
I'll cherrypick it back to 1.9 once it merges.

@janetkuo
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 26, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janetkuo, tnozicka

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 26, 2018
@janetkuo
Copy link
Member

I'll cherrypick it back to 1.9 once it merges.

@tnozicka Thanks! Would you cherrypick it to 1.8 also?

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit e491689 into kubernetes:master Feb 26, 2018
@tnozicka tnozicka deleted the fix-recreate branch February 27, 2018 10:05
k8s-github-robot pushed a commit that referenced this pull request Feb 28, 2018
Automatic merge from submit-queue.

[1.8] - Fix Deployment with Recreate strategy not to wait on Pods in terminal phase

Backport of #60301

```release-note
Fixes a case when Deployment with recreate strategy could get stuck on old failed Pod.
```

/assign @janetkuo @kow3ns
k8s-github-robot pushed a commit that referenced this pull request Mar 5, 2018
Automatic merge from submit-queue.

[1.9] - Fix Deployment with Recreate strategy not to wait on Pods in terminal phase

Backport of #60301

```release-note
Fixes a case when Deployment with recreate strategy could get stuck on old failed Pod.
```

/assign @janetkuo @kow3ns
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deployment controller fails to create new RS when old RS has an evicted pod
6 participants