
Pods stuck terminating/not cleaned up automatically due to invalid spec #49492

Closed
andor44 opened this issue Jul 24, 2017 · 22 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@andor44

andor44 commented Jul 24, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Created a deployment on 1.6.4 that included pod affinity annotations. Then upgraded to 1.7.1, which changed the schema of affinity annotations. Updated the deployment with the new schema, which spun up a new RS and scaled down the old one. Pods belonging to the old RS went Terminating but never successfully terminated. kubectl delete pod --force --now foo would give a success message but did nothing. Attempting to delete the RS would result in:

error: Scaling the resource failed with: ReplicaSet.extensions "foobar" is invalid: spec.template.annotations.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0].podAffinityTerm.topologyKey: Required value: can not be empty; Current resource version 174652233

After editing the pods manually to match the new schema, they went away immediately.

What you expected to happen:
The old, invalid ReplicaSet should scale down and clean up its pods after the deployment is updated, without manual intervention.

How to reproduce it (as minimally and precisely as possible):
Have a deployment with a pod anti-affinity annotation that looks like this:

  "podAntiAffinity": {
    "preferredDuringSchedulingIgnoredDuringExecution": [
      {
        "weight": 80,
        "labelSelector": {
          "matchExpressions": [
            {
              "key": "app",
              "operator": "In",
              "values": ["foobar"]
            }
          ]
        },
        "topologyKey": "failure-domain.beta.kubernetes.io/zone"
      }
    ]
  }
}

This needs (I assume) a cluster version earlier than 1.7.0; we created it on 1.6.4.

Update to 1.7.0+, observe that pods now have events like this:

  FirstSeen	LastSeen	Count	From					SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----					-------------	--------	------			-------
  42m		5m		176	kubelet, xxxxxxxxxxxxxxxxxxxxxxxxxxxx			Warning		FailedValidation	Error validating pod xxx from api, ignoring: metadata.annotations.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0].podAffinityTerm.topologyKey: Required value: can not be empty

Update the deployment that created the pods to match the new spec. The pods of the old replicaset will be stuck Terminating until the pods are manually edited to match the new schema, at which point they will clean up instantly.
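
For anyone in the same state, a minimal sketch of the manual pod edit described above (the pod name is illustrative; the corrected annotation shape is shown in full later in this thread):

kubectl edit pod foobar-1234567890-abcde
# in the pod's affinity annotation, wrap the key in a podAffinityTerm object:
#   before: "topologyKey": "failure-domain.beta.kubernetes.io/zone"
#   after:  "podAffinityTerm": { "topologyKey": "failure-domain.beta.kubernetes.io/zone" }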

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): Pre-upgrade 1.6.4, post-upgrade 1.7.1
  • Cloud provider or hardware configuration: Bare metal
  • OS (e.g. from /etc/os-release):
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1409.7.0
VERSION_ID=1409.7.0
BUILD_ID=2017-07-19-0005
PRETTY_NAME="Container Linux by CoreOS 1409.7.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
  • Kernel (e.g. uname -a): uname -a Linux xxx 4.11.11-coreos #1 SMP Tue Jul 18 23:06:59 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz GenuineIntel GNU/Linux
  • Install tools: Matchbox/Ansible
  • Others:
@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 24, 2017
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 24, 2017
@derekwaynecarr
Member

cc @kubernetes/sig-scheduling-misc @aveshagarwal

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Jul 24, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 24, 2017
@aveshagarwal
Member

aveshagarwal commented Jul 24, 2017

Created a deployment on 1.6.4 that included pod affinity annotations.

Is there any reason you are still using annotations with 1.6 and not fields? In general, I'd say it would have been better to migrate from annotations to fields before upgrading to 1.7. That said, there was an issue (#44360) that we fixed in 1.7 but had missed in 1.6. I am still looking into whether your issue is related to it.
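
(For reference, a rough sketch of the equivalent field-based form, which lives under spec.template.spec.affinity of the Deployment rather than in annotations; the weight, label selector and topology key are copied from the snippet above, so this is illustrative rather than the reporter's actual manifest:)

"affinity": {
  "podAntiAffinity": {
    "preferredDuringSchedulingIgnoredDuringExecution": [
      {
        "weight": 80,
        "podAffinityTerm": {
          "labelSelector": {
            "matchExpressions": [
              { "key": "app", "operator": "In", "values": ["foobar"] }
            ]
          },
          "topologyKey": "failure-domain.beta.kubernetes.io/zone"
        }
      }
    ]
  }
}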

@wanghaoran1988
Contributor

@andor44 Could you please provide your full deployment JSON/YAML file?

@aveshagarwal
Member

I think I can reproduce your issue by upgrading from 1.6 to 1.7, and I see the same error. It is indeed happening due to the fix for issue #44360.

@andor44
Author

andor44 commented Jul 25, 2017

Is there any reason you are still using annotations with 1.6 and not fields?

None, other than that annotations came first, so that's how these deployments were originally written.

In general, I'd say it would be better if you could have migrated from annotations to fields before upgrading to 1.7.

The 1.7 release notes did not indicate that the annotations would stop working or break in some way, so we assumed it was safe to upgrade. The note about the feature gate did not concern us, as we had never enabled that feature gate. It seems to me that this is either an undocumented breaking change or a bug, hence this issue.

Here's one of the affected deployments. Be aware that this wasn't the only one.

@wanghaoran1988
Contributor

@aveshagarwal I am not clear about the problem. Why is the topologyKey empty? I can see the key has a value:
"topologyKey": "failure-domain.beta.kubernetes.io/zone"

Could you give me some pointers?

@aveshagarwal
Member

@wanghaoran1988 My guess is that they might have updated the topology key to something (so that it's not empty now) during the upgrade to 1.7.
The issue is that if the topology key is empty in 1.6 for pod anti-affinity preferredDuringSchedulingIgnoredDuringExecution and the cluster is then upgraded to 1.7, it fails, because 1.7 now (incorrectly) requires the topology key to be non-empty for pod anti-affinity preferredDuringSchedulingIgnoredDuringExecution when AffinityInAnnotation is disabled.

@wanghaoran1988
Contributor

@aveshagarwal Thanks, that makes sense!

@andor44
Author

andor44 commented Jul 28, 2017

@wanghaoran1988 @aveshagarwal No, it was not updated; it was set incorrectly. topologyKey needs to be under another object, podAffinityTerm, but it was one level up. Before 1.7 it was not mandatory, so it was silently ignored. We upgraded to 1.7, which made it mandatory, hence the failure.

@andor44
Author

andor44 commented Jul 28, 2017

To clarify, preferredDuringSchedulingIgnoredDuringExecution should have been:

{
    "weight": 80,
    "labelSelector": {
        "matchExpressions": [
        {
            "key": "app",
            "operator": "In",
            "values": ["foobar"]
        }
        ]
    },
    "podAffinityTerm": {
        "topologyKey": "failure-domain.beta.kubernetes.io/zone"
    }
}

but it was:

{
    "weight": 80,
    "labelSelector": {
        "matchExpressions": [
        {
            "key": "app",
            "operator": "In",
            "values": ["foobar"]
        }
        ]
    },
    "topologyKey": "failure-domain.beta.kubernetes.io/zone"
}

Omitting topologyKey is no longer permitted in 1.7, so the second form became invalid.

@aveshagarwal
Member

@andor44 After further discussion with the sig-scheduling group: not allowing an empty topology key with soft pod anti-affinity (preferredDuringSchedulingIgnoredDuringExecution) is intended, so it is working as expected. I think this bug could be closed.

@andor44
Author

andor44 commented Aug 1, 2017

@aveshagarwal I realize at this point that the title of the issue is probably misleading. My issue is not that the topology key is required. I understand that it is required; I am not reporting that as the bug. The bug is that I had an object already in Kubernetes that became invalid after the upgrade. Once it became invalid it stayed in Kubernetes and could not be deleted. Delete requests returned a success code, but the object didn't go away. It could only be deleted once it was manually edited to conform to the spec. This behavior is very counter-intuitive, and I do not see the reason for disallowing deletion of invalid objects.
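
(An illustrative sequence of the symptom described above; the pod name is hypothetical and the comments paraphrase the behavior reported in this issue:)

kubectl delete pod foobar-1234567890-abcde --force --now   # returns successfully
kubectl get pod foobar-1234567890-abcde                    # pod is still present, stuck in Terminating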

Should I edit the title, or open a new bug and reference this one?

@aveshagarwal
Member

@andor44 OK, then you could keep it open.

@alex88

alex88 commented Aug 26, 2017

In my case, having an old ReplicaSet with a bad configuration prevents me from creating new ReplicaSets, and therefore from deploying new versions, because the old ReplicaSet still has a desired number of pods that can't run due to configuration errors.

@alex88

alex88 commented Aug 26, 2017

Even though the deployment has no affinity configuration, a get deployment -o yaml shows that the kubectl.kubernetes.io/last-applied-configuration annotation still contains that affinity config, and the status shows this failure:

status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2017-08-25T00:37:32Z
    lastUpdateTime: 2017-08-25T00:37:32Z
    message: 'unable to create pods: Pod "php-2221189365-kzxvk" is invalid: metadata.annotations.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0].podAffinityTerm.topologyKey:
      Required value: can not be empty'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure

Is there a quick workaround, at least to be able to deploy and to delete the old ReplicaSet?

UPDATE: from the dashboard, I can delete the replicaset without any problem....
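
(A possible kubectl equivalent, untested in this thread: deleting the ReplicaSet with --cascade=false skips the client-side scale-to-zero that trips the validation error; any orphaned pods may then still need to be edited or deleted as described above. The name below is taken from the status message and may differ in your cluster:)

kubectl delete rs php-2221189365 --cascade=false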

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 10, 2018
@ravisantoshgudimetla
Contributor

@andor44 - Can you please change the labels, as this issue is not related to sig-scheduling anymore? I think adding appropriate labels might help in getting the attention of the relevant SIGs.

@andor44
Author

andor44 commented Jan 15, 2018

I don't know which SIG is appropriate; maybe API Machinery?

@3h4x

3h4x commented Feb 5, 2018

Just ran into this issue after upgrading k8s from 1.6.6 to 1.7.12.
Deployments that worked in 1.6.6 stopped working after the upgrade. We have pods stuck in the Terminating state that can't be deleted.

@martynd

martynd commented Feb 24, 2018

Hit this also, 1.6.13 to 1.7.12.

In my case, however, we were not still using the older annotations. It seems this affects both the old and new affinity formats. Something should definitely be done; a previously valid deployment/replicaset/podspec shouldn't become invalid during an upgrade due to a spec change.

I found an easy enough workaround for the issue, though.

For those still affected:
TL;DR: fix the deployment if you haven't already, then edit the latest (pre-upgrade) ReplicaSet (it should be the second newest, with the newest being the one just created). Once the ReplicaSet is correct, the rollout is happy and cleans things up.

The reason editing the deployment doesn't fix it (and leaves old pods in a strange state) is the old ReplicaSet.

When the deployment is edited in an attempt to add the topologyKey, it creates a new ReplicaSet and scales down the old one.
Because the old one is missing the key, the internal rollout process fails to scale down its replica count, since the ReplicaSet is invalid.
From there it all just gets stuck with little explanation, as the deployment now refers to the new, correct ReplicaSet.
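
(A sketch of that ReplicaSet edit as a patch, assuming the affinity lives in the pod template fields at index 0; the ReplicaSet name, patch path and topology key are illustrative, and kubectl edit rs on the old ReplicaSet works just as well:)

kubectl patch rs <old-replicaset> --type=json -p '[
  {"op": "add",
   "path": "/spec/template/spec/affinity/podAntiAffinity/preferredDuringSchedulingIgnoredDuringExecution/0/podAffinityTerm/topologyKey",
   "value": "failure-domain.beta.kubernetes.io/zone"}
]'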

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 26, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
