Pods stuck terminating/not cleaned up automatically due to invalid spec #49492
cc @kubernetes/sig-scheduling-misc @aveshagarwal
Is there any reason you are still using annotations with 1.6 and not fields? In general, I'd say it would have been better to migrate from annotations to fields before upgrading to 1.7. That said, there was an issue (#44360) that we fixed in 1.7 but missed in 1.6. I am still looking into whether your issue is related to this or not.
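For reference, a minimal sketch of what the field-based form of such a soft pod anti-affinity looks like from 1.6 onwards (the label selector here is a placeholder, not taken from the reporter's manifest):

    spec:
      template:
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchLabels:
                      app: example            # placeholder label
                  topologyKey: kubernetes.io/hostname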
@andor44 Could you please provide your full deployment JSON/YAML file?
I think I can reproduce your issue by upgrading from 1.6 to 1.7, and I see the same error. It is indeed happening due to the fix for issue #44360.
None, other than annotations coming first, so that's how these deployments were written originally.
The 1.7 release notes did not indicate that the annotations would stop working or break in some way, so we assumed it was safe to upgrade. The note about the feature gate did not concern us, as we had never enabled that feature gate. It seems to me that this is either an undocumented breaking change or a bug, hence this issue. Here's one of the affected deployments. Be aware that this wasn't the only one.
…edDuringExecution pod anti affinity is used without topology key. Fixes issue: kubernetes#49492
@aveshagarwal I am not clear about the problem: why is the topologyKey empty? I see the key has values. Could you give me some pointers?
@wanghaoran1988 my guess is that they might have updated the topology key to something (so that it's not empty now) during the upgrade to 1.7.
@aveshagarwal Thanks, that makes sense!
@wanghaoran1988 @aveshagarwal no, it was not updated; it was set incorrectly.
To clarify,
but it was:
Omitting
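The snippets being contrasted here did not survive in this copy. As a hypothetical reconstruction based on the surrounding discussion, the term presumably should have looked like the first block but actually contained the second, with an empty topologyKey (the label selector is a placeholder):

    # intended (topologyKey set)
    podAffinityTerm:
      labelSelector:
        matchLabels:
          app: example            # placeholder
      topologyKey: kubernetes.io/hostname

    # actual (topologyKey empty, which 1.7 rejects)
    podAffinityTerm:
      labelSelector:
        matchLabels:
          app: example            # placeholder
      topologyKey: ""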
@andor44 After further discussion with the sig-scheduling group, it is intentional that an empty topology key is not allowed with soft pod anti-affinity (preferredDuringSchedulingIgnoredDuringExecution pod anti-affinity), so it is working as expected. I think this bug could be closed.
@aveshagarwal I realize at this point that the name of the issue is probably misleading. My issue is not that the topology key is required. I understand that it is required; I am not reporting that as the bug. The bug is that I had an object already in Kubernetes that became invalid after the upgrade. Once it became invalid it stayed in Kubernetes and could not be deleted. Delete requests returned a success return code, but the object didn't go away. It could only be deleted once the object was manually edited to conform to the spec. This behavior is very counter-intuitive and I do not see the reason for disallowing deletion of invalid objects. Should I edit the title, or open a new bug and reference this one?
@andor44 OK, then you could keep it open.
In my case, having an old replicaset with a bad configuration doesn't let me create new replicasets, so I can't deploy new versions, as the old replicaset still has a desired number of pods that can't run due to configuration errors.
Even if the deployment has no affinity configuration, doing a
Is there a quick workaround, at least to be able to deploy and delete the old replicaset? UPDATE: from the dashboard, I can delete the replicaset without any problem...
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@andor44 Can you please change the labels, as this issue is not related to sig-scheduling anymore? I think adding appropriate labels might help in getting the attention of the right SIGs.
I don't know which SIG is appropriate, maybe API machinery?
Just ran into this issue after upgrading k8s from 1.6.6 to 1.7.12.
Hit this also, 1.6.13 to 1.7.12. In my case, however, we were not still using the older annotations. It seems this affects both the old and new affinity formats. Something should definitely be done; a previously valid deployment/replicaset/podspec shouldn't become invalid during an upgrade due to a spec change. I found an easy enough workaround for the issue though. For those still affected: the reason editing the deployment doesn't fix it (and leaves old pods in a strange state) is the old replicaset. When the deployment is edited in an attempt to add the topologyKey, it makes a new replicaset and scales down the old.
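Since the workaround itself is not spelled out above, here is one possible approach as a sketch only, assuming the field-based format and a hypothetical replicaset name: patch the missing topologyKey directly into the old replicaset so that it becomes valid again and can be scaled down or deleted.

    # old-rs is a placeholder name; adjust the array index and path to match your spec
    kubectl patch replicaset old-rs --type='json' -p='[
      {"op": "add",
       "path": "/spec/template/spec/affinity/podAntiAffinity/preferredDuringSchedulingIgnoredDuringExecution/0/podAffinityTerm/topologyKey",
       "value": "kubernetes.io/hostname"}
    ]'
    # once the replicaset validates again, deleting it should work as usual
    kubectl delete replicaset old-rs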
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
Created a deployment on 1.6.4 that included pod affinity annotations. Then upgraded to 1.7.1, which changed the schema of affinity annotations. Updated the deployment with the new schema, which spun up a new RS and scaled down the old one. Pods belonging to the old RS went Terminating but never successfully terminated. kubectl delete pod --force --now foo would give a successful message but did nothing. Attempting to delete the RS would result in:

    error: Scaling the resource failed with: ReplicaSet.extensions "foobar" is invalid: spec.template.annotations.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0].podAffinityTerm.topologyKey: Required value: can not be empty; Current resource version 174652233
After editing the pods manually to match the new schema they went away immediately.
What you expected to happen:
The old, invalid replica set to successfully scale down and clean up its pods after the deployment is updated without needing manual intervention.
How to reproduce it (as minimally and precisely as possible):
Have a deployment with a pod affinity that looks like this:
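The manifest itself was not preserved in this copy. As an illustration only, an annotation-based soft anti-affinity with an empty topologyKey along the following lines would trigger the validation error quoted above (names, labels, and the image are placeholders, and the alpha affinity annotation format is assumed):

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: example                       # placeholder
    spec:
      replicas: 2
      template:
        metadata:
          labels:
            app: example                  # placeholder
          annotations:
            scheduler.alpha.kubernetes.io/affinity: |
              {"podAntiAffinity":
                {"preferredDuringSchedulingIgnoredDuringExecution": [
                  {"weight": 100,
                   "podAffinityTerm": {
                     "labelSelector": {"matchLabels": {"app": "example"}},
                     "topologyKey": ""}}]}}
        spec:
          containers:
          - name: example                 # placeholder
            image: nginx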
Needs (I assume) <1.7.0, we created it on 1.6.4.
Update to 1.7.0+, observe that pods now have events like this:
Update the deployment that created the pods to match the new spec. The pods of the old replicaset will be stuck Terminating until the pods are manually edited to match the new schema, at which point they will clean up instantly.
Anything else we need to know?:
Environment:
Kubernetes version (use kubectl version): Pre-upgrade 1.6.4, post-upgrade 1.7.1
Kernel (e.g. uname -a): Linux xxx 4.11.11-coreos #1 SMP Tue Jul 18 23:06:59 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz GenuineIntel GNU/Linux