All containers restart after kubelet upgrade to v1.31 #129385
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate label.
Keeping containers running while upgrading kubelets across minor versions is not supported. See https://kubernetes.io/releases/version-skew-policy/#kubelet-1
/remove-kind bug
The modification was introduced in #124220. Key point: the hash value in the existing container's annotation will not match the hash value calculated by the upgraded kubelet.
@liggitt @Chaunceyctx Currently, only the image is used to calculate the hash. I don't understand why we have to use a hash to determine whether to restart the container instead of directly comparing the image.
Is it necessary to keep the container hash annotation?
Using a hash ensures that any significant change in the container's configuration, whether the image or another critical attribute, triggers a restart to maintain consistency and reliability.
While comparing the image directly might seem sufficient, a hash-based approach provides a comprehensive and efficient mechanism for detecting changes across all configuration parameters, avoiding potential bugs and ensuring predictable behavior.
In my view, this is why we are using the hash value.
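The hash-based check described above can be sketched as follows. This is an illustrative simplification, not the kubelet's actual code: `ContainerSpec` and `hashContainer` are invented names, and the real kubelet deep-hashes the full `v1.Container` struct. The point is only that hashing the whole spec makes any field change, not just an image change, alter the result.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ContainerSpec is a simplified stand-in for the container spec;
// the real kubelet hashes the entire v1.Container struct.
type ContainerSpec struct {
	Image   string
	Command []string
}

// hashContainer hashes a printable dump of the whole spec, so a
// change to any field changes the result, not only the image.
func hashContainer(c ContainerSpec) uint32 {
	h := fnv.New32a()
	fmt.Fprintf(h, "%#v", c)
	return h.Sum32()
}

func main() {
	running := ContainerSpec{Image: "nginx:1.30", Command: []string{"run"}}
	desired := ContainerSpec{Image: "nginx:1.31", Command: []string{"run"}}
	// A restart is triggered whenever the stored and recomputed hashes differ.
	fmt.Println(hashContainer(running) != hashContainer(desired))
}
```

The flip side, which this issue demonstrates, is that the hash also changes when the *hashing scheme itself* changes between kubelet versions, even though the spec did not.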
/sig node |
/cc |
I totally understand that upgrading in place is not recommended and that it is really hard for the community to maintain such a path. But for large clusters, it is not easy to strictly follow the drain-then-upgrade steps. At the very least we should introduce an opt-in flag/gate that allows users to keep using the old hash value, which would avoid unnecessary pod restarts during upgrades. We should make a compromise and let cluster administrators make the decision. Moreover, pod restarts triggered by a changed spec hash are unnecessary, since kube-apiserver already has validation checks and allows only a few updatable fields, such as the image. Currently on the kubelet side, hash values are only stored as annotations; we'd better not rely on them to decide whether to restart/kill containers during an upgrade.
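A minimal sketch of the opt-in behavior proposed above, under stated assumptions: `shouldRestart` and the `acceptLegacyHash` gate are hypothetical names that do not exist in the kubelet; the idea is simply that a container whose annotation matches either the old or the new hash scheme is left running.

```go
package main

import "fmt"

// shouldRestart decides whether a running container must be recreated.
// annotatedHash is the value stored in the container's annotation;
// legacyHash and currentHash are what the old and the upgraded kubelet
// would compute for the same unchanged spec. acceptLegacyHash stands
// in for the hypothetical opt-in flag/gate discussed above.
func shouldRestart(annotatedHash, legacyHash, currentHash uint64, acceptLegacyHash bool) bool {
	if annotatedHash == currentHash {
		return false // spec unchanged under the new scheme
	}
	if acceptLegacyHash && annotatedHash == legacyHash {
		return false // spec unchanged; hash differs only because the scheme changed
	}
	return true
}

func main() {
	// After an upgrade the annotation still holds the legacy hash (111),
	// while the new kubelet computes 222 for the identical spec.
	fmt.Println(shouldRestart(111, 111, 222, false)) // restarts without the gate
	fmt.Println(shouldRestart(111, 111, 222, true))  // survives with the gate
}
```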
I think we should clarify what situations will cause pod restarts. So far, there is only one situation, which is updating the image. If other situations arise in the future, we can consider using a hash. But for now, I think using a hash is unnecessary.
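The direct comparison suggested here could look like the sketch below (illustrative only; `ContainerSpec` and `needsRestart` are invented names). Since, per the comment above, the image is today the only in-place-updatable field that should trigger a restart, it is the single field compared.

```go
package main

import "fmt"

// ContainerSpec is a simplified stand-in for the container spec.
type ContainerSpec struct {
	Image   string
	Command []string
}

// needsRestart compares only the field that can actually change in
// place (the image), so a change in the hashing scheme alone can
// never force a restart.
func needsRestart(running, desired ContainerSpec) bool {
	return running.Image != desired.Image
}

func main() {
	a := ContainerSpec{Image: "nginx:1.30", Command: []string{"run"}}
	b := ContainerSpec{Image: "nginx:1.31", Command: []string{"run"}}
	fmt.Println(needsRestart(a, b)) // true: image changed
	fmt.Println(needsRestart(a, a)) // false: nothing updatable changed
}
```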
Thank you for pointing this out! |
But we should also keep in mind the potential future scenarios where additional restart triggers may be introduced. |
In the kubelet resource-management area there are quite a few behaviors that were questionable from the beginning, or the usual temporary solutions that stuck around forever; in general, quite a few behaviors were established incidentally rather than deliberately, and despite not being covered by any backward-compatibility promise they have been unchanged for many years, so users effectively depend on them. These behaviors are not guaranteed stable, yet they are treated as such. So in sig-node we are contemplating, and sometimes already proposing, adding feature gates to protect actual bug fixes or changes that in theory should not require them.

This long preamble is to say that this case resonates with that experience, and I see why a feature gate can be appealing. The thing is: it's too late for that (and reverting would be far worse; I agree with that stance). Retroactively adding a feature gate after 8 months doesn't look great, and I'm not sure how helpful it would be. Perhaps (warning: uninformed musing, I didn't review the original change) the change could have been protected by a feature gate in the first place, and we can be more vigilant in the future? On the other hand, there seems to be a slowly emerging trend of cases like this, and listening to users is paramount, so I guess we should probably try to think about a more systemic approach.

Last but not least, feature-gate proliferation is not ideal either. It leads to a very fragmented experience, makes our testing matrix explode, and sometimes only postpones the issues. I'm struggling to find a good solution here.
What happened?
All containers restart after kubelet upgrade to v1.31
What did you expect to happen?
Containers shouldn't restart after kubelet upgrade
How can we reproduce it (as minimally and precisely as possible)?
Upgrade kubelet from v1.30 to v1.31.
Anything else we need to know?
#124220 (comment)
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)