Upgrading a node from kubelet 1.1.3 to 1.2.0 results in containers getting destroyed and recreated #23104
We need a documented policy about docker label and container-name changes. For 1.2, at minimum, we need to document this under "Action required" in the release notes, and maybe recommend `kubectl drain`. cc @Random-Liu
@bgrant0607 Thanks, will do it today.
Sorry for the breakage introduced by this. I did raise the concern when we first worked on #15089; unfortunately docker doesn't support mutable container labels. We decided to move forward with it because we thought upgrading the kubelet binary without kubectl drain had been unsupported since the 1.1 release, and our e2e tests never exercise this scenario. We didn't realize that customers build their own live-upgrade tools.
I don't think the docker-label changes have anything to do with the container killing, because in fact we only rely on the labels for a small piece of information. Here is the log for one of the pods:
Looks like the pod contains 4 containers.
One of them is killed because its container hash has changed:
Why did the container hash change?
Clarification: We still get most of the information (UID, hash, name, etc.) from the docker name itself, specifically to handle backwards compatibility. EDIT: strike the following statement, which is untrue.
Then it could indeed be a new defaulting function that changes the PodSpec/ContainerSpec from 1.1 to 1.2, and kubelet kills the containers.
@ghodss What version were you using before upgrading?
It is kubelet 1.1.3, per the issue's title.
@dchen1107 Silly me... saw that... I tried to find it in the issue content... :)
I uploaded PR #23141 to fix one particular problem: kubelet starts the pod cleanup routine prematurely and kills desired pods. The remaining problem is that the UID/container hash of the static pods seems to have changed after the upgrade, as @dchen1107 said in #23104 (comment).
I too am inclined to believe it's not the labels issue. I recall seeing similar statements about the labels in the logs before and after I actually did the upgrade, but unfortunately I no longer have those logs. Does kubelet do defaulting? I thought that was done in the apiserver.
We use the same library to decode and validate a pod from non-apiserver sources; see kubernetes/pkg/kubelet/config/common.go, line 91 at commit 7e15800.
Yeah, the label-related messages are loud but harmless. We plan to switch to relying on labels (and stop polluting the docker names) in the future. We will also be able to filter pods by labels, etc.
More background: for pods from non-apiserver sources, kubelet hashes the entire pod object and generates a UID for it. Any change in the pod spec leads to a new UID, and the pod is considered a different pod. This is necessary because kubelet doesn't know how to validate updates to pods, so treating the result as a new pod is simpler and safer.
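For illustration, here is a minimal, runnable sketch of that idea. The types and field names are made up (they are not the real Kubernetes API types), and the real kubelet uses its own deep-hash helper rather than JSON + MD5; this only shows why any spec change produces a new UID:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Pod is a drastically simplified stand-in for a pod read from a manifest file.
type Pod struct {
	Name   string   `json:"name"`
	Images []string `json:"images"`
}

// syntheticUID hashes the whole serialized pod, so any change to the spec
// yields a new UID, and therefore a "different" pod from kubelet's point of view.
func syntheticUID(p Pod) string {
	data, _ := json.Marshal(p)
	sum := md5.Sum(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := Pod{Name: "static-web", Images: []string{"nginx:1.9"}}
	b := Pod{Name: "static-web", Images: []string{"nginx:1.10"}} // any edit to the manifest

	// Prints false: the edited manifest is treated as a brand-new pod.
	fmt.Println(syntheticUID(a) == syntheticUID(b))
}
```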
Forgot to paste the difference between a v1.1.3 pod and a v1.2 pod:
The non-nil securityContext seems to be the problem. The rest (status and annotations) is not relevant.
This applies only to the static pods (manifest files), but yes, we'll add it to the release note.
We've discussed some options, e.g., hashing the versioned object before converting to the internal objects. I am not sure there are no corner cases, and we'll need to think about it some more.
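A rough, self-contained sketch of the ordering difference under discussion; all type and function names here are invented, and defaulting is reduced to a single made-up field:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Toy "versioned" and "internal" representations; not the real API types.
type VersionedPod struct {
	Image           string    `json:"image"`
	SecurityContext *struct{} `json:"securityContext,omitempty"`
}

type InternalPod struct {
	Image           string
	SecurityContext *struct{}
}

func hashOf(v interface{}) string {
	data, _ := json.Marshal(v)
	sum := md5.Sum(data)
	return hex.EncodeToString(sum[:])
}

// defaultAndConvert mimics conversion to the internal type plus a default that
// a newer release added (SecurityContext: nil -> non-nil empty struct).
func defaultAndConvert(p VersionedPod) InternalPod {
	out := InternalPod{Image: p.Image, SecurityContext: p.SecurityContext}
	if out.SecurityContext == nil {
		out.SecurityContext = &struct{}{}
	}
	return out
}

func main() {
	manifest := VersionedPod{Image: "nginx:1.9"} // what the user wrote; unchanged across upgrades

	// Current behavior (simplified): hash the internal object, i.e. after
	// defaulting/conversion. A default added in a new release changes this
	// hash even though the manifest on disk did not change.
	fmt.Println("hash after conversion: ", hashOf(defaultAndConvert(manifest)))

	// Proposed alternative: hash the versioned object (or the raw manifest
	// bytes) before conversion, so the hash only changes when the user
	// actually edits the manifest.
	fmt.Println("hash before conversion:", hashOf(manifest))
}
```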
Both options indicate that we won't restart the pod when defaults change. I am not sure that's the desired behavior.
Right. That's something for @dchen1107 to decide.
But if you do want to restart on default changes, then this is WAI (working as intended)...
Defaults are represented in the serialized, versioned form, also.
I don't think this is right. In my example above, the pods come from the apiserver, not static manifest files.
That was mainly because of a race condition in kubelet, and should be fixed by #23141.
My bad. @dchen1107 pointed out that since the SecurityContext is per container, the container hash also changes for pods from the apiserver. The containers in regular pods will then be recreated.
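To make the container-hash side concrete, a minimal sketch with made-up types: kubelet records a hash of the container spec on the container it creates (in the docker name, as mentioned above) and later compares it against the hash of the desired spec; a field that defaulting turns from nil into an empty struct is enough to make the two differ:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Simplified stand-ins for the real container spec types.
type SecurityContext struct {
	Privileged bool
}

type Container struct {
	Name            string
	Image           string
	SecurityContext *SecurityContext
}

// containerHash stands in for kubelet's hash of the container spec.
func containerHash(c Container) uint64 {
	h := fnv.New64a()
	// %+v distinguishes a nil SecurityContext from a non-nil empty one.
	fmt.Fprintf(h, "%s/%s/%+v", c.Name, c.Image, c.SecurityContext)
	return h.Sum64()
}

func main() {
	// Hash recorded when the older kubelet created the container
	// (no securityContext defaulting yet, so the field is nil).
	recorded := containerHash(Container{Name: "web", Image: "nginx:1.9"})

	// Hash the upgraded kubelet computes for the same pod after decoding and
	// defaulting: securityContext is now a non-nil empty struct.
	desired := containerHash(Container{
		Name:            "web",
		Image:           "nginx:1.9",
		SecurityContext: &SecurityContext{},
	})

	if desired != recorded {
		fmt.Println("hash mismatch: kubelet kills and recreates the container")
	}
}
```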
I sent #23227 to update the release notes to include this as a known issue for the 1.2 release. We are going to continue discussing how to handle this; so far none of the proposed solutions above resolves the current problem caused by the defaulting without draining. It is really unfortunate that we didn't catch this issue, due to a lack of understanding of our users: our assumption that no one does live upgrades because kube-push has been broken since the 1.1 release was totally wrong.
IMO, we should first establish the best practices for upgrading a cluster, so that we can test against them :-)
@lavalamp @bgrant0607 Do you know if upgrading kubelet from 1.2.0 to 1.3.0 will bounce the containers?
@davidopp Did our manual upgrade testing include looking for this?
Yes, upgrading kubelet from 1.2 to 1.3 restarts the containers on the node. I thought this was WAI? We can look into it if that's not supposed to happen.
It would be much better if it didn't restart the pods: that makes upgrade/downgrade maintenance much easier to schedule, perform, and roll back, especially in bare-metal environments.
If you are using the cluster/gce/upgrade.sh script, this is expected behavior. We use the MIG rolling-update feature, which basically recreates the VMs using the new startup script. kube-push was supposed to do in-place upgrades, but it doesn't work.
Yeah, from the first entry in the issue it looks like Sam was doing a manual version of kube-push. It would be great if we could test that scenario.
I agree that we should test all supported upgrade processes, but where can I find them in the documentation? If in-place upgrade is supported officially, we should prioritize this issue.
I think there's a spectrum. We don't have to say we "officially support" in-place upgrades (and definitely nobody is recommending making kube-up.sh work in time for the 1.3 release), but it would be nice if we could ensure that upgrading kubelet doesn't cause the containers to restart. OTOH, my understanding is that we qualify only one specific Docker version with each kubelet version, so I guess we're expecting people to upgrade kubelet and Docker together. Since a Docker restart kills all your containers (or does the latest version avoid this now?), maybe this isn't worth worrying about until a Docker upgrade can be done without killing containers.
We haven't seen any specific issues reported with Kube 1.2 and Docker 1.10, so we've already upgraded our prod clusters to Docker 1.10 (which was a pain because of the container killing). Also, I would hope that if a new version of Docker breaks Kube in the future, we'd do a point release so that Kube is forwards compatible with newer Docker versions. And yes, Docker is moving towards not having to restart containers on upgrades.
That's all I'm asking for.
My point is that we already know there are issues (e.g., API changes) that may cause kubelet to restart the containers. AFAIK, nothing has changed since then. Unless this is prioritized, it's unlikely that these issues will get resolved automatically. That's why "nice-to-have" may not mean much...
We only test one docker version exhaustively, but generally each release is compatible with 2-3 docker versions; the 1.3 release should be compatible with docker v1.9 through v1.11. Red Hat also has its own recommended docker version for its environment. But yes, if users have to upgrade docker, all containers will get restarted.
FWIW, I did a manual upgrade from v1.2 to v1.3 and did not notice any containers getting restarted.
Most of the default cluster addon pods got recreated to upgrade to newer versions.
Seeing the same issue when upgrading kubelet from v1.0.1 to v1.1.4: all containers get recreated after restarting kubelet. Is there any workaround we can use to prevent container recreation?
@nkwangleiGIT It's probably best for you to drain all the pods from the node before upgrading kubelet, to avoid any unnecessary container restarts. Upgrading kubelet in place isn't officially recommended. If data in your containers needs to be preserved across restarts, it would be better off stored in a persistent volume.
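For anyone landing here, a rough sketch of that drain-then-upgrade flow (the node name, package name, and the systemd/RPM assumptions are placeholders for whatever your environment uses):

```sh
# Move workloads off the node and mark it unschedulable
kubectl drain <node-name> --ignore-daemonsets

# Upgrade kubelet in place (assumes a systemd-managed, RPM-based install)
systemctl stop kubelet
yum upgrade -y kubelet
systemctl start kubelet

# Let the node accept pods again
kubectl uncordon <node-name>
```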
@yujuhong OK, I see, so we need to drain all pods from the node before an in-place kubelet upgrade. BTW, is this the long-term design, or do we plan to make it better? I'd expect us to support in-place kubelet upgrades without recreating containers. Thanks.
The issue for fixing in-place upgrades is #17397 (the title may be a little misleading, as we may do it differently from kube-push.sh). IMO "not restarting containers" is something we should consider as a possible requirement, but until we know exactly how we will address the issue, it's probably too early to commit to it.
Issues go stale after 30d of inactivity. Prevent issues from auto-closing with a /lifecycle frozen comment. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
Forgot to post a reason before closing the issue. The container hash change needs to be addressed if in-place upgrades are going to guarantee no container restarts. I think we can fold this into the upgrade issues (#6099).
We upgrade our nodes by upgrading an RPM that stops the kubelet, updates the binary and starts it again. The behavior I'm seeing (100% consistently and reproducibly) is that when I upgrade the RPM, the new kubelet starts up, and after maybe 3-5 seconds starts killing and restarting the containers on the machine. Usually it does all of them, but sometimes it does a subset.
You can see the log of the new 1.2 kubelet starting up, up to the point it starts killing containers at https://gist.github.com/ghodss/c579579e53355aed6508.
The pods are not evicted from apiserver, and their age does not change.
Downgrading to 1.1 has the same effect of killing and recreating the containers. Note that I had the exact same thing happen when we upgraded from 1.0.3 to 1.1.3. Let me know if you need any other details.
@bgrant0607 @dchen1107 @yujuhong
(marking as P0 per @bgrant0607)