Have an option to keep Pod around for debugging #14602
@bparees - this struck me as something you would find of interest. I also think it's related to this as one potential means to support it.
Thanks @derekwaynecarr, the ability to get the pod logs definitely seems extremely useful. In relation to #14561, it would also be extremely useful, imho, to go beyond what is described here: actually commit the containers of the failed/moved pod and push them to a registry, so I can pull the image (basically a snapshot at that point), investigate the ephemeral filesystem, attempt to start the process, etc.
Sort of related: #3949
I have a slightly different use case, still sort of related to what @davidopp mentioned. When implementing #17940 (and the still-to-come #17244) I was struck that we don't have any option to gracefully terminate a pod. The aforementioned issues/PRs deal with a job that should be terminated under certain conditions (timeout, remote termination). Such a pod should then end up in a failed state, denoting that it was terminated prematurely. @davidopp, does this also fit your use case, or should I create a separate issue for this topic?
Related: #2789
Note that if you don't care about the resources consumed, it's easy to just keep a pod around by changing its labels to orphan it from its controller. |
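For example, here is a minimal sketch of that label trick; the pod name and label key (`app=myapp`, `myapp-abc12`) are hypothetical, and it assumes the controller selects pods purely by that label:

```sh
# Hypothetical names: a controller selects pods via app=myapp and one of its
# replicas is the pod myapp-abc12. Overwriting the label orphans the pod from
# the controller, so a replacement is created while the old pod keeps running.
kubectl label pod myapp-abc12 --overwrite app=myapp-debug

# Inspect the orphaned pod at leisure:
kubectl logs myapp-abc12
kubectl exec -it myapp-abc12 -- /bin/sh

# Clean up once done debugging; it is no longer managed by the controller:
kubectl delete pod myapp-abc12
```

The orphaned pod does keep consuming node resources until you delete it, which is where this workaround falls short for the deadline case discussed above.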
I guess not. If I'm setting a deadline, I'm more interested in limiting resource consumption. At least that's how I see it.
We received an interesting suggestion today which has multiple layers to it, but the basic use case is that they would like us to keep Pod state around for debugging when a Pod is "moved" to another machine. This would presumably be an option in the Pod, rather than default behavior, since in the normal case people probably only care about the logs and not the full state.
[Please read "move" here in the proper cattle-centric way, meaning "kill old Pod and create a fungible replica on a different machine"]
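Purely as an illustration of the kind of opt-in being requested, and not a real API: a hypothetical sketch of what such a per-Pod option could look like, where the field name `retainPodStateForDebugging` is invented for this example.

```yaml
# Hypothetical sketch only: "retainPodStateForDebugging" is NOT an existing
# field; it simply illustrates an opt-in, per-Pod knob for the behavior
# described above.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  # keep logs, containers, host directories, and Pod-scoped directories
  # around on the old node when the Pod is replaced elsewhere
  retainPodStateForDebugging: true
  containers:
  - name: app
    image: example/app:1.0
```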
Today, IIUC, the only scenario where we "move" a Pod to a new machine is if the machine fails. So the feature request is that when the machine comes back up, we want it to still have the logs, containers, host directories, Pod-scoped directories, etc. intact. I guess this is somewhat related to the Borg critical data concept, although for a very different purpose.
In the future I imagine we will want other scenarios to trigger the Pod to "move" -- for example, a restart loop (kubelet keeps restarting the container locally due to repeated OOM or something else that boils down to "the Pod is just not going to work here, but might work on another machine"). In this case you'd also want to keep the same state from the old Pod around despite creating a replacement replica elsewhere.
The last use case that was brought up was debugging deadlock. There would be an option to start a new replica on another node while keeping the deadlocked Pod/container running so that you can attach a debugger to the process that is deadlocked. Of course you'd also like to keep the same state we've been talking about for the other cases (logs, containers, host directories, Pod-scoped directories).
@mikedanese does this sound like a fairly accurate description of the request? Anything I left out?