-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add experimental CRI API for checkpoint/restore #97689
Conversation
@adrianreber: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Hi @adrianreber. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project. |
/assign @derekwaynecarr @mrunalp Derek, Mrunal, how does this fit into the CRI API stuff in progress? |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
// CheckpointContainer checkpoints a container | ||
rpc CheckpointContainer(CheckpointContainerRequest) returns (CheckpointContainerResponse) {} | ||
// RestoreContainer restores a container | ||
rpc RestoreContainer(RestoreContainerRequest) returns (RestoreContainerResponse) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this apply to all types of containers ? e.g. init and sidecars flagged as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this apply to all types of containers ? e.g. init and sidecars flagged as well?
In my drain --checkpoint
implementation (#97194) I am ignoring containers in the 'kube-system' namespace. As long as no raw sockets are used CRIU could checkpoint most processes and containers, but it does not seem useful to checkpoint containers in the 'kube-system' namespace for now.
Checkpointing a Pod in CRI-O also does not checkpoint the pause container, only necessary metadata is written to the checkpoint archive and upon restore a new pause container is created which uses the metadata to provide the same environment for the other restored containers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adrianreber , i was asking about https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
Also does it make sense for the CRI to know about kubernetes namespace kube-system at the CRI level ? I believe that logic should be built into kubelet and not into CRI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was asking about https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
This was briefly discussed in the corresponding KEP and one idea was to not enable checkpointing before init containers have finished running.
Also does it make sense for the CRI to know about kubernetes namespace kube-system at the CRI level ? I believe that logic should be built into kubelet and not into CRI
Currently it happens in kubectl. Only containers which are not in the 'kube-system' namespace are checkpointed.
// Keep temporary files. Like log files. Helpful for debugging. | ||
bool keep = 1; | ||
// Checkpoint/Restore the container with established TCP connections. | ||
bool tcp_established = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to checkpoint and restore with established tcp connections ? How does that work ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to checkpoint and restore with established tcp connections ?
Yes it is possible to checkpoint and restore established TCP connections. On the lower levels, checkpointing single processes or containers using runc/Podman it makes more sense when talking about the TCP connection from an outside client to the process or container. The restored process or container needs to have access to the same IP or the restore will fail.
In the Kubernetes context, especially for the current drain --checkpoint
implementation, it might seem less useful for connections to external clients. It is, however, important to have the possibility to checkpoint and restore established TCP connections for containers and Pods which have open TCP connection between each other.
How does that work ?
Please have a look at how CRIU does that: https://criu.org/TCP_connection
@@ -6964,6 +6964,395 @@ func (m *ReopenContainerLogResponse) XXX_DiscardUnknown() { | |||
|
|||
var xxx_messageInfo_ReopenContainerLogResponse proto.InternalMessageInfo | |||
|
|||
// Common options used for checkpointing and restoring. | |||
type CheckpointRestoreOptions struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding an image directory option for a custom checkpoint storage directory that checkpoint and restore could use to save the checkpoint and restore from the checkpoint respectively?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the archive option. A container checkpoint is much more than just the result of running CRIU. Then a directory would make sense. To make life easier for users I put everything in a tar archive in Podman. The CRIU checkpoint, the changes to the container file system and metadata (like name, IP address, ID and much more). Without those additional information it is difficult to restore a container as it was before without requiring a lot of manual steps from the user during restore. The archive contains everything. Please also see me CRI-O implementation where I either put all container information as described above in a tar archive or even a complete Pod checkpoint containing all information about a Pod to be restored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah...I missed the archive option here. Thanks for clarifying and explaining the details behind the archive option. I started by looking to check if there is an option to restore multiple containers from the same checkpoint and seems like there is. Restore has to be just pointed to the archive using the import option and given a new name via the name option.
I did go through the Podman implementation and as I mentioned earlier it did answer a lot of my questions. I haven't had the chance to go over the CRI-O implementation yet and will definitely go over it to learn more on a complete Pod checkpoint workflow.
@@ -6964,6 +6964,395 @@ func (m *ReopenContainerLogResponse) XXX_DiscardUnknown() { | |||
|
|||
var xxx_messageInfo_ReopenContainerLogResponse proto.InternalMessageInfo | |||
|
|||
// Common options used for checkpointing and restoring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering what's the plan for adding more options in the future? Is this the exhaustive list or plan is to start with some basic options and to add more based on the use cases? Are these options based on options that are available for criu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list of options is based on my work on Podman. All the features I implemented there I added here as they seem to be the most interesting to begin with. It also depends on what runc/crun can do. One of the things runc can currently do which is not represented here is pre-copy and post-copy migration. Which is nice to have in the future but I do not think it is important to think about it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going through your work on adding checkpoint/restore support for Podman helped me put things in perspective and answered my questions. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold
since this is a POC PR for a KEP which hasn't been approved
/ok-to-test
f72e9f0
to
fe53b12
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: adrianreber The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-kubernetes-node-e2e |
1 similar comment
/test pull-kubernetes-node-e2e |
/test pull-kubernetes-bazel-test |
/test pull-kubernetes-node-e2e |
First user of this experimental CRI API is checkpoint/restore. The idea behind this experimental API is that CRI implementations do not have to implement this but it makes it possible to easily test new features which require additions to the CRI API. Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
fe53b12
to
a740072
Compare
/test pull-kubernetes-node-e2e |
@adrianreber: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/test pull-kubernetes-node-e2e |
@adrianreber: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-sig auth Feel free to add sig-auth back if this needs our attention. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-contributor-experience at kubernetes/community. |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
This PR contains the first two commits of #97194. I created this PR to make it easier to review the CRI API changes without all the changes for the actual feature implementation in #97194.
I also want to make it easier to continue the discussion in kubernetes/enhancements#1990
This introduces a new, experimental CRI API.
Which issue(s) this PR fixes:
It partly fixes #3949 but it is just a step towards solving #3949 . The next step would be #97194 .
Special notes for your reviewer:
Just two commits split out of #97194 for easier review.
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
kubernetes/enhancements#1990