Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add experimental CRI API for checkpoint/restore #97689

Closed
wants to merge 2 commits into from

Conversation

adrianreber
Copy link
Contributor

@adrianreber adrianreber commented Jan 4, 2021

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This PR contains the first two commits of #97194. I created this PR to make it easier to review the CRI API changes without all the changes for the actual feature implementation in #97194.

I also want to make it easier to continue the discussion in kubernetes/enhancements#1990

This introduces a new, experimental CRI API.

Which issue(s) this PR fixes:
It partly fixes #3949 but it is just a step towards solving #3949 . The next step would be #97194 .

Special notes for your reviewer:
Just two commits split out of #97194 for easier review.

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
kubernetes/enhancements#1990

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 4, 2021
@k8s-ci-robot
Copy link
Contributor

@adrianreber: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 4, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @adrianreber. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jan 4, 2021
@k8s-ci-robot k8s-ci-robot requested review from dims and krmayankk January 4, 2021 17:03
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 4, 2021
@fejta-bot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@dims
Copy link
Member

dims commented Jan 4, 2021

/assign @derekwaynecarr @mrunalp

Derek, Mrunal, how does this fit into the CRI API stuff in progress?

@fejta-bot
Copy link

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/check-cla

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 4, 2021
// CheckpointContainer checkpoints a container
rpc CheckpointContainer(CheckpointContainerRequest) returns (CheckpointContainerResponse) {}
// RestoreContainer restores a container
rpc RestoreContainer(RestoreContainerRequest) returns (RestoreContainerResponse) {}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this apply to all types of containers ? e.g. init and sidecars flagged as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this apply to all types of containers ? e.g. init and sidecars flagged as well?

In my drain --checkpoint implementation (#97194) I am ignoring containers in the 'kube-system' namespace. As long as no raw sockets are used CRIU could checkpoint most processes and containers, but it does not seem useful to checkpoint containers in the 'kube-system' namespace for now.

Checkpointing a Pod in CRI-O also does not checkpoint the pause container, only necessary metadata is written to the checkpoint archive and upon restore a new pause container is created which uses the metadata to provide the same environment for the other restored containers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adrianreber , i was asking about https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
Also does it make sense for the CRI to know about kubernetes namespace kube-system at the CRI level ? I believe that logic should be built into kubelet and not into CRI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was asking about https://kubernetes.io/docs/concepts/workloads/pods/init-containers/

This was briefly discussed in the corresponding KEP and one idea was to not enable checkpointing before init containers have finished running.

Also does it make sense for the CRI to know about kubernetes namespace kube-system at the CRI level ? I believe that logic should be built into kubelet and not into CRI

Currently it happens in kubectl. Only containers which are not in the 'kube-system' namespace are checkpointed.

// Keep temporary files. Like log files. Helpful for debugging.
bool keep = 1;
// Checkpoint/Restore the container with established TCP connections.
bool tcp_established = 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to checkpoint and restore with established tcp connections ? How does that work ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to checkpoint and restore with established tcp connections ?

Yes it is possible to checkpoint and restore established TCP connections. On the lower levels, checkpointing single processes or containers using runc/Podman it makes more sense when talking about the TCP connection from an outside client to the process or container. The restored process or container needs to have access to the same IP or the restore will fail.

In the Kubernetes context, especially for the current drain --checkpoint implementation, it might seem less useful for connections to external clients. It is, however, important to have the possibility to checkpoint and restore established TCP connections for containers and Pods which have open TCP connection between each other.

How does that work ?

Please have a look at how CRIU does that: https://criu.org/TCP_connection

@@ -6964,6 +6964,395 @@ func (m *ReopenContainerLogResponse) XXX_DiscardUnknown() {

var xxx_messageInfo_ReopenContainerLogResponse proto.InternalMessageInfo

// Common options used for checkpointing and restoring.
type CheckpointRestoreOptions struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding an image directory option for a custom checkpoint storage directory that checkpoint and restore could use to save the checkpoint and restore from the checkpoint respectively?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the archive option. A container checkpoint is much more than just the result of running CRIU. Then a directory would make sense. To make life easier for users I put everything in a tar archive in Podman. The CRIU checkpoint, the changes to the container file system and metadata (like name, IP address, ID and much more). Without those additional information it is difficult to restore a container as it was before without requiring a lot of manual steps from the user during restore. The archive contains everything. Please also see me CRI-O implementation where I either put all container information as described above in a tar archive or even a complete Pod checkpoint containing all information about a Pod to be restored.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah...I missed the archive option here. Thanks for clarifying and explaining the details behind the archive option. I started by looking to check if there is an option to restore multiple containers from the same checkpoint and seems like there is. Restore has to be just pointed to the archive using the import option and given a new name via the name option.
I did go through the Podman implementation and as I mentioned earlier it did answer a lot of my questions. I haven't had the chance to go over the CRI-O implementation yet and will definitely go over it to learn more on a complete Pod checkpoint workflow.

@@ -6964,6 +6964,395 @@ func (m *ReopenContainerLogResponse) XXX_DiscardUnknown() {

var xxx_messageInfo_ReopenContainerLogResponse proto.InternalMessageInfo

// Common options used for checkpointing and restoring.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering what's the plan for adding more options in the future? Is this the exhaustive list or plan is to start with some basic options and to add more based on the use cases? Are these options based on options that are available for criu?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list of options is based on my work on Podman. All the features I implemented there I added here as they seem to be the most interesting to begin with. It also depends on what runc/crun can do. One of the things runc can currently do which is not represented here is pre-copy and post-copy migration. Which is nice to have in the future but I do not think it is important to think about it now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going through your work on adding checkpoint/restore support for Podman helped me put things in perspective and answered my questions. Thanks!

Copy link
Member

@ehashman ehashman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold
since this is a POC PR for a KEP which hasn't been approved

/ok-to-test

@adrianreber adrianreber changed the title Extend CRI API for checkpoint/restore Add experimental CRI API for checkpoint/restore Feb 12, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: adrianreber
To complete the pull request process, please assign derekwaynecarr, liggitt after the PR has been reviewed.
You can assign the PR to them by writing /assign @derekwaynecarr @liggitt in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@adrianreber
Copy link
Contributor Author

/test pull-kubernetes-node-e2e

1 similar comment
@adrianreber
Copy link
Contributor Author

/test pull-kubernetes-node-e2e

@adrianreber
Copy link
Contributor Author

/test pull-kubernetes-bazel-test

@adrianreber
Copy link
Contributor Author

/test pull-kubernetes-node-e2e

First user of this experimental CRI API is checkpoint/restore.

The idea behind this experimental API is that CRI implementations do not
have to implement this but it makes it possible to easily test new
features which require additions to the CRI API.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber
Copy link
Contributor Author

/test pull-kubernetes-node-e2e
/test pull-kubernetes-e2e-kind

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Feb 25, 2021

@adrianreber: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-azure-file 4b922f8950c497663e3abfa7f943ca2160373a10 link /test pull-kubernetes-e2e-azure-file
pull-kubernetes-e2e-azure-disk 4b922f8950c497663e3abfa7f943ca2160373a10 link /test pull-kubernetes-e2e-azure-disk
pull-kubernetes-e2e-azure-disk-windows 4b922f8950c497663e3abfa7f943ca2160373a10 link /test pull-kubernetes-e2e-azure-disk-windows
pull-kubernetes-e2e-azure-file-windows 4b922f8950c497663e3abfa7f943ca2160373a10 link /test pull-kubernetes-e2e-azure-file-windows
pull-kubernetes-e2e-aks-engine-azure 4b922f8950c497663e3abfa7f943ca2160373a10 link /test pull-kubernetes-e2e-aks-engine-azure
pull-kubernetes-e2e-azure-disk-vmss 4b922f8950c497663e3abfa7f943ca2160373a10 link /test pull-kubernetes-e2e-azure-disk-vmss
pull-kubernetes-e2e-gce-alpha-features 4b922f8950c497663e3abfa7f943ca2160373a10 link /test pull-kubernetes-e2e-gce-alpha-features
pull-kubernetes-e2e-gce-network-proxy-grpc 135d554265d151ea696992259f739721f0164758 link /test pull-kubernetes-e2e-gce-network-proxy-grpc

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@adrianreber
Copy link
Contributor Author

/test pull-kubernetes-node-e2e

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 6, 2021
@k8s-ci-robot
Copy link
Contributor

@adrianreber: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@enj
Copy link
Member

enj commented Apr 19, 2021

/remove-sig auth

Feel free to add sig-auth back if this needs our attention.

@k8s-ci-robot k8s-ci-robot removed the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Apr 19, 2021
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 18, 2021
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 17, 2021
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/apiserver area/cloudprovider area/code-generation area/dependency Issues or PRs related to dependency changes area/kubectl area/kubelet area/provider/gcp Issues or PRs related to gcp provider area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Pod lifecycle checkpointing