[release/1.7] Make PodSandboxStatus friendlier to shim crashes #10461

@dims dims commented Jul 15, 2024

Currently, if you're using the shim-mode sandbox server support and the shim hosting the Sandbox API dies for any unintentional reason (segfault, OOM, etc.), PodSandboxStatus gets wedged. We can rely on the following: if we did not go through the usual k8s flow of Stop->Remove and we still have an entry in our sandbox store, then no longer having a shim mapping means the shim's exit was likely unintentional.
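The heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Store` type, `ErrShimNotFound`, `shimStatus`, and the choice to report a not-ready state instead of an error are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrShimNotFound is a hypothetical sentinel meaning "no shim mapping exists".
var ErrShimNotFound = errors.New("shim not found")

// SandboxState is a simplified stand-in for the CRI sandbox state.
type SandboxState string

const (
	StateReady    SandboxState = "SANDBOX_READY"
	StateNotReady SandboxState = "SANDBOX_NOTREADY"
)

// Store is a hypothetical in-memory stand-in for the sandbox store
// and the shim mapping.
type Store struct {
	Sandboxes map[string]bool // sandbox IDs that were never removed
	Shims     map[string]bool // sandbox IDs that still have a live shim
}

// shimStatus asks the shim for the sandbox state; it fails with
// ErrShimNotFound when the shim mapping is gone.
func (s *Store) shimStatus(id string) (SandboxState, error) {
	if !s.Shims[id] {
		return "", ErrShimNotFound
	}
	return StateReady, nil
}

// PodSandboxStatus treats "entry still in the store, but no shim" as an
// unintentional shim exit and reports a not-ready sandbox (assumed behavior)
// rather than failing the whole call.
func (s *Store) PodSandboxStatus(id string) (SandboxState, error) {
	if !s.Sandboxes[id] {
		// Stop->Remove already happened, or the sandbox never existed.
		return "", fmt.Errorf("sandbox %q not found", id)
	}
	state, err := s.shimStatus(id)
	if errors.Is(err, ErrShimNotFound) {
		return StateNotReady, nil
	}
	if err != nil {
		return "", err
	}
	return state, nil
}

func main() {
	s := &Store{
		Sandboxes: map[string]bool{"crashed": true, "healthy": true},
		Shims:     map[string]bool{"healthy": true},
	}
	for _, id := range []string{"crashed", "healthy", "removed"} {
		state, err := s.PodSandboxStatus(id)
		fmt.Println(id, state, err)
	}
}
```

The point of the sketch is the ordering of the checks: the store lookup decides whether the sandbox was removed on purpose, and only then is a missing shim interpreted as a crash.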

@dims
Member Author

dims commented Jul 15, 2024

@mikebrow @dcantah this is a bug that needs to be in 1.7.x too right? i see a label in #8367 for sure.

@dcantah
Member

dcantah commented Jul 15, 2024

this is a bug that needs to be in 1.7.x too right?

It looks like it, yes.

@samuelkarp samuelkarp added the area/cri Container Runtime Interface (CRI) label Jul 15, 2024
Currently if you're using the shim-mode sandbox server support, if your
shim that's hosting the Sandbox API dies for any reason that wasn't
intentional (segfault, oom etc.) PodSandboxStatus is kind of wedged.
We can use the fact that if we didn't go through the usual k8s flow
of Stop->Remove and we still have an entry in our sandbox store,
us not having a shim mapping anymore means this was likely unintentional.

Signed-off-by: Danny Canter <danny@dcantah.dev>
@dims dims force-pushed the automated-cherry-pick-of-#8367-upstream-release-1.7 branch from e329ee3 to df86bdd Compare July 16, 2024 14:17
@dims
Member Author

dims commented Jul 16, 2024

/skip

@dims
Member Author

dims commented Jul 16, 2024

/test pull-containerd-k8s-e2e-ec2

@k8s-ci-robot

k8s-ci-robot commented Jul 16, 2024

@dims: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-containerd-k8s-e2e-ec2 | df86bdd | link | false | `/test pull-containerd-k8s-e2e-ec2` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@dims
Member Author

dims commented Jul 16, 2024

hmm looks like skip plugin is not enabled, let me do that! kubernetes/test-infra#32994

@dims
Member Author

dims commented Jul 16, 2024

/skip

@dims
Member Author

dims commented Jul 16, 2024

/test pull-containerd-node-e2e-1-7

Member

@mikebrow mikebrow left a comment

LGTM

@estesp estesp merged commit 34ea461 into containerd:release/1.7 Jul 16, 2024
58 checks passed
@dmcgowan dmcgowan changed the title [release/1.7] CRI Sbserver: Make PodSandboxStatus friendlier to shim crashes [release/1.7] Make PodSandboxStatus friendlier to shim crashes Jul 17, 2024
Mengkzhaoyun pushed a commit to open-beagle/containerd that referenced this pull request Aug 27, 2024
containerd 1.7.20

Welcome to the v1.7.20 release of containerd!

The twentieth patch release for containerd 1.7 contains various fixes
and updates.

* Support for dropping inheritable capabilities ([#10469](containerd/containerd#10469))

* Make PodSandboxStatus friendlier to shim crashes ([#10461](containerd/containerd#10461))
* Handle empty DNSConfig differently than unspecified ([#10462](containerd/containerd#10462))
* Fix for `[cri] ttrpc: closed` during ListPodSandboxStats ([#10423](containerd/containerd#10423))

Please try out the release binaries and report any issues at
https://github.com/containerd/containerd/issues.

* Derek McGowan
* Akihiro Suda
* Phil Estes
* Akhil Mohan
* Bryant Biggs
* Danny Canter
* Davanum Srinivas
* Mike Brown
* Samuel Karp
* Tim Hockin
<details><summary>16 commits</summary>
<p>

* Prepare release notes for v1.7.20 ([#10481](containerd/containerd#10481))
  * [`7f2d4cd97`](containerd/containerd@7f2d4cd) Prepare release notes for v1.7.20
* deps: Update otelgrpc ([#10413](containerd/containerd#10413))
  * [`3a02c523d`](containerd/containerd@3a02c52) deps: Update otelgrpc
* Make PodSandboxStatus friendlier to shim crashes ([#10461](containerd/containerd#10461))
  * [`df86bdd5d`](containerd/containerd@df86bdd) CRI Sbserver: Make PodSandboxStatus friendlier to shim crashes
* Handle empty DNSConfig differently than unspecified ([#10462](containerd/containerd#10462))
  * [`209ee4f10`](containerd/containerd@209ee4f) CRI: An empty DNSConfig != unspecified
* Support for dropping inheritable capabilities ([#10469](containerd/containerd#10469))
  * [`ce65228af`](containerd/containerd@ce65228) Support for dropping inheritable capabilities
* Fix for `[cri] ttrpc: closed` during ListPodSandboxStats ([#10423](containerd/containerd#10423))
  * [`610498df7`](containerd/containerd@610498d) Fix for `[cri] ttrpc: closed` during ListPodSandboxStats
* update to go1.21.12 / go1.22.5 ([#10426](containerd/containerd#10426))
  * [`e61c7932e`](containerd/containerd@e61c793) update to go1.21.12 / go1.22.5
* errdefs: denote deprecation as a godoc comment ([#10424](containerd/containerd#10424))
  * [`c7d5e430a`](containerd/containerd@c7d5e43) errdefs: denote deprecation as a godoc comment
</p>
</details>

* **github.com/go-logr/logr**                                                      v1.2.4 -> v1.3.0
* **github.com/google/go-cmp**                                                     v0.5.9 -> v0.6.0
* **github.com/google/uuid**                                                       v1.3.1 -> v1.4.0
* **go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc**  v0.45.0 -> v0.46.1
* **go.opentelemetry.io/otel**                                                     v1.19.0 -> v1.21.0
* **go.opentelemetry.io/otel/metric**                                              v1.19.0 -> v1.21.0
* **go.opentelemetry.io/otel/sdk**                                                 v1.19.0 -> v1.21.0
* **go.opentelemetry.io/otel/trace**                                               v1.19.0 -> v1.21.0
* **google.golang.org/genproto**                                                   e6e6cdab5c13 -> 989df2bf70f3
* **google.golang.org/genproto/googleapis/api**                                    007df8e322eb -> 83a465c0220f
* **google.golang.org/genproto/googleapis/rpc**                                    d307bd883b97 -> 995d672761c0

Previous release can be found at [v1.7.19](https://github.com/containerd/containerd/releases/tag/v1.7.19)
Labels
area/cri Container Runtime Interface (CRI) impact/changelog kind/bug size/S
7 participants