Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

E2E Node tests for image pull backoff and crashloopbackoff behavior #128559

Conversation

lauralorenz
Copy link
Contributor

@lauralorenz lauralorenz commented Nov 5, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Add e2e tests buit on top of the CRI proxy framework to test the backoff behavior of image pulls and container restarts. Includes a case where container restarts are configured using the alpha feature from KEP-4306.

Which issue(s) this PR fixes:

Related to kubernetes/enhancements#4603

Special notes for your reviewer:

Test freeze exception: https://groups.google.com/g/kubernetes-sig-node/c/zYclDRIyD0w
How to run:

make test-e2e-node REMOTE=false PARALLELISM=1 FOCUS="Container 
Restart|Pull Image" SKIP="\[Flaky\]|\[Slow\]"  TEST_ARGS='--kubelet-flags="--fail-swap-on=false" --cri-proxy-enabled=true'

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 5, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 5, 2024
@lauralorenz
Copy link
Contributor Author

/test pull-kubernetes-node-e2e-containerd

@lauralorenz
Copy link
Contributor Author

Hiya @SergeyKanzhelev (cc @tallclair) I'm going to keep working on this tomorrow, but heads up if you have any opinions on how this is shaping up (as I intend to use something similar to e2e test container restarts too), the latest commit has some informative TODOs of where I'm currently at. Most relevantly I am looking for (/ possibly need to make?) a util that can snag kubelet logs for a defined time period because that's where the data I need to parse really is, as I don't seem to be able to do it with the events API; if you know of something that already does that in the node e2e suite please point me in the direction as I haven't found anything so far. Thanks!

@lauralorenz
Copy link
Contributor Author

Ideas:

  • set up a goroutine to watch the events and create my own event log on the side
  • can get the kubelet logs directly from the kubelet endpoint to parse them after the test runs
  • still thinking maybe there is an example of kubelet log parsing in the other e2e tests?
  • e2e slow downs can require timeouts more like 1-2 minutes minimum to observe behavior (e.g. pod running (? i think) timeout is 2 minutes)

@SergeyKanzhelev
Copy link
Member

  • e2e slow downs can require timeouts more like 1-2 minutes minimum to observe behavior (e.g. pod running (? i think) timeout is 2 minutes)

Is the question how to check that the image pull backoff did not inherit the container crash loop backoff?

Easiest you can do - check how fast you receive the next image pull in the cri proxy. If container crash loop backoff configured to 5 seconds, make sure you are not getting image pull backoffs less then so many times in one minute.

@aojea
Copy link
Member

aojea commented Nov 6, 2024

this should merge with #128374 , adding e2e separated of the feature is not a good practice

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 8, 2024
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

I looked at the tests and they make sense to me, but I'm not able to dig into the underlying framework as much as someone who knows it very well.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lauralorenz, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 13, 2024
@thockin
Copy link
Member

thockin commented Nov 13, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 13, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 7cf3281708e43a7febbc701a3195cec8c9df0420

Focused too much on the container restart one in commit that fixed that

Signed-off-by: Laura Lorenz <lauralorenz@google.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 13, 2024
@k8s-ci-robot k8s-ci-robot requested a review from thockin November 13, 2024 01:41
@lauralorenz
Copy link
Contributor Author

lauralorenz commented Nov 13, 2024

@tallclair I had changed the sleeps to shorter times in 285d433, but focused on running the container restart tests (which are green on prow here) and didn't update the expectation of the shorter sleep on the image pull test.

Now in
9ab0d81 I'm expecting >=3 for a timeout of 30s (first 0s, second ~0s, third ~10s, fourth won't happen until on or just after 30s), whereas before I expected noninclusive >3 as I expected when I had 1 minute.

@tallclair
Copy link
Member

/test pull-kubernetes-node-e2e-cri-proxy-serial
/lgtm
/hold

Please make sure the pull-kubernetes-node-e2e-cri-proxy-serial run passes before removing the hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 13, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 0db6e203d06421fab66b369522ccf14644c3d914

@tallclair
Copy link
Member

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 13, 2024
@lauralorenz
Copy link
Contributor Author

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2024
@fsmunoz
Copy link
Contributor

fsmunoz commented Nov 13, 2024

/milestone v1.32

@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Nov 13, 2024
@lauralorenz
Copy link
Contributor Author

/retest-required

@k8s-ci-robot k8s-ci-robot merged commit 5ee686b into kubernetes:master Nov 13, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

8 participants