Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e daemonset: stronger health check of DaemonSet status #127988

Merged
merged 1 commit into from
Oct 11, 2024

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Oct 10, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

The error was only generated if both checks (generated pods and ready pods) failed. This looks like a logic error, failing if either of those isn't matching expectations seems better.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 10, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 10, 2024
@k8s-ci-robot k8s-ci-robot added area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 10, 2024
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 48f89956d4ea9d711aaf6f6133b59f93b71e6cac

Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some reason

ERROR: test/e2e/framework/daemonset/fixtures.go:150:5: badCond: `desired == scheduled && desired == ready` condition is suspicious (gocritic)
ERROR: 	if desired == scheduled && desired == ready {
ERROR: 	   ^

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/127988/pull-kubernetes-linter-hints/1844441752560209920

@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

1 similar comment
@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@pacoxu
Copy link
Member

pacoxu commented Oct 11, 2024

/hold
for the CI failure, feel free to unhold once it is fixed.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 11, 2024
The error was only generated if both checks (generated pods and ready pods)
failed. This looks like a logic error, failing if either of those isn't
matching expectations seems better.
@pohly pohly force-pushed the e2e-daemonset-health-check branch from 0387e7e to 3ec8437 Compare October 11, 2024 08:36
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 11, 2024
@pohly
Copy link
Contributor Author

pohly commented Oct 11, 2024

That linter check is a false positive. Luckily, we can trick it without having to go back to the previous double-negotiation approach:

diff --git a/test/e2e/framework/daemonset/fixtures.go b/test/e2e/framework/daemonset/fixtures.go
index 6ae6fd10341..ed725010224 100644
--- a/test/e2e/framework/daemonset/fixtures.go
+++ b/test/e2e/framework/daemonset/fixtures.go
@@ -147,7 +147,7 @@ func CheckDaemonStatus(ctx context.Context, f *framework.Framework, dsName strin
                return err
        }
        desired, scheduled, ready := ds.Status.DesiredNumberScheduled, ds.Status.CurrentNumberScheduled, ds.Status.NumberReady
-       if desired == scheduled && desired == ready {
+       if desired == scheduled && scheduled == ready {
                return nil
        }
        return fmt.Errorf("error in daemon status. DesiredScheduled: %d, CurrentScheduled: %d, Ready: %d", desired, scheduled, ready)

Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 11, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: bbad676c43163b2e7395bf01e410f265f198cb8e

@aojea
Copy link
Member

aojea commented Oct 11, 2024

/hold cancel

That linter check is a false positive. Luckily, we can trick it without having to go back to the previous double-negotiation approach:

diff --git a/test/e2e/framework/daemonset/fixtures.go b/test/e2e/framework/daemonset/fixtures.go
index 6ae6fd10341..ed725010224 100644
--- a/test/e2e/framework/daemonset/fixtures.go
+++ b/test/e2e/framework/daemonset/fixtures.go
@@ -147,7 +147,7 @@ func CheckDaemonStatus(ctx context.Context, f *framework.Framework, dsName strin
                return err
        }
        desired, scheduled, ready := ds.Status.DesiredNumberScheduled, ds.Status.CurrentNumberScheduled, ds.Status.NumberReady
-       if desired == scheduled && desired == ready {
+       if desired == scheduled && scheduled == ready {
                return nil
        }
        return fmt.Errorf("error in daemon status. DesiredScheduled: %d, CurrentScheduled: %d, Ready: %d", desired, scheduled, ready)

lol

/approve

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 11, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aojea
Copy link
Member

aojea commented Oct 11, 2024

should we expect some flakes after this of tests doing wrong assumptions 🤔

@k8s-ci-robot k8s-ci-robot merged commit a0e146a into kubernetes:master Oct 11, 2024
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 11, 2024
@pohly
Copy link
Contributor Author

pohly commented Oct 11, 2024

I don't think so. The tests that call this seem to use it as additional sanity check after waiting for the pod(s) to run, in which case all three values should be equal.

@aojea
Copy link
Member

aojea commented Oct 14, 2024

I don't think so. The tests that call this seem to use it as additional sanity check after waiting for the pod(s) to run, in which case all three values should be equal.

it seems it happen :) #128049

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants