Move kubectl wait to informers with a cache to avoid hanging due to objects disappearing from the cluster #110923
/triage accepted
This ensures the comment from #108086 (comment) is addressed and adds tests ensuring that we don't break it in the future.
/lgtm
/approve
/priority backlog
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: mpuckett159, soltysh
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The wait unit tests have been significantly flakier since this merged: https://storage.googleapis.com/k8s-triage/index.html?pr=1&test=kubectl%2Fpkg%2Fcmd%2Fwait
This also correlates with the start of #111111 (test times out because |
Note to self for the fix: it looks like the delete code doesn't set the timeout value properly in the waitOptions. It uses 0 to mean "forever" when 0 should mean "check once and report immediately."
@aleksandra-malinowska this may point to an underlying issue with the testing, however. Could you point me to the specific test code so I can see exactly what is being run to cause these hangs? It sounds like the resource is not being deleted as one would expect, and that, combined with the timeout issue, is causing the hang. If the test just does |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Copied from #108086. This PR attempts to address some regressions caused by that PR and adds tests to ensure the regression is not reintroduced inadvertently.
This moves the kubectl wait set of functions to informers with cache updates for waiting on resources to reach a specified state. It prevents wait from hanging when resources disappear and outputs a descriptive error message when a resource it is waiting on disappears.
Example output:
Which issue(s) this PR fixes:
Fixes kubernetes/kubectl#1120
Special notes for your reviewer:
I had to bump all the timeouts in the tests up to at least 1 second because of how the informer and caching work. From what I can tell in my local testing, it doesn't increase testing time significantly (2 seconds for me), but just FYI.
Does this PR introduce a user-facing change?
cleared release note since this is reverted in #110922, original release note was: