-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop StartupProbe explicity when successThrethold is reached #121206
Stop StartupProbe explicity when successThrethold is reached #121206
Conversation
Hi @mochizuki875. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/sig node |
/cc @SergeyKanzhelev |
/ok-to-test |
/triage accepted |
expectContinue(t, w, w.doProbe(ctx), msg) | ||
expectResult(t, w, results.Success, msg) | ||
expectResultRun(t, w, 0, msg) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any tests we can do down here to prove the prober stopped and doesn't continue? From what I can see, we only test that it continues before the threshold is met, but don't explicitly test it's stopped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
In this test case, successThreshold
is set to 1, so
i=0
: the successThreshold
is not met and probe should be executed
i=1
(=successThreshold
): the successThreshold
is met and probe should not be executed
I've added check whether probe will be executed or not in each case using w.onHold
with some comments.
If successThreshold
is exceeded, w.onHold
turn to true and probe will not be executed anymore.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
pkg/kubelet/prober/worker.go
Outdated
@@ -316,11 +316,14 @@ func (w *worker) doProbe(ctx context.Context) (keepGoing bool) { | |||
|
|||
w.resultsManager.Set(w.containerID, result, w.pod) | |||
|
|||
if (w.probeType == liveness || w.probeType == startup) && result == results.Failure { | |||
if (w.probeType == liveness && result == results.Failure) || (w.probeType == startup && (result == results.Success || result == results.Failure)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (w.probeType == liveness && result == results.Failure) || (w.probeType == startup && (result == results.Success || result == results.Failure)) { | |
if (w.probeType == liveness && result == results.Failure) || (w.probeType == startup && result != results.Unknown) { |
or
if (w.probeType == liveness && result == results.Failure) || (w.probeType == startup && (result == results.Success || result == results.Failure)) { | |
if (w.probeType == liveness && result == results.Failure) || w.probeType == startup { |
After the w.probeManager.prober.probe()
is executed, will the startup probe still enter the unknown
state?
As long as the startup probe exceeds its FailureThreshold
or SuccessThreshold
, regardless of whether it succeeds or fails, we will stop executing it. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HirazawaUi
Thank you for your comment.
After the w.probeManager.prober.probe() is executed, will the startup probe still enter the unknown state?
As I've checked the codebase, w.probeManager.prober.probe()
will return err with unknown
state.
So I think it's enough to check w.probeType == startup
.
As long as the startup probe exceeds its FailureThreshold or SuccessThreshold, regardless of whether it succeeds or fails, we will stop executing it. Is that correct?
As far as I can tell from checking the doc, that seems correct.
Do you have any thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me now.
/lgtm
bef809f
to
a262c80
Compare
LGTM label has been added. Git tree hash: 0f29de3495b1cc4331ccf56b5e4b89ea5144eb11
|
/assign @matthyx @SergeyKanzhelev |
yes this works |
@@ -316,11 +316,14 @@ func (w *worker) doProbe(ctx context.Context) (keepGoing bool) { | |||
|
|||
w.resultsManager.Set(w.containerID, result, w.pod) | |||
|
|||
if (w.probeType == liveness || w.probeType == startup) && result == results.Failure { | |||
if (w.probeType == liveness && result == results.Failure) || w.probeType == startup { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. So for startup probe we will stop after the first run? This condition doesn't check the result to be a failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, w.onHold = true
will be set when liveness or startup probe is Failure.
However, for asynchronous reason(I've referred to that in this PR description), w.onHold = true
also should also be set when startup probe is Success. Otherwise, startup probe may be executed redundantly.
Similar discussion has been done in:
#121206 (comment)
#121206 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
w.onHold = true
also should also be set when startup probe is Success.
What I see in code is that we set it unconditionally for startup type. Not when it is a startup AND succeeded. What am I missing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I see in code is that we set it unconditionally for startup type. Not when it is a startup AND succeeded.
Yes, that's right.
So I've said
should also be set when startup probe is Success.
Regarding Startup Probe, w.onHold = true
should be set in each case:
failureThreshold
is reached and last result isFailure
SuccessThreshold
(Must be1
) is reached and last result isSuccess
Threshold check will be done just before this like that, and if it has not been reached, it returns true.
kubernetes/pkg/kubelet/prober/worker.go
Lines 316 to 320 in aa8f287
if (result == results.Failure && w.resultRun < int(w.spec.FailureThreshold)) || | |
(result == results.Success && w.resultRun < int(w.spec.SuccessThreshold)) { | |
// Success or failure is below threshold - leave the probe state unchanged. | |
return true | |
} |
Therefore, at this point, it means that one of the thresholds has been reached, and I think that w.onHold = true
should be set for the Startup Probe regardless of the result.
(Currently, it is only for the case of Failure, but also for Success.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mochizuki875, SergeyKanzhelev The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@mochizuki875: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/restart |
/retest |
What type of PR is this?
/kind bug
What this PR does / why we need it:
StartupProbe
is executed while container is starting and will update container status toStarted=true
if it succeeds.When container status becomes to
Started=true
,StartupProbe
will be stopped and Readiness/Liveness probe will be executed.Probe and the main loop of kubelet that updates the container status are executed asynchronously, and the goroutine of the Probe requests the container status update to the main loop of kubelet via the channel.
Due to this asyncronicity,
StartupProbe
may be executed more thansuccessThrethold
.(#117153)In this PR,
StartupProbe
is stopped explicity whensuccessThrethold
is reached.Which issue(s) this PR fixes:
Fixes #117153
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: