-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
api: add unhealthyPodEvictionPolicy for PDBs #113375
api: add unhealthyPodEvictionPolicy for PDBs #113375
Conversation
ca7bee4
to
2d752e8
Compare
if !utilfeature.DefaultFeatureGate.Enabled(features.PDBPodHealthyPolicy) { | ||
if podHealthyPolicyInUse(oldPDBSpec) { | ||
// prevent updates to the field, so invalid values cannot be set when validation is off | ||
pdbSpec.PodHealthyPolicy = oldPDBSpec.PodHealthyPolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need this part in case somebody would do PodRunning
-> InvalidValue
after we disable the feature, since it also turns off validations for podHealthyPolicy
field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop this case, validation runs if the field is non-nil regardless of feature gate
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project. |
/triage accepted |
ecedbb3
to
104f937
Compare
pkg/apis/policy/types.go
Outdated
// This field is alpha-level. The disruption controller uses this field when | ||
// the feature gate PDBPodHealthyPolicy is enabled (disabled by default). | ||
// +optional | ||
PodHealthyPolicy PodHealthyPolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the descriptions as compared to the KEP.
I have replaced legacy
with default
. I would argue this is not a legacy
behaviour as it is just a different way of handling a PDB and pods and some people may prefer the original one to achieve the least disruption to their application. App Running pods have a chance to become Ready, without getting disrupted.
Also, wouldn't it be better to have eg. a PodReadyStrict
or Default
policy. So the behaviour would be on par with the other policies? We could set it on admission by default. Only old PDBs that were not updated would be still empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's fair, make sure to update the KEP to match this implementation as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh... I wasn't expecting the controller to change at all. I have a hard time seeing a use case for the controller changing to include running-but-not-ready pods in pdb.Status.CurrentHealthy
.
I thought the option we were adding was strictly around the eviction logic, whether to unconditionally allow eviction of unready pods, regardless of pdb.Status.CurrentHealthy
. That would avoid changing the controller at all, and limit the entire functional change to:
// If the pod is not ready, it doesn't count towards healthy and we should not decrement
- if !podutil.IsPodReady(pod) && pdb.Status.CurrentHealthy >= pdb.Status.DesiredHealthy && pdb.Status.DesiredHealthy > 0 {
- updateDeletionOptions = true
- return nil
+ if !podutil.IsPodReady(pod) {
+ if utilfeature.DefaultFeatureGate.Enabled(features.UnreadyEvictionPolicy) {
+ if pdb.Spec.UnreadyEvictionPolicy != nil && *pdb.Spec.UnreadyEvictionPolicy == policy.AlwaysAllow {
+ updateDeletionOptions = true
+ return nil
+ }
+ }
+ if pdb.Status.CurrentHealthy >= pdb.Status.DesiredHealthy && pdb.Status.DesiredHealthy > 0 {
+ updateDeletionOptions = true
+ return nil
+ }
}
Do you have a pointer to the discussion around the decision to shift the design from narrowly modifying the eviction behavior to changing how the controller computes healthy pods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc folks on the KEP reviews: @wojtek-t, @ravisantoshgudimetla, in addition to @soltysh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soltysh, @atiratree and I synced offline
The status.currentHealthy
/ status.desiredHealthy
/ status.disruptionsAllowed
fields reflect ready pods today. Redefining those fields to be something other than ready pods seems problematic. Also, none of us could come up with a clear use case for considering running-but-not-ready pods as healthy.
The motivation for this new field and the KEP was to let PDB authors define policies that would not block eviction of unready pods, regardless of the count of healthy pods (to fix things like #72320)
We agreed to scope this change down to a field targeting specifically that use case, and to pair it with a narrow change just to the eviction logic for unready pods.
The proposed field is:
type UnreadyPodEvictionPolicyType string
// unreadyPodEvictionPolicy determines how pods which are running (status.phase="Running")
// but not yet ready (no status.conditions item with type=Ready,status=True) are handled by the eviction endpoint.
unreadyPodEvictionPolicy *UnreadyPodEvictionPolicyType `json:"...`
The allowed values would be:
"IfHealthyBudget"
- allow ifpdb.Status.CurrentHealthy >= pdb.Status.DesiredHealthy && pdb.Status.DesiredHealthy > 0
- current behavior atkubernetes/pkg/registry/core/pod/storage/eviction.go
Lines 229 to 230 in 81bd249
// If the pod is not ready, it doesn't count towards healthy and we should not decrement if !podutil.IsPodReady(pod) && pdb.Status.CurrentHealthy >= pdb.Status.DesiredHealthy && pdb.Status.DesiredHealthy > 0 { "AlwaysAllow"
- always allow evicting a running-but-unready podnull
- default, same behavior as"IfHealthyBudget"
(I don't love the IfHealthyBudget
name... that's sort of a placeholder so @atiratree can get started tweaking this PR... if @ravisantoshgudimetla or @wojtek-t or others have a better way to describe the current behavior, we're totally open to it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have synced with @ravisantoshgudimetla, and it seems it would be better to change the name of the field from UnreadyPodEvictionPolicy
-> UnhealthyPodEvictionPolicy
, to cover future changes to the healthy definition. Users might want to define themselves when their pod becomes healthy. We still will have a binary way of deciding what pod to evict and what not, based on the "healthiness". So I think the current names IfHealthyBudget
and AlwaysAllow
are still fitting. Also, users are used to this naming from the PDB status.
Can I update the name and make the document it in the types in a better way to make it a bit more future proof?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would have a hard time redefining "healthy" in the future to mean something else.
That said, I agree "unhealthy" is consistent with PDB status fields and the "IfHealthyBudget" policy, and the controller currently determines healthy as "ready and not being deleted", which matches what we want to allow here, so using that term in this field name sounds reasonable, even if we never change the definition of unhealthy in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to change healthy definition for PDB it will require much more thorough changes given the current naming in the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we discussed that it should be probably based on pod readiness gates and we would need to figure out in a new KEP how to make it configurable and backwards compatible with the current solution. But it should be possible with the current changes.
11b6d60
to
88a8b0d
Compare
/label api-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
from sig-apps pov
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API mechanics lgtm, some doc/comment updates needed
// should be considered for eviction. Current implementation considers healthy pods, | ||
// as pods that have status.conditions item with type=Ready,status=True. | ||
// If no policy is specified, the default behavior will be used, | ||
// which corresponds to the IfHealthyBudget policy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a follow-up before beta, consider whether we want to leave the default implicit ("this is how we will handle null") or make it explicit with API defaulting (so a null field would actually get populated with IfHealthyBudget
). We don't have to decide that now, since the defaulting would be feature-gated, so null values would still have to be documented/handled, but include this in the list of questions to resolve before beta.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noted, will add to the KEP update
6c5f932
to
275315c
Compare
flaky |
275315c
to
2fbbc47
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: atiratree, liggitt, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
unrelated |
same ^ |
What type of PR is this?
This PR implements UnhealthyPodEvictionPolicy for PodDisruptionBudgets (originally PodHealthyPolicy) as partially described in kubernetes/enhancements#3017. The additional changes were discussed in this PR.
/kind feature
/kind api-change
/sig apps
What this PR does / why we need it:
This allows users to specify how PodDisruptionBudget and the eviction API should handle pods that are Running but not Ready.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: