KEP-4563: EvictionRequest API (fka Evacuation) #4565

Open · wants to merge 11 commits into master

Conversation

atiratree
Member

@atiratree atiratree commented Mar 28, 2024

  • One-line PR description: introduce EvictionRequest API to allow managed graceful pod removal

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Mar 28, 2024
@k8s-ci-robot k8s-ci-robot requested review from kow3ns and soltysh March 28, 2024 15:05
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 28, 2024
Comment on lines 228 to 245
We will introduce a new term called evacuation. This is a contract between the evacuation instigator,
the evacuee, and the evacuator. The contract is enforced by the API and an evacuation controller.
We can think of evacuation as a managed and safer alternative to eviction.
Contributor

Watch out for the risk of confusing end users.

We already have preemption and eviction and people confuse the two. Or three, because there are two kinds of eviction. And there's disruption in the mix.

Do we want to rename Scheduling, Preemption and Eviction to Scheduling, Preemption, Evacuation and Eviction?

Member Author

Good idea, I added a mention of which kind of eviction I mean here.

Do we want to rename Scheduling, Preemption and Eviction to Scheduling, Preemption, Evacuation and Eviction?

Yes, I think we want to add a new concept there and generally update the docs once we have an alpha.

Member

I'm with Tim here.

Preemption vs Eviction is already quite confusing. And TBH, I couldn't fully understand what the "evacuation" is supposed to solve by reading the summary or motivation.

From Goals:

Changes to the eviction API to support the evacuation process.

If this is already going to be part of the Eviction API, maybe it should be named as a form of eviction. Something like "cooperative eviction" or "eviction with ack" or something along those lines?

Contributor

I'm all for framing it as another type of eviction; we already have two, so the extra cognitive load for users is not so much a problem.

Member Author — @atiratree, May 8, 2024

@alculquicondor I have updated the summary and goals, I hope it makes more sense now.

I think the name should make the most sense to the person creating the Evacuation (Evacuation Instigator ATM). So CooperativeEviction or EvictionWithAck is a bit misleading IMO, because from that person's perspective there is no additional step required of them. Only the evacuators and the evacuation controller implement the cooperative evacuation process, but this is hidden from the normal user.

My suggestions:

  • GracefulEviction (might confuse people if it is associated with graceful pod termination, which it is not)
  • SafeEviction (*safer than the API-initiated one for some pods)
  • Or just call it Eviction? And tell people to use it instead of the Eviction API endpoint? This might be a bit confusing (at least in the beginning)

Contributor

We can bikeshed names for the API kind; I'd throw a few of my own into the hat:

  • EvictionRequest
  • PodEvictionRequest

Member Author — @atiratree, Dec 3, 2024

I have renamed the API to EvictionRequest to make the term recognizable. A minor disadvantage is that we have to clarify which type of eviction we mean when we say evict (API-initiated eviction, or EvictionRequest).

The rest of the renames are as follows:

Evacuation (noun) -> EvictionRequest / Eviction Process
evacuation (verb) -> request an eviction / terminate / evict / process eviction
Evacuator -> Interceptor
Evacuee -> Pod
Evacuator Class -> Interceptor Class
Evacuation Instigator -> Eviction Requester
Evacuation Controller -> Eviction Request Controller
ActiveEvacuatorClass -> ActiveInterceptorClass
ActiveEvacuatorCompleted -> ActiveInterceptorCompleted
EvacuationProgressTimestamp -> ProgressTimestamp
ExpectedEvacuationFinishTime -> ExpectedInterceptorFinishTime
EvacuationCancellationPolicy -> EvictionRequestCancellationPolicy
FailedEvictionCounter -> FailedAPIEvictionCounter

Comment on lines 1130 to 1302
<!--
What other approaches did you consider, and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->
Contributor

We have a number of examples of having a SomethingRequest or SomethingClaim API that then causes a something (certificate signing, node provisioning, etc).
Think of TokenRequest (a subresource for a ServiceAccount), or CertificateSigningRequest.

I would confidently and strongly prefer to have an EvictionRequest or PodEvictionRequest API, rather than an Evacuation API kind.

It's easy to teach that we have evictions and that an EvictionRequest asks for one to happen; it's hard to teach the difference between an eviction and an evacuation.

As a side effect, this makes the feature gate easier to name (eg PodEvictionRequests).

Member Author

As you have mentioned, we already have different kinds of eviction. So I think it would be good to use a completely new term to distinguish it from the others.

Also, Evacuation does not always result in eviction (and PDB consultation). It depends on the controller/workload. For some workloads like DaemonSets and static pods, API eviction has never worked before. This could also be very confusing if we name it the same way.

I think Evacuation fits this better because

  1. The name is shorter. If we go with EvacuationRequest then the evacuation will become just an abstract term and less recognizable.
  2. It seems it will have quite a lot of functionality included (state synchronization between multiple instigators and multiple evacuators, state of the evacuee and evacuation). TokenRequest and CertificateSigningRequest are simpler and not involved in a complex process.

Contributor

I suggested EvictionRequest so that we don't have to have a section with the (too long) title: Scheduling, Preemption, Evacuation and Eviction. Not EvacuationRequest.

Adding another term doesn't scale so well: it means helping n people understand the difference between evacuation and eviction. It's a scaling challenge where n is not only large, it probably includes quite a few Kubernetes maintainers.

Contributor

As for CertificateSigningRequest being simple: I don't buy it. There are three controllers, custom signers, an integration with trust bundles, the horrors of ASN.1 and X.509… trust me, it's complicated enough.

Member Author

I understand that it will be confusing for people, but that will happen regardless of which term we use.

My main issue is that evacuation does not directly translate to eviction. Therefore, I think it would be preferable to choose a new term (not necessarily evacuation).

I would like to get additional opinions from people about this. And we will definitely have to come back to this in the API review.

Member Author

Should be resolved now: #4565 (comment)

@atiratree atiratree force-pushed the evacuation-api branch 8 times, most recently from 13611ce to 2d15b79 Compare March 28, 2024 20:48
@atiratree
Member Author

  • I have updated the KEP to include support for multiple evacuators.
  • Evacuators can now advertise which pods they are able to evacuate, even before the evacuation. The advantage of this approach is that we can trigger an eviction immediately, without a delay (formerly known as acceptDeadlineSeconds), if we do not find an evacuator. I have added a number of restrictions to ensure the API cannot be misused.
  • Clarified how the Evacuation objects should be named and how the admission should work in general (also for pods). This will ensure a 1:1 mapping between pods and Evacuations.
  • Removed the ability to add a full reference of the evacuator, because it would be a hassle to synchronize considering the evacuator leader election and multiple evacuators in play.


Example evacuation triggers:
- Node maintenance controller: node maintenance triggered by an admin.
- Descheduler: descheduling triggered by a descheduling rule.
Contributor

If the descheduler requests an eviction, what thing is being evacuated?

(the node maintenance and cluster autoscaler examples are easier: you're evacuating an entire node)

Member Author

A single pod or multiple pods. The descheduler can use it as a new mechanism instead of eviction.

Contributor

OK, so people typically think of “evacuate” as a near synonym of “drain” - you drain a node, you evacuate a rack or zone full of servers. Saying that you can evacuate a Pod might make people think its containers all get stopped, or just confuse readers. We do need to smooth out how we can teach this.

Member Author

It seems it can be used in both scenarios https://www.merriam-webster.com/grammar/can-you-evacuate-people.
Evacuation of containers doesn't make sense because they are tied to the pod lifecycle. But, I guess it could be confusing if we do not make it explicitly clear what we are targeting.

Contributor — @sftim, Jun 3, 2024

Thing is, Kubernetes users typically - before we accept this KEP - use “evacuate” as a synonym for drain.

I'm (still) fine with the API idea, and still concerned about the naming.

Contributor

Just to +1 the potential confusion of the term "evacuation".
Is it okay to have a glossary of terms for "evacuation", "eviction", and "drain" (or any other potentially confusing terms) added somewhere in this KEP?

Member Author

I can include it in the KEP. And yes, we are going to change the name to something more suitable.

Member

"To evacuate a person" implies "get them out of trouble, to safety" as opposed to "to empty" (as in vacuum). It's not ENTRIELY wrong in this context, but it's not entirely right either.

Member Author

I will change the name to EvictionRequest. It was originally chosen to distinguish it from eviction, but there is value in making it familiar to existing concepts.

Member Author

The API is renamed to EvictionRequest now, see #4565 (comment) for more details

@atiratree atiratree force-pushed the evacuation-api branch 6 times, most recently from 1900020 to 085672a Compare April 10, 2024 21:47
Member Author — @atiratree left a comment

@thockin Thanks a lot for the feedback. I see there are still quite a lot of places to improve.

Can we pin down some very concrete examples of what REAL evacuators might really do, and how the lifecycle works, especially WRT cancellation of an eviction?

I guess the user stories are still pretty abstract and it is best to show the flow with real examples.

I will work on the updates soon.




The major issues are:

1. Without extra manual effort, an application running with a single replica has to settle for
Member Author

Unfortunately, we made it possible and people do it. They also utilize validating webhooks/controllers to implement their desired behavior.

- After a partial cleanup (e.g. storage migrated, notification sent) and if the application is
still in an available state, the eviction API can be used to respect PodDisruptionBudgets.

In the end, the evacuation should always end with a pod being deleted (evict or delete call) by one
Member Author

Right, we should make it clear that the pod might not end up deleted/terminated. I also want to add a mechanism that would result in pod termination, but not deletion.

There can be many evacuation instigators for a single Evacuation.

When an instigator decides that a pod needs to be evacuated, it should create an Evacuation:
- With a predictable and consistent name in the following format: `${POD_UID}-${POD_NAME_PREFIX}`.
Member Author

The name of the Evacuation object must be unique and predictable for each pod to prevent the creation of multiple Evacuations for the same pod. We do not expect evacuators to support interaction with multiple Evacuations for a single pod.

I was trying to combine ease of use (POD_NAME) and uniqueness of each pod instance (POD_UID).

  • Having name only: we can solve the uniqueness by an admission plugin and checking .spec.podRef.
  • Having ID only: we can let clients (kubectl) print out the name from .spec.podRef.

Even though we do not expect a large number of instigators, I agree that ${POD_UID}-${POD_NAME_PREFIX} is overly complicated for the end user, especially the character count limit.

I (maybe wrongly) presume there's a cleanup loop that removes evacuations which have no matching pod?

Yes, there is a GC, but that can have a delay. Nevertheless, I am mainly concerned about having the 1:1 mapping, to make the implementation easier for all the parties (mainly evacuators).
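
For illustration, here is a minimal Go sketch of the ${POD_UID}-${POD_NAME_PREFIX} scheme under discussion; the helper name and the 253-character object-name budget are assumptions for the sketch, not part of the KEP:

```go
package main

import "fmt"

// evacuationName derives a unique, predictable Evacuation name from a pod,
// following the ${POD_UID}-${POD_NAME_PREFIX} format: the full pod UID for
// uniqueness, plus as much of the pod name as fits for readability.
func evacuationName(podUID, podName string) string {
	const maxNameLen = 253                 // assumed budget for a DNS subdomain object name
	budget := maxNameLen - len(podUID) - 1 // leave room for the "-" separator
	if len(podName) > budget {
		podName = podName[:budget] // POD_NAME_PREFIX: truncate to fit
	}
	return fmt.Sprintf("%s-%s", podUID, podName)
}

func main() {
	fmt.Println(evacuationName("0b4136d8-3a44-4e1c-8bd8-171fc724a2d0", "my-app-7d9c545b5d-x2k9q"))
}
```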

after the previous pod is removed.
- `.spec.progressDeadlineSeconds` should be set to a reasonable value. It is recommended to leave
it at the default value of 1800 (30m) to allow for potential evacuator disruptions.
- `.spec.evacuators` value will be resolved on the Evacuation admission from the pod and should
Member Author

I will go over the whole KEP and try to make it more accessible.

characters (`63 - len("priority_")`)
- `PRIORITY` and `ROLE`
- `controller` should always set a `PRIORITY=10000` and `ROLE=controller`
- other evacuators should set `PRIORITY` according to their own needs (minimum value is 0,
Member Author

0 is "lowest priority"

What does this priority govern?

The order of Evacuator evaluation. The highest priority evacuators are evaluated first.

Can two of them have the same prio?

I think they can, but then their order is not defined in the API, and they would most likely just be ordered alphabetically by their class.

One exception is 10000/controller which should be unique and set by the controller.
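
A sketch of that evaluation order, assuming a hypothetical struct for the parsed evacuator annotations (highest PRIORITY first, ties broken alphabetically by class, since the API leaves tie order undefined):

```go
package main

import (
	"fmt"
	"sort"
)

// evacuatorEntry is a hypothetical representation of a parsed evacuator
// annotation: its class plus the PRIORITY it advertised.
type evacuatorEntry struct {
	Class    string
	Priority int // 0 is the lowest; 10000 with ROLE=controller is reserved
}

// orderEvacuators sorts entries into evaluation order: highest priority
// first, ties broken alphabetically by class.
func orderEvacuators(evs []evacuatorEntry) {
	sort.Slice(evs, func(i, j int) bool {
		if evs[i].Priority != evs[j].Priority {
			return evs[i].Priority > evs[j].Priority
		}
		return evs[i].Class < evs[j].Class
	})
}

func main() {
	evs := []evacuatorEntry{{"descheduler", 500}, {"controller", 10000}, {"backup-agent", 500}}
	orderEvacuators(evs)
	fmt.Println(evs) // [{controller 10000} {backup-agent 500} {descheduler 500}]
}
```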

It should start the evacuation only if it observes `.status.activeEvacuatorClass` that matches the
`EVACUATOR_CLASS` it previously set in the annotation.

If the evacuator is not interested in evacuating the pod, it should set
Member Author

Elsewhere you talked about an evacuation not being necessary and being cancelled.

I am sorry, I will make this part more clear.

Can it be cancelled once evacuators have started running? What if they have done something irreversible? Are they all expected to be reversible? Idempotent? Do they get called to reverse an evacuation?

As @sftim has said, one of the consumers/inspirations is NodeMaintenance or other projects that might want to stop the maintenance.

The cancellation should be under the control of the current Evacuator. Evacuators should offer the ability to cancel if possible, but they can forbid it if the termination is irreversible. Please see .status.EvacuationCancellationPolicy.
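
A sketch of the gating and cancellation rules described in this thread; the Go types and the "Forbid" value are assumptions mirroring the .status.activeEvacuatorClass and EvacuationCancellationPolicy fields:

```go
package evacuation

// evacuationStatus is a hypothetical, trimmed-down view of the Evacuation
// status fields relevant to this thread.
type evacuationStatus struct {
	ActiveEvacuatorClass string
	CancellationPolicy   string // e.g. "Allow" or "Forbid" (assumed values)
}

// mayStart reports whether an evacuator of the given class is allowed to act:
// it must observe itself as the active evacuator class first.
func mayStart(s evacuationStatus, myClass string) bool {
	return s.ActiveEvacuatorClass == myClass
}

// mayCancel reports whether the evacuation can still be cancelled; an
// evacuator that has started irreversible termination can forbid it.
func mayCancel(s evacuationStatus) bool {
	return s.CancellationPolicy != "Forbid"
}
```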

`.spec.progressDeadlineSeconds` and make sure to update the status periodically in less than that
duration. For example, if `.spec.progressDeadlineSeconds` is set to the default value of 1800 (30m),
it may update the status every 3 minutes. The status updates should look as follows:
- Check that `.status.activeEvacuatorClass` still matches the `EVACUATOR_CLASS` previously set in
Member Author

.status.activeEvacuatorClass is only reconciled/coordinated by the evacuation controller (single instance).

The reaction time can be tweaked with .spec.progressDeadlineSeconds.

I think the following values might be reasonable, but welcome any other suggestions.

	// The minimum value is 600 (10m) and the maximum value is 21600 (6h).
	// The default value is 1800 (30m).

FYI, this mechanism is very quick if there is no evacuator advertised on the pod. It will instantly fall back to eviction.
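
A hypothetical interceptor heartbeat loop matching the cadence above; the update callback stands in for a real status write, and all names are illustrative:

```go
package evacuation

import (
	"context"
	"time"
)

// runHeartbeat updates the Evacuation status periodically, well within
// progressDeadlineSeconds, so the evacuation controller keeps treating this
// evacuator as making progress. A tenth of the deadline matches the example
// above (3m ticks for the default 30m deadline).
func runHeartbeat(ctx context.Context, progressDeadline time.Duration, update func(time.Time) error) {
	t := time.NewTicker(progressDeadline / 10)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case now := <-t.C:
			// A transient failure is tolerable: the forgiving minimum
			// deadline (10m) absorbs short evacuator disruptions.
			_ = update(now)
		}
	}
}
```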

needs high availability either, due to a preference for a simpler deployment model, lack of
application support for HA, or to minimize compute costs. Also, any automated solution needs
to edit the PDB to account for the additional pod that needs to be spun to move the workload
from one node to another. This has been discussed in issue [kubernetes/kubernetes#66811](https://github.com/kubernetes/kubernetes/issues/66811)
Contributor

As far as I have read, the Evacuation API seems to always respect PDBs.
But I would like to know: will the Evacuation API be able to disable or ignore PDBs to help the case where the operator expects zero living nodes in the cluster (i.e. shutting down the cluster during weekends/holidays for maintenance and starting it again afterwards)?
The above case requires all nodes to shut down, and having PDBs with minAvailable: 1 or maxUnavailable: 0 will definitely block draining the nodes.

Member Author

The feature is not limited by PDBs; workloads can use them, but they can also bypass them if there is a controller that can make more advanced decisions.

Contributor

But for the k8s built-in controllers like Deployments, StatefulSets, etc.: if a PDB is configured, how will the Evacuation API be able to bypass the disruption budget and allow node termination below the PDB-configured values?

Member Author

I will update the KEP soon. But the mechanism of optionally bypassing the PDB should not change, and it requires action from the evacuator. PTAL at the KEP or just the attached diagram: https://github.com/kubernetes/enhancements/pull/4565/files#diff-560381a7190e130fded9e8758d53d2e5c4b69c39838485fc7a3fb13be096ae47

Contributor

Looking at the diagram, in the innermost loop it says the evacuator updates the Evacuation status. I just want to clarify the meaning of the part where it says:

"make sure the Evacuation status is up to date, evict the pod if it is not".

Does it mean that if the evacuator fails to update the Evacuation status within a specific time, the pod will be terminated regardless?

Member Author

Yup, this is implemented via .spec.progressDeadlineSeconds, but the min/default values should be forgiving enough to allow for evacuator disruptions.

Contributor

Oh, thank you. As I see it, does .spec.progressDeadlineSeconds double as a polling duration, counting from the most recent evacuator update, while waiting for other potential instigators?

Member Author

No, .spec.progressDeadlineSeconds is a static value that cannot be changed. Every interceptor/evacuator abides by the same deadline to produce an update tick. By making the value immutable, we make it easier for the interceptors not to miss the deadline.

- move ${POD_UID}-${POD_NAME_PREFIX} format to alternatives
- better describe Evacuator responsibilities
- add new PDB issues
- add a follow-up to the beta graduation to track evacuations in pods
- Update the followups.
- Add proposal summary.
- Add practical use cases to the proposal summary and to the followups.
  - Deployment Pod Surge Example
  - HorizontalPodAutoscaler Pod Surge Example
  - Descheduling and Downscaling

All renames for reference:

Evacuation (noun) -> EvictionRequest / Eviction Process
evacuation (verb) -> request an eviction / terminate / evict / process eviction
Evacuator -> Interceptor
Evacuee -> Pod
Evacuator Class -> Interceptor Class
Evacuation Instigator -> Eviction Requester
Evacuation Controller -> Eviction Request Controller
ActiveEvacuatorClass -> ActiveInterceptorClass
ActiveEvacuatorCompleted -> ActiveInterceptorCompleted
EvacuationProgressTimestamp -> ProgressTimestamp
ExpectedEvacuationFinishTime -> ExpectedInterceptorFinishTime
EvacuationCancellationPolicy -> EvictionRequestCancellationPolicy
FailedEvictionCounter -> FailedAPIEvictionCounter
@atiratree atiratree changed the title KEP-4563: Evacuation API KEP-4563: EvictionRequest API (fka Evacuation API) Dec 3, 2024
@atiratree atiratree changed the title KEP-4563: EvictionRequest API (fka Evacuation API) KEP-4563: EvictionRequest API (fka Evacuation) Dec 3, 2024
@atiratree
Member Author

KEP Update; notable changes:

  • Evacuation has been renamed to EvictionRequest and Evacuator to Interceptor.
  • The EvictionRequest is considered complete when the pod terminates (phase Succeeded or Failed). The pod does not need to be deleted if the restartPolicy allows it.
  • Improved proposal summary + many small improvements.
  • EvictionRequest name should be equal to the pod UID.
  • Increased constraints for pod interceptor annotations.
  • Updated issues.

Added practical use cases/example scenarios:

@atiratree
Member Author

atiratree commented Dec 3, 2024

I am out of time for now, but I have to admit I am worried about how this will play out. Coordinating between N arbitrary actors via the API is not something we really do anywhere else (AFAIK "classic" DRA was the one place and that is being removed).

Can we pin down some very concrete examples of what REAL evacuators might really do, and how the lifecycle works, especially WRT cancellation of an eviction? You can lose the details and just tell a story. This is significantly complicated and I don't feel able to defend it right now.

@thockin, in the workloads space there are cross-cutting concern issues where such coordination is necessary (PTAL at examples above).

  • For example, in point 3 of the Motivation section we see the need to coordinate the ReplicaSet controller (most likely also the Deployment controller), the HPA and the descheduler (or similar component) just on the interceptor side. And this does not take into account other custom interceptors that users may want to implement.
  • In the EvictionRequest Cancellation Examples section, we can also see the need for coordination between eviction requesters.

Could this be positioned like "readiness gates"? That sort of feels similar.

Such deletion gates could work in a system with well-meaning and diligent actors, but that is difficult to achieve in practice:

  • One of the main goals is not just to help application developers and application-oriented components implement their use cases, but also to help cluster administrators detect misbehaving actors and have a well-defined path to achieve pod termination. This is why we require a proof of work (status updates) to prevent people from using it as just another PDB that blocks cluster upgrades.
  • As above, it is difficult to achieve coordination between multiple actors through the use of gates.

Contributor — @soltysh left a comment

I've read through the initial requirements and assumptions, but I haven't read carefully through the eviction protocol and the contract described in the latter part of this document.

to edit the PDB to account for the additional pod that needs to be spun to move the workload
from one node to another. This has been discussed in issue [kubernetes/kubernetes#66811](https://github.com/kubernetes/kubernetes/issues/66811)
and in issue [kubernetes/kubernetes#114877](https://github.com/kubernetes/kubernetes/issues/114877).
2. Similar to the first point, it is difficult to use PDBs for applications that can have a variable
Contributor

This case seems identical to the first one. In both cases you're covering a single-replica application. How it got there (always has been one, or was scaled down) doesn't matter, IMO. So I'd advise squashing the two together.


HPAs are fun here though, because they can heal in ways that are otherwise left to the Deployment/StatefulSet controllers. So maybe it should be positioned as a consumer of this api?

Member Author

Yes, it is similar in that it covers scaling of a single-replica application, but the impact and follow-ups are different. HPA covers 1-n replicas by default, so I think it is okay to keep it separate.

So maybe it should be positioned as a consumer of this api?

HPA is interesting because it can be both a requester (e.g. during scaling down) and an interceptor (e.g. during node drain). As is shown in the examples in the KEP.


We should discourage the creation of preventive EvictionRequests, so that they do not end up as
another PDB. So we should design the API appropriately and also not allow behaviors that do not
conform to the eviction request contract.
Contributor

You can't enforce it in any way, and the API will get used in various, unpredictable ways (see PDB use cases, which you nicely outlined in the sections above). I understand the sentiment, but I don't think it will help here in any way.

Member Author

answered in #4565 (comment)


Its responsibility is to observe eviction requests from requesters and periodically check that
interceptors are making progress in evicting/terminating pods. It is important to see a consistent
effort by the interceptors to reconcile the progress of the eviction request. This is important to
Contributor

How are you planning to enforce progress here? How can you tell whether progress that has been stuck for several minutes is good or bad? Every application will define that differently, so it's hard to define a one-size-fits-all. Even if you require an update every X amount of time, that can be overused by dummy updates, just to block longer/indefinitely.

Member Author

We do not need to enforce progress. This is modelled after leader election. If an interceptor fails to update the EvictionRequest status to report progress, we will choose another interceptor or evict the pod.

Leader election works fine with much lower timeouts. I think we can be more forgiving here, but the 10m mark (allowed minimum) should be sufficient.

that can be overused by dummy updates, just to block longer/indefinitely.

The same could be said about the leader election, which could be used to hijack an existing controller.

The idea behind this is that we should make it very unpleasant to block eviction (proof of work instead of just creating a PDB). And if someone implements this, they might as well implement graceful termination.

Another benefit of this approach is that an administrator can take a look at a very long-running EvictionRequest, see which InterceptorClass is set on it, and easily track down the misbehaving actor. We can also hook into this with metrics/alerts.

Finally, since this is a normal k8s object, we can use RBAC to restrict and observe access to the EvictionRequest.
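
To make the "proof of work" concrete, here is a sketch of how the controller side might decide, under the assumptions discussed above; the function and its return values are hypothetical:

```go
package evacuation

import "time"

// nextAction decides what to do with an eviction request: wait while the
// active interceptor keeps proving progress, hand off to the next interceptor
// class once it goes stale, and fall back to API-initiated eviction when no
// interceptor is left.
func nextAction(lastProgress time.Time, deadline time.Duration, remainingClasses []string, now time.Time) string {
	if now.Sub(lastProgress) <= deadline {
		return "wait" // progress reported within progressDeadlineSeconds
	}
	if len(remainingClasses) > 0 {
		return "activate:" + remainingClasses[0] // next interceptor takes over
	}
	return "evict" // no interceptor left; API-initiated eviction
}
```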


Tracking the eviction status in a pod would be feasible, but it would be hard to coordinate between
the actors as mentioned. So it would be better to decouple the eviction request process from the pod
as well.
Contributor

Have you considered improving PDB to provide more insightful status about eviction, which would allow external controllers to watch that status and react accordingly (either by scaling the app up, or doing the actions you're describing above)? This way the whole process could be tracked and synchronized outside of the core resources, and k8s would only expose sufficient information to provide that kind of automation.

Member Author

  • The main problems are cardinality and synchronization: we have a set of pods guarded by a single PDB, and there is a group of actors (requesters and interceptors) interested in the lifecycle of these pods. Tracking the state of the pods in the status would not scale well, and an aggregate status would be less useful and would still run into update conflicts.
  • High API server load: as this status would only be used on eviction, it would have outdated information. Actors interacting with a PDB will want to have the latest information about the pods (e.g. recalculating descheduling rules according to CPU/memory utilization) and might do frequent status updates, congesting the API server.
  • Not all applications use PDBs. The EvictionRequest API tries to serve as a replacement for eviction and as the main descheduling primitive. This has use cases in cluster and pod autoscaling. For example, when used with HPA, we would like to be able to downscale any application/set of pods.

See some practical use cases for this feature:
1. Ability to upscale first before terminating the pods with a Deployment: [Deployment Pod Surge Example](#deployment-pod-surge-example)
based on the [EvictionRequest Process](#evictionrequest-process).
2. Ability to upscale first before terminating the pods with HPA: [HorizontalPodAutoscaler Pod Surge Example](#horizontalpodautoscaler-pod-surge-example)


I guess dragging along the PDB to keep minAvailable == HPA's current target is pretty hacky, thus the need for a non-PDB-blocked signal.

Member Author

Yes, there are a bunch of problems here: user friendliness, atomicity of actions, and responding to any descheduling (e.g. node drain).

@txomon commented Dec 10, 2024

Not sure if it's relevant to the conversation; however, I would expect that after this change, user scenario 2 is solved without the user doing anything, as the maxSurge parameter and the fact that it's a rolling deployment are already specified in the Deployment spec.

@atiratree
Member Author

Yes, the implementation of the follow-ups is only outlined here, and there should be additional KEPs for each improvement. The majority of the things proposed should result in an immediate benefit without any user action.
