Add a setting in a ClusterQueue that lets an administrator pause new admissions, with the option to cancel current QuotaReservations and Evict admitted workloads.
Pausing a queue is a common admin journey for controlling a user's resource usage.
Add a setting in a ClusterQueue that an administrator can use to pause new admissions, with the option to cancel current QuotaReservations and Evict admitted workloads.
Not in scope: managing the QuotaReservation and Admission of workloads from the same cohort that might borrow resources from the ClusterQueue in question.
Add a new member, stopPolicy, to the ClusterQueue spec. Setting it to a value other than None marks the ClusterQueue as Inactive, and its value controls how the Admitted or Reserving workloads are affected.
As a cluster administrator I want to be able to stop the new admissions in a specific ClusterQueue with the option of Evicting currently admitted Workloads or canceling QuotaReservations.
Managing the Reservation canceling and Eviction of workloads in other queues from the same cohort that are potentially borrowing resources from the stopped queue adds a considerable amount of complexity while having a limited added value, therefore these cases are not covered in this first iteration.
type ClusterQueueSpec struct {
	// ....

	// stopPolicy - if set to a value other than None, the ClusterQueue is
	// considered Inactive and no new reservations are made.
	//
	// Depending on its value, its associated workloads will:
	//
	// - None - Workloads are admitted
	// - HoldAndDrain - Admitted workloads are evicted and Reserving workloads will cancel the reservation.
	// - Hold - Admitted workloads will run to completion and Reserving workloads will cancel the reservation.
	//
	// +kubebuilder:validation:Enum=None;Hold;HoldAndDrain
	// +kubebuilder:default="None"
	StopPolicy StopPolicy `json:"stopPolicy,omitempty"`
}

// StopPolicy selects how a stopped ClusterQueue treats its workloads.
type StopPolicy string

const (
	None         StopPolicy = "None"
	Hold         StopPolicy = "Hold"
	HoldAndDrain StopPolicy = "HoldAndDrain"
)
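Read together with the enum above, the default value None leaves the queue admitting workloads, so one reasonable interpretation is that only Hold and HoldAndDrain make the queue Inactive. The following is a minimal sketch under that assumption; `isActive` is a hypothetical helper that is not part of the proposal, and treating an empty value like None is likewise an assumption about how an undefaulted object would be handled.

```go
package main

import "fmt"

// StopPolicy mirrors the type proposed above.
type StopPolicy string

const (
	None         StopPolicy = "None"
	Hold         StopPolicy = "Hold"
	HoldAndDrain StopPolicy = "HoldAndDrain"
)

// isActive is a hypothetical helper: it reports whether a ClusterQueue with
// the given policy may still make new reservations. An empty value is treated
// like None (assumption: the field has not been defaulted yet).
func isActive(p StopPolicy) bool {
	return p == None || p == ""
}

func main() {
	for _, p := range []StopPolicy{None, Hold, HoldAndDrain} {
		fmt.Printf("%-12s active=%t\n", p, isActive(p))
	}
}
```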
Once the stopPolicy is set, the cluster queue is marked as inactive with a relevant status message.
If the stopPolicy of the cluster queue associated with a workload changes, then, depending on the policy value and the state of the workload, the workload should be Evicted or have its reservation canceled.
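The policy and workload state combinations described above can be sketched as a small decision function. This is an illustration only, not the controller implementation: `WorkloadState`, `Action`, and `actionFor` are hypothetical names introduced here, and the mapping is taken from the field comments on stopPolicy.

```go
package main

import "fmt"

// StopPolicy mirrors the type proposed for ClusterQueueSpec.
type StopPolicy string

const (
	None         StopPolicy = "None"
	Hold         StopPolicy = "Hold"
	HoldAndDrain StopPolicy = "HoldAndDrain"
)

// WorkloadState is a hypothetical simplification of a workload's status.
type WorkloadState string

const (
	Pending   WorkloadState = "Pending"
	Reserving WorkloadState = "Reserving"
	Admitted  WorkloadState = "Admitted"
)

// Action is a hypothetical name for what should happen to the workload.
type Action string

const (
	NoAction          Action = "NoAction"
	CancelReservation Action = "CancelReservation"
	Evict             Action = "Evict"
)

// actionFor returns the action for a workload in state s when its
// ClusterQueue has stop policy p.
func actionFor(p StopPolicy, s WorkloadState) Action {
	switch s {
	case Reserving:
		// Both stopping policies cancel an outstanding quota reservation.
		if p == Hold || p == HoldAndDrain {
			return CancelReservation
		}
	case Admitted:
		// Only HoldAndDrain evicts; Hold lets admitted workloads finish.
		if p == HoldAndDrain {
			return Evict
		}
	}
	return NoAction
}

func main() {
	for _, p := range []StopPolicy{None, Hold, HoldAndDrain} {
		for _, s := range []WorkloadState{Pending, Reserving, Admitted} {
			fmt.Printf("%-12s %-9s -> %s\n", p, s, actionFor(p, s))
		}
	}
}
```

Keeping the decision in one pure function makes each combination easy to cover with a table-driven test, independent of any controller wiring.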
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
To be added depending on the added code complexity.
The controllers/core suite should check:
- ClusterQueue - Once the stopPolicy is set, the ClusterQueue becomes Inactive.
- Workload - Once its ClusterQueue stopPolicy is set, depending on the value:
  - Hold - The Reserving workloads cancel their reservation.
  - HoldAndDrain - The Admitted workloads get Evicted and the Reserving ones cancel their reservation.
- A new workload is not admitted while its cluster queue is inactive.