Scheduler input should be taken when reducing replicas #4301
Description
When the replica count is increased, the scheduler assigns the new pod(s) to hosts according to the policies defined by the selected predicates/priorities. During scale-down (reducing the replica count), however, the scheduler is not consulted, which can result in a violation of the scheduling policies. This applies whether the scale-down is automatic or manual.
Consider the case of 5 pods spread across 3 machines, with 2 pods on each of two machines and 1 pod on the third. If the replica count is reduced to 4, we shouldn't remove the single pod from the third machine (assuming the scheduler policy is to achieve the greatest possible spread). To address this, we need to involve the scheduler in the workflow to decide which pod(s) to remove.
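To make the example concrete, here is a rough sketch of a spread-preserving choice. It is not a proposed design and does not use any scheduler API: the function name `pick_victims`, the pod and node names, and the "remove from the most crowded node" rule are all assumptions made for illustration.

```python
from collections import Counter


def pick_victims(pod_to_node, remove_count):
    """Pick pods to delete so the remaining pods stay as spread out as possible.

    pod_to_node: dict mapping a (hypothetical) pod name to its node name.
    Returns the list of pod names chosen for removal.
    """
    remaining = dict(pod_to_node)
    victims = []
    for _ in range(min(remove_count, len(remaining))):
        # Count how many pods each node currently hosts.
        load = Counter(remaining.values())
        # Take a pod from the most crowded node; iterating in sorted
        # order makes tie-breaking deterministic (lowest name wins).
        victim = max(sorted(remaining), key=lambda pod: load[remaining[pod]])
        victims.append(victim)
        del remaining[victim]
    return victims


# The scenario from this issue: 2 + 2 + 1 pods on three machines.
pods = {
    "pod-a": "node-1",
    "pod-b": "node-1",
    "pod-c": "node-2",
    "pod-d": "node-2",
    "pod-e": "node-3",
}
victims = pick_victims(pods, 1)
print(victims)  # the lone pod on node-3 is never chosen here
```

A real fix would presumably ask the scheduler to rank the existing pods using its configured priorities rather than hard-code a single spreading rule as this sketch does.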
Also, see #3948 for some other concerns related to determining which pods are removed when the replica count is reduced.
I do not have a proposed solution for this, but I would like to invite discussion and feedback on this issue.