Increase scheduling backoff queue max duration and attach specific error message to unschedulable pods #81214
Description
What would you like to be added:
Increase the backoff queue's maximum duration and attach a specific error message to unschedulable pods.
Why is this needed:
Currently the scheduling backoff queue's maximum duration is 10 seconds. In our cluster (5K nodes and 100,000+ pods), some pods wait a very long time to be scheduled. These pods are in the active queue with lower priority.
If some higher-priority pods cannot be scheduled and are added to the backoff queue because of the many events that trigger MoveAllToActiveQueue, they will be moved back to the active queue in at most 10 seconds, so the lower-priority pods never even get a chance to be scheduled. Can we increase the backoff queue's maximum duration to relieve this situation?
In addition, some events, such as PVC/Service ADD/UPDATE events, blindly move all pods in the unschedulable queue to the active queue. Can we attach the specific error message when we add pods to the unschedulable queue, so that an event moves only the affected pods to the active queue?
/assign