Closed
Description
When gang scheduling is enabled, I expect all TFJobs to be scheduled by "kube-batch", so that every job follows the same scheduling policy.
However, if I submit a TFJob with 1 replica and set its schedulerName to "kube-batch", the job stays pending. The cause is that tf-operator does not create a PDB when the job has fewer than 2 replicas:
func (jc *JobController) SyncPdb(job metav1.Object, minAvailableReplicas int32) (*v1beta1.PodDisruptionBudget, error) {
	labelJobName := jc.Controller.GetJobNameLabelKey()

	// Non-distributed training is not required gang scheduling
	if minAvailableReplicas < 2 {
		return nil, nil
	}
	.....
Can we remove this check to make the scheduling policy for all jobs consistent?
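To make the proposal concrete, here is a minimal, self-contained sketch of the difference in behavior. It does not use the real tf-operator or Kubernetes types: `pdbSpec`, `syncPdbCurrent`, and `syncPdbProposed` are hypothetical names made up for illustration, and the real `SyncPdb` does more than this (the elided part above is not reproduced).

```go
package main

import "fmt"

// pdbSpec is a hypothetical stand-in for *v1beta1.PodDisruptionBudget.
// It carries only the one field this sketch needs.
type pdbSpec struct {
	MinAvailable int32
}

// syncPdbCurrent mirrors the guard quoted above: no PDB is produced
// when the job has fewer than 2 replicas.
func syncPdbCurrent(minAvailableReplicas int32) *pdbSpec {
	if minAvailableReplicas < 2 {
		return nil
	}
	return &pdbSpec{MinAvailable: minAvailableReplicas}
}

// syncPdbProposed drops the guard (assumption: this is the only change),
// so a single-replica job also gets a PDB.
func syncPdbProposed(minAvailableReplicas int32) *pdbSpec {
	return &pdbSpec{MinAvailable: minAvailableReplicas}
}

func main() {
	for _, n := range []int32{1, 2} {
		fmt.Printf("replicas=%d current=%v proposed=%v\n",
			n, syncPdbCurrent(n) != nil, syncPdbProposed(n) != nil)
	}
}
```

With the guard in place, the single-replica case returns no PDB, which matches the pending job described above. Without it, both cases produce one.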