
Moved node condition filter into a predicates. #50362

Merged
merged 2 commits into kubernetes:master from k82cn:k8s_50360 on Aug 12, 2017

Conversation

k82cn
Member

@k82cn k82cn commented Aug 9, 2017

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #50360

Release note:

A new predicate, named 'CheckNodeCondition', was added to replace the node condition filter. 'NetworkUnavailable', 'OutOfDisk' and 'NotReady' may be reported as reasons when pods fail to schedule.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 9, 2017
@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 9, 2017
@k82cn
Member Author

k82cn commented Aug 9, 2017

/cc @gmarek

@k8s-ci-robot k8s-ci-robot requested a review from gmarek August 9, 2017 08:52
@k82cn
Member Author

k82cn commented Aug 9, 2017

If there's a custom scheduler configuration, do we need to add 'CheckNodeCondition' by default to keep backward compatibility?

@yastij
Member

yastij commented Aug 9, 2017

@k82cn - I think we should, especially since the scheduler should behave the same way as before if the flags are disabled.

// - NodeReady condition status is ConditionTrue,
// - NodeOutOfDisk condition status is ConditionFalse,
// - NodeNetworkUnavailable condition status is ConditionFalse.
if cond.Type == v1.NodeReady && cond.Status != v1.ConditionTrue {
Member

@yastij yastij Aug 9, 2017

I find cond.Type == v1.NodeReady && cond.Status == v1.ConditionFalse easier to read, WDYT @k82cn? (same goes for the checks below)

Member Author

We can only schedule pods to ready nodes; the Ready status must be True, neither False nor Unknown.

Member

yup, indeed, forgot the Unknown one
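
For context, a minimal self-contained sketch of the point above (not the PR's code; it assumes the present-day k8s.io/api/core/v1 import path): NodeCondition.Status is tri-state, so checking == ConditionFalse alone would let a node whose Ready condition is Unknown through, while != ConditionTrue rejects both False and Unknown.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// nodeIsReady mirrors the shape of the check under discussion: it accepts a
// node only when its Ready condition is explicitly True, so both False and
// Unknown are rejected.
func nodeIsReady(node *v1.Node) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == v1.NodeReady {
			return cond.Status == v1.ConditionTrue
		}
	}
	// No Ready condition reported at all: treat the node as not ready.
	return false
}

func main() {
	unknown := &v1.Node{Status: v1.NodeStatus{Conditions: []v1.NodeCondition{
		{Type: v1.NodeReady, Status: v1.ConditionUnknown},
	}}}
	ready := &v1.Node{Status: v1.NodeStatus{Conditions: []v1.NodeCondition{
		{Type: v1.NodeReady, Status: v1.ConditionTrue},
	}}}
	fmt.Println(nodeIsReady(unknown), nodeIsReady(ready)) // false true
}
```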

@k82cn k82cn force-pushed the k8s_50360 branch 3 times, most recently from 165b4be to 814cc56, August 9, 2017 13:58
@k82cn
Member Author

k82cn commented Aug 9, 2017

/retest

@thockin thockin assigned bsalamat and unassigned thockin Aug 9, 2017
@@ -99,6 +100,16 @@ func RegisterFitPredicate(name string, predicate algorithm.FitPredicate) string
return RegisterFitPredicateFactory(name, func(PluginFactoryArgs) algorithm.FitPredicate { return predicate })
}

// RegisterRequiredFitPredicate registers a fit predicate with the algorithm registry,
// it's always included in configuration. Returns the name with which the predicate was registered.
func RegisterRequiredFitPredicate(name string, predicate algorithm.FitPredicate) string {
Contributor

I don't like this - it should be a normal predicate, exactly like fit resources and such. If someone wants to break their cluster, they're free to do so.

Member Author

This is for custom configurations: if the scheduler is configured with a user-provided config, the node condition check will not be included until the scheduler config is updated.

Member

@gmarek - Is it too risky to have it?

Member

I am not sure about this either. The way this method is written implies that this is a new first-class way of adding "required predicates". I would remove this method and requiredFitPredicateMap, and instead hard-code this predicate into the predicates map in getFitPredicateFunctions, with a TODO noting that the hard-coded predicate is there for backward compatibility and should be removed in a couple of releases.

Member

I'm okay with these solutions as long as this ensures backward compatibility. @gmarek WDYT?

Member Author

"If someone wants to break their cluster, they're free to do so."

I think no one wants to break their cluster, but they may not know about the admission checks in kubelet. We may also mark the predicates used by kubelet as required, so maybe a function is better :).

Contributor

I agree with @bsalamat. Also somehow marking predicates used by kubelet sounds like a good thing to do.

Member Author

@bsalamat, does the kubelet case make sense to you?

Member

Yes, it does, but we should probably choose a different name for this function and the associated map to make it clear why these predicates are treated differently.

Member Author

Updated to RegistEssentialFitPredicates :).
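
For readers following along, a hedged, self-contained sketch of the kind of registry split being discussed. It borrows the RegisterMandatoryFitPredicate name used by the follow-up commit quoted later in this thread; every other identifier and signature here is illustrative, not the scheduler's real code.

```go
// Illustrative only: a toy predicate registry showing how a "mandatory"
// predicate can stay enabled even when a user supplies their own policy.
package registry

import "sync"

// FitPredicate stands in for the scheduler's predicate function type.
type FitPredicate func(nodeName string) (fits bool, failReason string)

var (
	mu         sync.Mutex
	predicates = map[string]FitPredicate{} // selectable via policy config
	mandatory  = map[string]bool{}         // names that are always applied
)

// RegisterFitPredicate registers an ordinary, user-selectable predicate.
func RegisterFitPredicate(name string, p FitPredicate) string {
	mu.Lock()
	defer mu.Unlock()
	predicates[name] = p
	return name
}

// RegisterMandatoryFitPredicate registers a predicate and marks it as
// always included, so a policy file written before the predicate existed
// still gets the check (the backward-compatibility concern above).
func RegisterMandatoryFitPredicate(name string, p FitPredicate) string {
	mu.Lock()
	defer mu.Unlock()
	predicates[name] = p
	mandatory[name] = true
	return name
}

// getFitPredicates returns the predicates named in the policy plus every
// mandatory one.
func getFitPredicates(policyNames []string) map[string]FitPredicate {
	mu.Lock()
	defer mu.Unlock()
	out := map[string]FitPredicate{}
	for _, n := range policyNames {
		if p, ok := predicates[n]; ok {
			out[n] = p
		}
	}
	for n := range mandatory {
		out[n] = predicates[n]
	}
	return out
}
```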

@@ -771,7 +771,7 @@ func (f *ConfigFactory) getPluginArgs() (*PluginFactoryArgs, error) {
ReplicaSetLister: f.replicaSetLister,
StatefulSetLister: f.statefulSetLister,
// All fit predicates only need to consider schedulable nodes.
Member

Please remove this comment.

}
}

// Ignore nodes that are marked unschedulable
Member

Please remove this comment.

@@ -1301,6 +1301,38 @@ func CheckNodeDiskPressurePredicate(pod *v1.Pod, meta interface{}, nodeInfo *sch
return true, nil, nil
}

// CheckNodeConditionPredicate checks if a pod can be scheduled on a node
// reporting out of disk, network unavailable and not ready condition.
Member

optional: Please add to the comment that this predicate doesn't use the pod and checks the node conditions ignoring the pod.
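
A hedged sketch of the predicate's shape, reflecting the reviewer's point that the pod argument is ignored. The reason strings follow the ones visible in the scheduler event quoted further down the thread; the signature, helper name, and the locally defined OutOfDisk constant are illustrative rather than the merged code.

```go
// Illustrative only; not the merged CheckNodeConditionPredicate.
package predicates

import v1 "k8s.io/api/core/v1"

// nodeOutOfDisk is defined locally here for illustration; the real code
// uses the API's OutOfDisk node condition type.
const nodeOutOfDisk v1.NodeConditionType = "OutOfDisk"

// checkNodeCondition decides schedulability from node conditions alone.
// The pod parameter is deliberately unused: this predicate filters nodes
// the same way for every pod.
func checkNodeCondition(_ *v1.Pod, node *v1.Node) (fits bool, reasons []string) {
	for _, cond := range node.Status.Conditions {
		// A node is considered only when:
		// - NodeReady is ConditionTrue,
		// - OutOfDisk is ConditionFalse,
		// - NodeNetworkUnavailable is ConditionFalse.
		switch {
		case cond.Type == v1.NodeReady && cond.Status != v1.ConditionTrue:
			reasons = append(reasons, "NodeNotReady")
		case cond.Type == nodeOutOfDisk && cond.Status != v1.ConditionFalse:
			reasons = append(reasons, "NodeOutOfDisk")
		case cond.Type == v1.NodeNetworkUnavailable && cond.Status != v1.ConditionFalse:
			reasons = append(reasons, "NodeNetworkUnavailable")
		}
	}
	return len(reasons) == 0, reasons
}
```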

@k82cn
Member Author

k82cn commented Aug 11, 2017

/retest

@yastij
Member

yastij commented Aug 11, 2017

@k82cn flake #50262

@yastij
Member

yastij commented Aug 11, 2017

/retest

@yastij
Member

yastij commented Aug 11, 2017

@gmarek LGTY?

@k82cn
Member Author

k82cn commented Aug 11, 2017

@gmarek, @bsalamat, is the latest PR OK for you?


@k82cn
Member Author

k82cn commented Aug 12, 2017

/retest

@bsalamat
Member

/lgtm

Thanks, @k82cn!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 12, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, k82cn

Associated issue: 50360

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 2820b45 into kubernetes:master Aug 12, 2017
@k82cn k82cn deleted the k8s_50360 branch August 12, 2017 23:01
@bsalamat
Member

@k82cn Could you please take a look at the broken test?

@k82cn
Member Author

k82cn commented Aug 16, 2017

sure :).

@k82cn
Member Author

k82cn commented Aug 16, 2017

One interesting thing: it seems calico is deployed by a Deployment here, and the network is unavailable before it's deployed. By default, we'd use a DaemonSet for calico; the DaemonSet will tolerate the network-unavailable condition so the calico daemons can start :). Let me check what happened in this case.

I0812 22:53:52.595] Aug 12 22:53:52.595: INFO: At 2017-08-12 22:42:51 +0000 UTC - event for calico-typha: {deployment-controller } ScalingReplicaSet: Scaled up replica set calico-typha-224432132 to 1
calico-typha-224432132-4smcp               Pending       [{Type:PodScheduled Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-08-12 22:42:51 +0000 UTC Reason:Unschedulable Message:No nodes are available that match all of the following predicates: NodeNetworkUnavailable (3), NodeNotReady (3).}]
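
As an aside, a small self-contained helper (not part of this PR; the function name is made up for illustration) showing how the Pending status quoted above can be read programmatically: the PodScheduled condition carries the Unschedulable reason together with the predicate failure message.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// whyUnscheduled returns the scheduler's explanation for a Pending pod, or
// an empty string if the pod has been scheduled (or not yet considered).
func whyUnscheduled(pod *v1.Pod) string {
	for _, c := range pod.Status.Conditions {
		if c.Type == v1.PodScheduled && c.Status == v1.ConditionFalse {
			// For the calico-typha pod above this yields the
			// "Unschedulable: No nodes are available ..." text.
			return c.Reason + ": " + c.Message
		}
	}
	return ""
}

func main() {
	pod := &v1.Pod{Status: v1.PodStatus{Conditions: []v1.PodCondition{{
		Type:    v1.PodScheduled,
		Status:  v1.ConditionFalse,
		Reason:  "Unschedulable",
		Message: "No nodes are available that match all of the following predicates: NodeNetworkUnavailable (3), NodeNotReady (3).",
	}}}}
	fmt.Println(whyUnscheduled(pod))
}
```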

@k82cn
Member Author

k82cn commented Aug 16, 2017

@mindprince, @bsalamat, do you know how to check the deploy script for the gpu e2e test? I'd like to know why we deploy calico as a Deployment.

@bsalamat
Member

@k82cn While we could look into the script, I am a bit worried that this change may break similar deployments. Do you know why we see this change in behavior? My understanding is that this PR should not change the behavior of the system.

@k82cn
Member Author

k82cn commented Aug 16, 2017

@bsalamat, yes, this PR did not change the behavior of the system. I'd like to check whether there were changes other than this PR to the gpu e2e test; my understanding is that we cannot deploy calico by Deployment.

@rohitagarwal003
Member

This was the only change between the last successful run and the first failure.

@k82cn
Member Author

k82cn commented Aug 17, 2017

@mindprince, I mean the change to the deploy script. I'm not sure where to check it.

@rohitagarwal003
Member

There were no changes in test-infra (it stayed at kubernetes/test-infra@d3713c353) between the last successful run and the first failed run.

@k82cn
Member Author

k82cn commented Aug 17, 2017

Good, let me check it :).

k8s-github-robot pushed a commit that referenced this pull request Aug 17, 2017
Automatic merge from submit-queue (batch tested with PRs 49342, 50581, 50777)

Update RegisterMandatoryFitPredicate to avoid double register.

**What this PR does / why we need it**:
In #50362, we introduced `RegisterMandatoryFitPredicate` to make some predicates always included by the scheduler. This PR improves it by avoiding the double registration of `RegisterFitPredicate` and `RegisterMandatoryFitPredicate`.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #50360 

**Release note**:

```release-note
None
```
@k82cn k82cn mentioned this pull request Aug 18, 2017
@k82cn
Member Author

k82cn commented Aug 18, 2017

#50884 is used for the flaky e2e test.

k82cn pushed a commit to k82cn/kubernetes that referenced this pull request Aug 19, 2017
tjfontaine pushed a commit to oracle/kubernetes that referenced this pull request Aug 21, 2017
Automatic merge from submit-queue

Revert kubernetes#50362.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of kubernetes#50884

**Release note**:

```release-note
None
```