Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check nodeInfo before ecache predicate #63459

Merged
merged 1 commit into from
May 19, 2018

Conversation

resouer
Copy link
Contributor

@resouer resouer commented May 5, 2018

What this PR does / why we need it:

There's chances during test when nodeInfo is nil which may cause ecache predicate fail with nil pointer.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #63427

Special notes for your reviewer:

Not sure how to reproduce the original issue yet. i.e. why and when nodeInfo will become nil in tests is not clear to me, that's why I label it as WIP.

cc @bsalamat who may have more inputs.

Release note:

NONE

@resouer resouer added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label May 5, 2018
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 5, 2018
@k82cn
Copy link
Member

k82cn commented May 6, 2018

why and when nodeInfo will become nil in tests is not clear to me, that's why I label it as WIP.

The nodeInfo without node was created when node added into cache after pod bind.

@resouer
Copy link
Contributor Author

resouer commented May 6, 2018

@k82cn Yeah, but I am not sure why this happen to TestTaintNodeByCondition. Or it may apply to other tests as well ...

@resouer resouer force-pushed the fix-63427 branch 2 times, most recently from baf2db8 to a133899 Compare May 6, 2018 14:39
@@ -86,6 +87,12 @@ func (ec *EquivalenceCache) RunPredicate(
) (bool, []algorithm.PredicateFailureReason, error) {
ec.mu.Lock()
defer ec.mu.Unlock()

if nodeInfo == nil || nodeInfo.Node() == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@resouer - do we know why this is nil only sometimes ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some theory, but would like to verify it somehow tomorrow :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some of our tests, nodeInfo.Node() is nil. It is expected.

@k82cn
Copy link
Member

k82cn commented May 11, 2018

/milestone v1.11

@k82cn
Copy link
Member

k82cn commented May 14, 2018

/kind bug
/priority important-soon

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels May 14, 2018
@ravisantoshgudimetla
Copy link
Contributor

@k82cn Yeah, but I am not sure why this happen to TestTaintNodeByCondition. Or it may apply to other tests as well ...

It can happen to other tests as well.
@resouer - The change seems non-destructive to me and worth pushing forward. Also this makes code much more defensive and less brittle.

Copy link

@misterikkit misterikkit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
this change seems perfectly reasonable to me.


if nodeInfo == nil || nodeInfo.Node() == nil {
// This may happen during tests.
return false, []algorithm.PredicateFailureReason{}, fmt.Errorf("nodeInfo is nil or node is invalid")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should return false, nil, fmt.Errorf(...), but I know that breaks tests today.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 18, 2018
@resouer resouer changed the title [WIP] Check nodeInfo before ecache predicate Check nodeInfo before ecache predicate May 18, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2018
@bsalamat
Copy link
Member

/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, misterikkit, resouer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 18, 2018
@resouer
Copy link
Contributor Author

resouer commented May 19, 2018

/test pull-kubernetes-e2e-kops-aws

@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Pull Request Needs Approval

@bsalamat @misterikkit @resouer @kubernetes/sig-scheduling-misc

Action required: This pull request must have the status/approved-for-milestone label applied by a SIG maintainer. If the label is not applied within 2 days, the pull request will be moved out of the v1.11 milestone.

Pull Request Labels
  • sig/scheduling: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/bug: Fixes a bug discovered during the current release.
Help

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 63598, 63913, 63459, 63963, 60464). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 0a2467d into kubernetes:master May 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. milestone/needs-approval priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic during TestTaintNodeByCondition integration test
8 participants