Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pod UID as cache key instead of namespace/name #61069

Merged
merged 1 commit into from
Mar 13, 2018

Conversation

anfernee
Copy link
Member

@anfernee anfernee commented Mar 12, 2018

UID uniquely identifies pods across lifecycles, while namespace/name
could be 2 different pods across lifecycles. This could result in
tricky scheduler bugs.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #60966

Special notes for your reviewer: @bsalamat

Release note:

Fix a bug in scheduler cache by using Pod UID as the cache key instead of namespace/name

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 12, 2018
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 12, 2018
@anfernee anfernee force-pushed the sched-cache-rekey branch from b4160de to f07fefa Compare March 12, 2018 22:56
@anfernee
Copy link
Member Author

/retest

@@ -495,7 +494,7 @@ func (n *NodeInfo) FilterOutPods(pods []*v1.Pod) []*v1.Pod {

// getPodKey returns the string key of a pod.
func getPodKey(pod *v1.Pod) (string, error) {
return clientcache.MetaNamespaceKeyFunc(pod)
return string(pod.UID), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a check to return an error when pod.UID is empty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. I thought it is always non-empty? can I ask when it's empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be empty, but a bug in the system could leave it empty. It could also be left empty in our tests. So, it would be good to check and return an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. just curious.

@bsalamat
Copy link
Member

We should include this in 1.10 as this is a bug fix.

@dims
Copy link
Member

dims commented Mar 13, 2018

/kind bug
/priority important-soon
/sig scheduling

Throwing the labels for @bsalamat

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Mar 13, 2018
@dims
Copy link
Member

dims commented Mar 13, 2018

/status approved-for-milestone

@dims
Copy link
Member

dims commented Mar 13, 2018

@bsalamat @jayunit100 All it needs now is a lgtm from one of you

@ravisantoshgudimetla
Copy link
Contributor

Shouldn't we update the filter code as well which checks if the pod exists in the pod list? https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/schedulercache/node_info.go#L509

@jdumars
Copy link
Member

jdumars commented Mar 13, 2018

@sig-scheduling-maintainers PTAL ASAP

@bsalamat
Copy link
Member

Shouldn't we update the filter code as well which checks if the pod exists in the pod list?

It would be better to change the filter function as well. It is not necessary for this PR to work correctly though.

@anfernee anfernee force-pushed the sched-cache-rekey branch from f07fefa to 06f52ab Compare March 13, 2018 17:23
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Mar 13, 2018
UID uniquely identifies pods across lifecycles, while namespace/name
could be 2 different pods across lifecycles. This could result in
tricky scheduler bugs.

Fixes kubernetes#60966
@anfernee anfernee force-pushed the sched-cache-rekey branch from 06f52ab to 5bad68a Compare March 13, 2018 17:25
@bsalamat
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 13, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anfernee, bsalamat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 13, 2018
@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@anfernee @bsalamat

Pull Request Labels
  • sig/scheduling: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/bug: Fixes a bug discovered during the current release.
Help

@anfernee
Copy link
Member Author

/retest

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 61111, 61069). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 34001d8 into kubernetes:master Mar 13, 2018
@resouer
Copy link
Contributor

resouer commented May 7, 2018

This bug actually exists in previous releases, we need to cherry pick it in 1.9 at least.

@bsalamat
Copy link
Member

@resouer I agree. Could you please create a cherry pick PR for 1.9?

@resouer
Copy link
Contributor

resouer commented May 16, 2018

@bsalamat Sure, will do!

k8s-github-robot pushed a commit that referenced this pull request May 19, 2018
…9-upstream-release-1.9

Automatic merge from submit-queue.

Automated cherry pick of #61069: Use pod UID as cache key instead of namespace/name

Cherry pick of #61069 on release-1.9.

#61069: Use pod UID as cache key instead of namespace/name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scheduler should use pod UID instead of "namespace/name" in its cache
8 participants