Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not allow empty topology key for pod affinities. #49976

Conversation

aveshagarwal
Copy link
Member

@aveshagarwal aveshagarwal commented Aug 1, 2017

What this PR does / why we need it:
This PR do not allow empty topology key for all 4 pod affinities.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Release note:

Affinity in annotations alpha feature is no longer supported in 1.8. Anyone upgrading from 1.7 with AffinityInAnnotation feature enabled must ensure pods (specifically with pod anti-affinity PreferredDuringSchedulingIgnoredDuringExecution) with empty TopologyKey fields must be removed before upgrading to 1.8.

@kubernetes/sig-scheduling-bugs @bsalamat @davidopp

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 1, 2017
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Aug 1, 2017
@bsalamat
Copy link
Member

bsalamat commented Aug 1, 2017

Hmmm... I thought it was ok for "PreferredDuringScheduling" to have empty topology key. Empty topology key means "all topologies" for "PreferredDuringScheduling". TopologyKey is not allowed to be empty for
RequiredDuringScheduling.

@liggitt
Copy link
Member

liggitt commented Aug 2, 2017

if you have existing pods like this, tightening validation can prevent them from being able to be deleted

@davidopp
Copy link
Member

davidopp commented Aug 2, 2017

We need to do some more research on this. I'll try to get @wojtek-t to chime in. I can't remember the detailed history of this and unfortunately don't have time right now to dig through it.

The API comment supports what @bsalamat said, and moreover it is also what I recall as being the desired behavior. Also there is a comment in the code that supports it.

But then the actual implementation doesn't seem to do the right thing for empty TopologyKey.

@wojtek-t
Copy link
Member

wojtek-t commented Aug 2, 2017

IIRC, what we wanted to do is to:

  • make it still possible to have empty topology key for alpha version (represented in annotations)
  • disable it in Beta (represented in object fields).
    It makes impossible to migrate your alpha pods to Beta ones, but I think that was a conscious decision.

Empty topology key (or to be more specific, allowing multiple keys, which it is behond the scenes) is a disaster from performance point of view and we should get rid of it.

@davidopp
Copy link
Member

davidopp commented Aug 2, 2017

Thanks, @wojtek-t ! In that case it sounds like this PR is correct, an also we should update the two comments I mentioned in my last reply, that talk about empty TopologyKey.

@aveshagarwal aveshagarwal force-pushed the master-pod-affinities-topology-key branch from db935a4 to 0dad8dd Compare August 2, 2017 13:42
@k8s-github-robot k8s-github-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Aug 2, 2017
@aveshagarwal
Copy link
Member Author

@davidopp updated PTAL.

@liggitt This behavior is already in 1.7 (https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/api/validation/validation.go#L2432). This PR fixes 1.8 to avoid regression.

@liggitt
Copy link
Member

liggitt commented Aug 2, 2017

@liggitt This behavior is already in 1.7 (https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/api/validation/validation.go#L2432). This PR fixes 1.8 to avoid regression.

looks like if they enabled the AffinityInAnnotations feature gate in 1.7, they could have existing pods with empty TopologyKey fields, right? just need to determine whether it was possible for someone to get data like that persisted in 1.7, and make sure adding this validation doesn't make that bad data from 1.7 impossible to remove or fix.

@aveshagarwal
Copy link
Member Author

looks like if they enabled the AffinityInAnnotations feature gate in 1.7, they could have existing pods with empty TopologyKey fields, right? just need to determine whether it was possible for someone to get data like that persisted in 1.7, and make sure adding this validation doesn't make that bad data from 1.7 impossible to remove or fix.

If someone is enabling an Alpha feature (AffinityInAnnotation) in 1.7 and the feature is already deprecated, they should know that they are supposed to update their stuff in 1.7 before moving to 1.8.

I think I am more worried about when AffinityInAnnotation is disabled (which is by default) in 1.7 and then there is behavior change between 1.7 and 1.8 pod affinity beta fields, which this PR fixes.

@liggitt
Copy link
Member

liggitt commented Aug 2, 2017

If someone is enabling an Alpha feature (AffinityInAnnotation) in 1.7 and the feature is already deprecated, they should know that they are supposed to update their stuff in 1.7 before moving to 1.8.

Can you add a release note indicating this is no longer allowed, and that clusters with the alpha AffinityInAnnotation feature enabled must ensure pods with empty TopologyKey fields must be removed before upgrading to 1.8?

@k8s-github-robot k8s-github-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Aug 2, 2017
@aveshagarwal
Copy link
Member Author

aveshagarwal commented Aug 2, 2017

@liggitt updated release note, PTAL.

@liggitt
Copy link
Member

liggitt commented Aug 2, 2017

thanks

@timothysc timothysc added this to the v1.8 milestone Aug 2, 2017
@aveshagarwal
Copy link
Member Author

@bsalamat @davidopp @lavalamp could you give lgtm /approval as i dont think anything is left to do in this pr as its been lying for long and it really fixes an important issue?

@mikedanese mikedanese assigned bsalamat and unassigned mikedanese Aug 18, 2017
@bsalamat
Copy link
Member

/lgtm

but I cannot approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 18, 2017
@davidopp
Copy link
Member

/approve

@wojtek-t
Copy link
Member

/approve no-issue

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aveshagarwal, bsalamat, davidopp, wojtek-t

Associated issue requirement bypassed by: wojtek-t

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 21, 2017
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 50531, 50853, 49976, 50939, 50607)

@k8s-github-robot k8s-github-robot merged commit 0f8eaa4 into kubernetes:master Aug 21, 2017
@lavalamp lavalamp added the release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. label Nov 10, 2017
@lavalamp
Copy link
Member

Sorry, I didn't see this.

Next time please make sure it gets the action required label.

This doesn't look like a safe way to accomplish the goal. It's fine to have action required but there should be a path to fixing things if the action isn't followed correctly. How does one recover in this case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.