
Inter pod topological affinity and anti-affinity implementation #22985

Merged (1 commit) on May 6, 2016

Conversation

kevin-wangzefeng
Member

This PR implements Phase 1 of inter-pod topological affinity and anti-affinity:

  1. Add the Affinity field to PodSpec and the PodAffinity and PodAntiAffinity types to the API along with all of their descendant types.
  2. Implement a scheduler predicate that takes RequiredDuringSchedulingIgnoredDuringExecution
    affinity and anti-affinity into account. Include a workaround for the issue described at the end of the Affinity section of the Examples section (can't schedule first pod).
  3. Implement a scheduler priority function that takes PreferredDuringSchedulingIgnoredDuringExecution affinity and anti-affinity into account.
  4. Implement admission controller that rejects requests that specify "all namespaces" with non-"node" TopologyKey for RequiredDuringScheduling anti-affinity.
    This admission controller should be enabled by default.

P.S. For step 4, this PR currently only implements the admission controller; "enabling it by default" (changing the deployment scripts) will come in a follow-up PR. A rough sketch of the check is shown below.
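The sketch is illustrative only and not the actual admission plugin code: the local type names, the AllNamespaces field, and the use of kubernetes.io/hostname as the "node" topology key are assumptions made for the example.

    package main

    import (
        "errors"
        "fmt"
    )

    // Hypothetical, trimmed-down stand-ins for the real API types.
    type PodAffinityTerm struct {
        AllNamespaces bool   // stand-in for "this term applies to all namespaces"
        TopologyKey   string
    }

    type PodAntiAffinity struct {
        RequiredDuringSchedulingIgnoredDuringExecution []PodAffinityTerm
    }

    // admitAntiAffinity sketches the rule from step 4: reject RequiredDuringScheduling
    // anti-affinity terms that apply to all namespaces unless the topology key is the
    // per-node one.
    func admitAntiAffinity(anti *PodAntiAffinity) error {
        if anti == nil {
            return nil
        }
        for _, term := range anti.RequiredDuringSchedulingIgnoredDuringExecution {
            if term.AllNamespaces && term.TopologyKey != "kubernetes.io/hostname" {
                return errors.New("RequiredDuringScheduling anti-affinity over all namespaces must use the node topology key")
            }
        }
        return nil
    }

    func main() {
        bad := &PodAntiAffinity{
            RequiredDuringSchedulingIgnoredDuringExecution: []PodAffinityTerm{
                {AllNamespaces: true, TopologyKey: "failure-domain.beta.kubernetes.io/zone"},
            },
        }
        fmt.Println(admitAntiAffinity(bad)) // prints the rejection error
    }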

Since this is Phase 1, the API change does not add the new field to PodSpec itself; instead, the affinity is expressed through an annotation, the same approach we used for node affinity. A follow-up PR adding introductory documentation is also planned.
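As a rough illustration of the annotation approach, the serialized affinity ends up under the scheduler.alpha.kubernetes.io/affinity annotation on the pod. The sketch below makes some assumptions: the local types are simplified stand-ins for the real API types, and only the JSON fields needed for the example are shown.

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // Simplified stand-ins for the annotation payload; not the real API types.
    type podAffinityTerm struct {
        LabelSelector map[string]map[string]string `json:"labelSelector,omitempty"`
        TopologyKey   string                       `json:"topologyKey"`
    }

    type podAffinity struct {
        RequiredDuringSchedulingIgnoredDuringExecution []podAffinityTerm `json:"requiredDuringSchedulingIgnoredDuringExecution,omitempty"`
    }

    type affinity struct {
        PodAffinity *podAffinity `json:"podAffinity,omitempty"`
    }

    func main() {
        // Require co-location (same node) with pods labeled app=store.
        a := affinity{PodAffinity: &podAffinity{
            RequiredDuringSchedulingIgnoredDuringExecution: []podAffinityTerm{{
                LabelSelector: map[string]map[string]string{"matchLabels": {"app": "store"}},
                TopologyKey:   "kubernetes.io/hostname",
            }},
        }}
        raw, _ := json.Marshal(a)

        // In Phase 1 the affinity is carried as a pod annotation rather than a
        // first-class PodSpec field.
        annotations := map[string]string{
            "scheduler.alpha.kubernetes.io/affinity": string(raw),
        }
        fmt.Println(annotations["scheduler.alpha.kubernetes.io/affinity"])
    }

The scheduler-side predicate and priority functions then presumably read the affinity back out of this annotation rather than from a PodSpec field, which is what keeps the change alpha-level.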

@davidopp @kubernetes/sig-scheduling PTAL.

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.

@k8s-github-robot

Adding label do-not-merge because this PR changes docs that are prohibited from auto-merge.
See http://kubernetes.github.io/docs/editdocs/ for information about editing docs

@k8s-github-robot added the kind/api-change, kind/old-docs, size/XXL, and do-not-merge labels on Mar 15, 2016
@kevin-wangzefeng
Member Author

@googlebot he signed it

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request) and all commit authors, but as best as we can tell these commits were authored by someone else. If that's the case, please add them to this pull request and have them confirm that they're okay with these commits being contributed to Google. If we're mistaken and you did author these commits, just reply here to confirm.

@kevin-wangzefeng
Member Author

@googlebot confirm

@SrinivasChilveri

@googlebot I am okay with these commits.

metadata:
  name: with-newlabels
  annotations:
    scheduler.alpha.kubernetes.io/affinity: >
Member

How do we validate that annotations are sane? It seems we would want an analysis tool that could check if your annotation is valid or could ever be met.

Member

There is syntactic validation; see elsewhere in this PR (e.g. validation.go)
Knowing whether an annotation could ever be met is probably not equivalent to the halting problem but may be close. Feel free to file an issue but I think it's outside the scope of a first version of this feature.

Member

The validation verifies that the syntax is ok, not that it can be met. I'll open an issue around an analysis tool.
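To make the distinction concrete, a hypothetical example of a purely syntactic check is sketched below; it is not the code in validation.go, and the empty-topologyKey rule it shows is inferred from the --failure-domains flag description that appears later in this thread.

    package main

    import "fmt"

    // Hypothetical, trimmed-down term used only for this sketch.
    type podAffinityTerm struct {
        TopologyKey string
        Required    bool // true for RequiredDuringScheduling terms
    }

    // validateTerm is purely syntactic: it can report that a term is malformed,
    // but it cannot tell you whether the term could ever be satisfied on a real
    // cluster (which is the analysis-tool question raised above).
    func validateTerm(term podAffinityTerm) []error {
        var errs []error
        // An empty topologyKey is only tolerated for PreferredDuringScheduling
        // anti-affinity, where it is expanded using --failure-domains.
        if term.Required && term.TopologyKey == "" {
            errs = append(errs, fmt.Errorf("topologyKey must be non-empty for RequiredDuringScheduling terms"))
        }
        return errs
    }

    func main() {
        fmt.Println(validateTerm(podAffinityTerm{Required: true}))
    }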

@davidopp assigned davidopp and unassigned erictune on Mar 15, 2016
@k8s-github-robot added the needs-rebase label on Mar 15, 2016
@SrinivasChilveri

@googlebot confirm

@kevin-wangzefeng
Member Author

@googlebot he signed it

// system will try to eventually evict the pod from its node.
// When there are multiple elements, the lists of nodes corresponding to each
// podAffinityTerm are intersected, i.e. all terms must be satisfied.
// RequiredDuringSchedulingRequiredDuringExecution []PodAffinityTerm `json:"requiredDuringSchedulingRequiredDuringExecution,omitempty"`
Member

I know without eviction policy and rescheduling we can't do this part yet, but I fail to see the "practical" use case of IgnoredDuringExecution vs. RequiredDuringExecution.

Member

@timothysc Do you mean you think one of the two variants (requiredDuringSchedulingRequiredDuringExecution or requiredDuringSchedulingIgnoredDuringExecution) is not useful? If so, which one do you think is not useful?

Member

I don't see a practical reason for requiredDuringSchedulingIgnoredDuringExecution.

Member

At this point the main practical reason is so we can roll out this feature before we have time to implement the "RequiredDuringExecution" part (in particular the logic for eviction). I actually tend to agree with you that once we add eviction (i.e. offer requiredDuringSchedulingRequiredDuringExecution), few people will use requiredDuringSchedulingIgnoredDuringExecution. That said, the current semantics of podSpec.nodeSelector that has been in the system from day one are "ignoredDuringExecution" so I guess that general semantic is not completely useless. Also, some people have argued we actually want two flavors of "requiredDuringExecution" (one that is synchronous eviction and one that is asynchronous eviction) so we'd potentially end up with multiple fields (or some alternative way to distinguish which option you want) even if we didn't have ignoredDuringExecution.

Member

We should comment that then.
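For readers following this thread, the shape being debated is roughly as follows. This is a sketch assembled from the snippets quoted above, with simplified field types, and not a verbatim copy of the API file.

    package api // illustrative placement only

    // PodAffinityTerm, in simplified form: which existing pods the rule is
    // expressed against, and the topology domain (node, zone, ...) over which
    // co-location or spreading is measured.
    type PodAffinityTerm struct {
        LabelSelector map[string]string `json:"labelSelector,omitempty"` // simplified stand-in for a full label selector
        Namespaces    []string          `json:"namespaces,omitempty"`
        TopologyKey   string            `json:"topologyKey,omitempty"`
    }

    type WeightedPodAffinityTerm struct {
        Weight          int             `json:"weight"`
        PodAffinityTerm PodAffinityTerm `json:"podAffinityTerm"`
    }

    // PodAffinity (PodAntiAffinity mirrors it).
    type PodAffinity struct {
        // Kept as a comment in this PR, as quoted above: enforcing the rule
        // during execution needs eviction/rescheduling support that does not
        // exist yet.
        // RequiredDuringSchedulingRequiredDuringExecution []PodAffinityTerm `json:"requiredDuringSchedulingRequiredDuringExecution,omitempty"`

        // Hard constraint at scheduling time; violations that arise later
        // (e.g. from label changes) are ignored during execution.
        RequiredDuringSchedulingIgnoredDuringExecution []PodAffinityTerm `json:"requiredDuringSchedulingIgnoredDuringExecution,omitempty"`

        // Soft preference, scored by the priority function.
        PreferredDuringSchedulingIgnoredDuringExecution []WeightedPodAffinityTerm `json:"preferredDuringSchedulingIgnoredDuringExecution,omitempty"`
    }

As noted above, podSpec.nodeSelector has always had IgnoredDuringExecution semantics, so shipping only the Ignored variants first does not introduce a new kind of behavior; the RequiredDuringExecution variant is deferred until eviction support exists.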

@SrinivasChilveri

@googlebot I am confirming that I am okay with these commits being contributed to Google.

// system will try to eventually evict the pod from its node.
// When there are multiple elements, the lists of nodes corresponding to each
// podAffinityTerm are intersected, i.e. all terms must be satisfied.
// RequiredDuringSchedulingRequiredDuringExecution []PodAffinityTerm `json:"requiredDuringSchedulingRequiredDuringExecution,omitempty"`
Member

Same comment.

@davidopp
Member

davidopp commented May 4, 2016

The test failure is because verify-generated-docs.sh is failing. @kevin-wangzefeng, can you rebase and run hack/update-generated-docs.sh (or hack/update-all.sh)? Here's the error:

Verifying ./hack/../hack/verify-generated-docs.sh
Go version: go version go1.6.2 linux/amd64
+++ [0504 02:06:36] Building go targets for linux/amd64:
    cmd/gendocs
    cmd/genkubedocs
    cmd/genman
    cmd/genyaml
    cmd/genbashcomp
    cmd/mungedocs
+++ [0504 02:07:00] Placing binaries
--- /tmp/tmp.OJsjbZIy8P/docs/admin/kube-scheduler.md    2016-05-04 02:07:02.483487176 -0700
+++ /go/src/k8s.io/kubernetes/docs/admin/kube-scheduler.md  2016-05-04 01:57:01.747580618 -0700
@@ -1,3 +1,37 @@
+<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
+
+<!-- BEGIN STRIP_FOR_RELEASE -->
+
+<img  src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img  src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img  src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img  src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img  src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+
+<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+<!-- TAG RELEASE_LINK, added by the munger automatically -->
+<strong>
+The latest release of this document can be found
+[here](http://releases.k8s.io/release-1.2/docs/admin/kube-scheduler.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+</strong>
+--
+
+<!-- END STRIP_FOR_RELEASE -->
+
+<!-- END MUNGE: UNVERSIONED_WARNING -->
+
 ## kube-scheduler


@@ -22,9 +56,7 @@ kube-scheduler
       --address="0.0.0.0": The IP address to serve on (set to 0.0.0.0 for all interfaces)
       --algorithm-provider="DefaultProvider": The scheduling algorithm provider to use, one of: DefaultProvider
-      --failure-domains="kubernetes.io/hostname,failure-domain.beta.kubernetes.io/zone,failure-domain.beta.kubernetes.io/region": Indicate the "all topologies" set for an empty topologyKey when it's used for PreferredDuringScheduling pod anti-affinity.
       --google-json-key="": The Google Cloud Platform Service Account JSON Key to use for authentication.
-      --hard-pod-affinity-symmetric-weight=1: RequiredDuringScheduling affinity is not symmetric, but there is an implicit PreferredDuringScheduling affinity rule corresponding to every RequiredDuringScheduling affinity rule. --hard-pod-affinity-symmetric-weight represents the weight of implicit PreferredDuringScheduling affinity rule.
       --kube-api-burst=100: Burst to use while talking with kubernetes apiserver
       --kube-api-content-type="": ContentType of requests sent to apiserver. Passing application/vnd.kubernetes.protobuf is an experimental feature now.
       --kube-api-qps=50: QPS to use while talking with kubernetes apiserver
@@ -41,4 +73,9 @@ kube-scheduler
       --scheduler-name="default-scheduler": Name of the scheduler, used to select which pods will be processed by this scheduler, based on pod's annotation with key 'scheduler.alpha.kubernetes.io/name'

-###### Auto generated by spf13/cobra on 4-May-2016
+###### Auto generated by spf13/cobra on 21-Apr-2016
+
+
+<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/admin/kube-scheduler.md?pixel)]()
+<!-- END MUNGE: GENERATED_ANALYTICS -->

Generated docs are out of date. Please run hack/update-generated-docs.sh
!!! Error in ./hack/../hack/verify-generated-docs.sh:28
  '"${KUBE_ROOT}/hack/after-build/verify-generated-docs.sh" "$@"' exited with status 1
Call stack:
  1: ./hack/../hack/verify-generated-docs.sh:28 main(...)
Exiting with status 1
FAILED   ./hack/../hack/verify-generated-docs.sh   30s

@davidopp
Member

davidopp commented May 5, 2016

Actually you need to run hack/update-all.sh after rebasing, since rebasing may change other things.

@kevin-wangzefeng
Member Author

hack/verify-generated-docs.sh failed because there are two new flags, --failure-domains and --hard-pod-affinity-symmetric-weight. To fix this failure, docs/admin/kube-scheduler.md needs to be updated like this:

diff --git a/docs/admin/kube-scheduler.md b/docs/admin/kube-scheduler.md
index c11f801..892a8a1 100644
--- a/docs/admin/kube-scheduler.md
+++ b/docs/admin/kube-scheduler.md
@@ -56,7 +56,9 @@ kube-scheduler
``
       --address="0.0.0.0": The IP address to serve on (set to 0.0.0.0 for all interfaces)
       --algorithm-provider="DefaultProvider": The scheduling algorithm provider to use, one of: DefaultProvider
+      --failure-domains="kubernetes.io/hostname,failure-domain.beta.kubernetes.io/zone,failure-domain.beta.kubernetes.io/region": Indicate the "all topologies" set for an empty topologyKey when it's used for PreferredDuringScheduling pod anti-affinity.
       --google-json-key="": The Google Cloud Platform Service Account JSON Key to use for authentication.
+      --hard-pod-affinity-symmetric-weight=1: RequiredDuringScheduling affinity is not symmetric, but there is an implicit PreferredDuringScheduling affinity rule corresponding to every RequiredDuringScheduling affinity rule. --hard-pod-affinity-symmetric-weight represents the weight of implicit PreferredDuringScheduling affinity rule.
       --kube-api-burst=100: Burst to use while talking with kubernetes apiserver
       --kube-api-content-type="": ContentType of requests sent to apiserver. Passing application/vnd.kubernetes.protobuf is an experimental feature now.
       --kube-api-qps=50: QPS to use while talking with kubernetes apiserver
@@ -73,7 +75,7 @@ kube-scheduler
       --scheduler-name="default-scheduler": Name of the scheduler, used to select which pods will be processed by this scheduler, based on pod's annotation with key 'scheduler.alpha.kubernetes.io/name'
``

-###### Auto generated by spf13/cobra on 21-Apr-2016
+###### Auto generated by spf13/cobra on 5-May-2016

 <!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

But the merge-bot will complain that this PR touches the old docs. I think I should update the old doc this time? Otherwise hack/verify-generated-docs.sh will keep failing.

@k8s-github-robot added the kind/api-change and kind/old-docs labels on May 5, 2016
@k8s-bot

k8s-bot commented May 5, 2016

GCE e2e build/test failed for commit 533e71f83f10c6f200f471ab260cf82936de8609.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@davidopp
Member

davidopp commented May 5, 2016

Yes, I think that's fine. I have to manually merge this PR anyway.

@kevin-wangzefeng
Member Author

@k8s-bot test this please, issue: #24620

@k8s-bot

k8s-bot commented May 6, 2016

GCE e2e build/test passed for commit 533e71f83f10c6f200f471ab260cf82936de8609.

@k8s-github-robot

@kevin-wangzefeng PR needs rebase

@k8s-github-robot added the needs-rebase label on May 6, 2016
@k8s-github-robot removed the needs-rebase label on May 6, 2016
@k8s-bot

k8s-bot commented May 6, 2016

GCE e2e build/test passed for commit 82ba4f0.

@kevin-wangzefeng
Member Author

@davidopp rebased, PTAL

@davidopp
Member

davidopp commented May 6, 2016

Tests are passing, manually merging.

@timothysc
Member

YAY!

FYI - @smarterclayton @jeremyeder re: inter-pod anti-affinity for builds...

@davidopp
Member

Before you cheer too much, see #25313 - we had to move these into non-default predicates for now, because the performance is bad on large clusters. We will be fixing the performance in 1.4.

@timothysc
Member

DOH!

@timothysc removed the do-not-merge label on May 13, 2016
Labels: kind/api-change, release-note, size/XXL