
Inter pod topological affinity and anti-affinity implementation #22985

Merged (1 commit) on May 6, 2016

Conversation

kevin-wangzefeng
Member

This PR implements Phase 1 of inter-pod topological affinity and anti-affinity:

  1. Add the Affinity field to PodSpec and the PodAffinity and PodAntiAffinity types to the API along with all of their descendant types.
  2. Implement a scheduler predicate that takes RequiredDuringSchedulingIgnoredDuringExecution
    affinity and anti-affinity into account. Include a workaround for the issue described at the end of the Affinity section of the Examples section (can't schedule first pod).
  3. Implement a scheduler priority function that takes PreferredDuringSchedulingIgnoredDuringExecution affinity and anti-affinity into account.
  4. Implement admission controller that rejects requests that specify "all namespaces" with non-"node" TopologyKey for RequiredDuringScheduling anti-affinity.
    This admission controller should be enabled by default.

P.S. For step 4, this PR currently only implements the admission controller; "enabling it by default" (changing the deployment scripts) will come in a follow-up PR. A rough sketch of the check is shown below.
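The sketch is illustrative only and not the actual admission plugin code: the local type names, the AllNamespaces field, and the use of kubernetes.io/hostname as the "node" topology key are assumptions made for the example.

    package main

    import (
        "errors"
        "fmt"
    )

    // Hypothetical, trimmed-down stand-ins for the real API types.
    type PodAffinityTerm struct {
        AllNamespaces bool   // stand-in for "this term applies to all namespaces"
        TopologyKey   string
    }

    type PodAntiAffinity struct {
        RequiredDuringSchedulingIgnoredDuringExecution []PodAffinityTerm
    }

    // admitAntiAffinity sketches the rule from step 4: reject RequiredDuringScheduling
    // anti-affinity terms that apply to all namespaces unless the topology key is the
    // per-node one.
    func admitAntiAffinity(anti *PodAntiAffinity) error {
        if anti == nil {
            return nil
        }
        for _, term := range anti.RequiredDuringSchedulingIgnoredDuringExecution {
            if term.AllNamespaces && term.TopologyKey != "kubernetes.io/hostname" {
                return errors.New("RequiredDuringScheduling anti-affinity over all namespaces must use the node topology key")
            }
        }
        return nil
    }

    func main() {
        bad := &PodAntiAffinity{
            RequiredDuringSchedulingIgnoredDuringExecution: []PodAffinityTerm{
                {AllNamespaces: true, TopologyKey: "failure-domain.beta.kubernetes.io/zone"},
            },
        }
        fmt.Println(admitAntiAffinity(bad)) // prints the rejection error
    }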

Since this is Phase 1, the API change does not add the new field to PodSpec itself; instead, the affinity is expressed through an annotation, the same approach we used for node affinity. A follow-up PR adding introductory documentation is also planned.
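As a rough illustration of the annotation approach, the serialized affinity ends up under the scheduler.alpha.kubernetes.io/affinity annotation on the pod. The sketch below makes some assumptions: the local types are simplified stand-ins for the real API types, and only the JSON fields needed for the example are shown.

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // Simplified stand-ins for the annotation payload; not the real API types.
    type podAffinityTerm struct {
        LabelSelector map[string]map[string]string `json:"labelSelector,omitempty"`
        TopologyKey   string                       `json:"topologyKey"`
    }

    type podAffinity struct {
        RequiredDuringSchedulingIgnoredDuringExecution []podAffinityTerm `json:"requiredDuringSchedulingIgnoredDuringExecution,omitempty"`
    }

    type affinity struct {
        PodAffinity *podAffinity `json:"podAffinity,omitempty"`
    }

    func main() {
        // Require co-location (same node) with pods labeled app=store.
        a := affinity{PodAffinity: &podAffinity{
            RequiredDuringSchedulingIgnoredDuringExecution: []podAffinityTerm{{
                LabelSelector: map[string]map[string]string{"matchLabels": {"app": "store"}},
                TopologyKey:   "kubernetes.io/hostname",
            }},
        }}
        raw, _ := json.Marshal(a)

        // In Phase 1 the affinity is carried as a pod annotation rather than a
        // first-class PodSpec field.
        annotations := map[string]string{
            "scheduler.alpha.kubernetes.io/affinity": string(raw),
        }
        fmt.Println(annotations["scheduler.alpha.kubernetes.io/affinity"])
    }

The scheduler-side predicate and priority functions then presumably read the affinity back out of this annotation rather than from a PodSpec field, which is what keeps the change alpha-level.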

@davidopp @kubernetes/sig-scheduling PTAL.

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.

@k8s-github-robot

Adding label do-not-merge because this PR changes docs that are prohibited from auto-merge.
See http://kubernetes.github.io/docs/editdocs/ for information about editing docs

@k8s-github-robot added the kind/api-change, kind/old-docs, size/XXL, and do-not-merge labels on Mar 15, 2016
@kevin-wangzefeng
Member Author

@googlebot he signed it

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request) and all commit authors, but as best as we can tell these commits were authored by someone else. If that's the case, please add them to this pull request and have them confirm that they're okay with these commits being contributed to Google. If we're mistaken and you did author these commits, just reply here to confirm.

@kevin-wangzefeng
Member Author

@googlebot confirm

@SrinivasChilveri

@googlebot I am okay with these commits.

metadata:
  name: with-newlabels
  annotations:
    scheduler.alpha.kubernetes.io/affinity: >
Member

How do we validate that annotations are sane? It seems we would want an analysis tool that could check if your annotation is valid or could ever be met.

Member

There is syntactic validation; see elsewhere in this PR (e.g. validation.go)
Knowing whether an annotation could ever be met is probably not equivalent to the halting problem but may be close. Feel free to file an issue but I think it's outside the scope of a first version of this feature.

Member

The validation verifies that the syntax is ok, not that it can be met. I'll open an issue around an analysis tool.
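To make the distinction concrete, a hypothetical example of a purely syntactic check is sketched below; it is not the code in validation.go, and the empty-topologyKey rule it shows is inferred from the --failure-domains flag description that appears later in this thread.

    package main

    import "fmt"

    // Hypothetical, trimmed-down term used only for this sketch.
    type podAffinityTerm struct {
        TopologyKey string
        Required    bool // true for RequiredDuringScheduling terms
    }

    // validateTerm is purely syntactic: it can report that a term is malformed,
    // but it cannot tell you whether the term could ever be satisfied on a real
    // cluster (which is the analysis-tool question raised above).
    func validateTerm(term podAffinityTerm) []error {
        var errs []error
        // An empty topologyKey is only tolerated for PreferredDuringScheduling
        // anti-affinity, where it is expanded using --failure-domains.
        if term.Required && term.TopologyKey == "" {
            errs = append(errs, fmt.Errorf("topologyKey must be non-empty for RequiredDuringScheduling terms"))
        }
        return errs
    }

    func main() {
        fmt.Println(validateTerm(podAffinityTerm{Required: true}))
    }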

@davidopp assigned davidopp and unassigned erictune on Mar 15, 2016
@k8s-github-robot added the needs-rebase label on Mar 15, 2016
@SrinivasChilveri

@googlebot confirm

@kevin-wangzefeng
Member Author

@googlebot he signed it

// system will try to eventually evict the pod from its node.
// When there are multiple elements, the lists of nodes corresponding to each
// podAffinityTerm are intersected, i.e. all terms must be satisfied.
// RequiredDuringSchedulingRequiredDuringExecution []PodAffinityTerm `json:"requiredDuringSchedulingRequiredDuringExecution,omitempty"`
Member

I know without eviction policy and rescheduling we can't do this part yet, but I fail to see the "practical" use case of IgnoredDuringExecution vs. RequiredDuringExecution.

Member

@timothysc Do you mean you think one of the two variants (requiredDuringSchedulingRequiredDuringExecution or requiredDuringSchedulingIgnoredDuringExecution) is not useful? If so, which one do you think is not useful?

Member

I don't see a practical reason for requiredDuringSchedulingIgnoredDuringExecution.

Member

At this point the main practical reason is so we can roll out this feature before we have time to implement the "RequiredDuringExecution" part (in particular the logic for eviction). I actually tend to agree with you that once we add eviction (i.e. offer requiredDuringSchedulingRequiredDuringExecution), few people will use requiredDuringSchedulingIgnoredDuringExecution. That said, the current semantics of podSpec.nodeSelector that has been in the system from day one are "ignoredDuringExecution" so I guess that general semantic is not completely useless. Also, some people have argued we actually want two flavors of "requiredDuringExecution" (one that is synchronous eviction and one that is asynchronous eviction) so we'd potentially end up with multiple fields (or some alternative way to distinguish which option you want) even if we didn't have ignoredDuringExecution.

Member

We should comment that then.
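For readers following this thread, the shape being debated is roughly as follows. This is a sketch assembled from the snippets quoted above, with simplified field types, and not a verbatim copy of the API file.

    package api // illustrative placement only

    // PodAffinityTerm, in simplified form: which existing pods the rule is
    // expressed against, and the topology domain (node, zone, ...) over which
    // co-location or spreading is measured.
    type PodAffinityTerm struct {
        LabelSelector map[string]string `json:"labelSelector,omitempty"` // simplified stand-in for a full label selector
        Namespaces    []string          `json:"namespaces,omitempty"`
        TopologyKey   string            `json:"topologyKey,omitempty"`
    }

    type WeightedPodAffinityTerm struct {
        Weight          int             `json:"weight"`
        PodAffinityTerm PodAffinityTerm `json:"podAffinityTerm"`
    }

    // PodAffinity (PodAntiAffinity mirrors it).
    type PodAffinity struct {
        // Kept as a comment in this PR, as quoted above: enforcing the rule
        // during execution needs eviction/rescheduling support that does not
        // exist yet.
        // RequiredDuringSchedulingRequiredDuringExecution []PodAffinityTerm `json:"requiredDuringSchedulingRequiredDuringExecution,omitempty"`

        // Hard constraint at scheduling time; violations that arise later
        // (e.g. from label changes) are ignored during execution.
        RequiredDuringSchedulingIgnoredDuringExecution []PodAffinityTerm `json:"requiredDuringSchedulingIgnoredDuringExecution,omitempty"`

        // Soft preference, scored by the priority function.
        PreferredDuringSchedulingIgnoredDuringExecution []WeightedPodAffinityTerm `json:"preferredDuringSchedulingIgnoredDuringExecution,omitempty"`
    }

As noted above, podSpec.nodeSelector has always had IgnoredDuringExecution semantics, so shipping only the Ignored variants first does not introduce a new kind of behavior; the RequiredDuringExecution variant is deferred until eviction support exists.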

@SrinivasChilveri

@googlebot I am confirming that I am okay with these commits being contributed to Google.

// system will try to eventually evict the pod from its node.
// When there are multiple elements, the lists of nodes corresponding to each
// podAffinityTerm are intersected, i.e. all terms must be satisfied.
// RequiredDuringSchedulingRequiredDuringExecution []PodAffinityTerm `json:"requiredDuringSchedulingRequiredDuringExecution,omitempty"`
Member

Same comment.

@davidopp
Member

davidopp commented May 4, 2016

The test failure is because verify-generated-docs.sh is failing. @kevin-wangzefeng, can you rebase and run hack/update-generated-docs.sh (or hack/update-all.sh)? Here's the error:

Verifying ./hack/../hack/verify-generated-docs.sh
Go version: go version go1.6.2 linux/amd64
+++ [0504 02:06:36] Building go targets for linux/amd64:
    cmd/gendocs
    cmd/genkubedocs
    cmd/genman
    cmd/genyaml
    cmd/genbashcomp
    cmd/mungedocs
+++ [0504 02:07:00] Placing binaries
--- /tmp/tmp.OJsjbZIy8P/docs/admin/kube-scheduler.md    2016-05-04 02:07:02.483487176 -0700
+++ /go/src/k8s.io/kubernetes/docs/admin/kube-scheduler.md  2016-05-04 01:57:01.747580618 -0700
@@ -1,3 +1,37 @@
+<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
+
+<!-- BEGIN STRIP_FOR_RELEASE -->
+
+<img  src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img  src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img  src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img  src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+<img  src="http://kubernetes.io/img/warning.png" alt="WARNING"
+     width="25" height="25">
+
+<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+<!-- TAG RELEASE_LINK, added by the munger automatically -->
+<strong>
+The latest release of this document can be found
+[here](http://releases.k8s.io/release-1.2/docs/admin/kube-scheduler.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+</strong>
+--
+
+<!-- END STRIP_FOR_RELEASE -->
+
+<!-- END MUNGE: UNVERSIONED_WARNING -->
+
 ## kube-scheduler


@@ -22,9 +56,7 @@ kube-scheduler
       --address="0.0.0.0": The IP address to serve on (set to 0.0.0.0 for all interfaces)
       --algorithm-provider="DefaultProvider": The scheduling algorithm provider to use, one of: DefaultProvider
-      --failure-domains="kubernetes.io/hostname,failure-domain.beta.kubernetes.io/zone,failure-domain.beta.kubernetes.io/region": Indicate the "all topologies" set for an empty topologyKey when it's used for PreferredDuringScheduling pod anti-affinity.
       --google-json-key="": The Google Cloud Platform Service Account JSON Key to use for authentication.
-      --hard-pod-affinity-symmetric-weight=1: RequiredDuringScheduling affinity is not symmetric, but there is an implicit PreferredDuringScheduling affinity rule corresponding to every RequiredDuringScheduling affinity rule. --hard-pod-affinity-symmetric-weight represents the weight of implicit PreferredDuringScheduling affinity rule.
       --kube-api-burst=100: Burst to use while talking with kubernetes apiserver
       --kube-api-content-type="": ContentType of requests sent to apiserver. Passing application/vnd.kubernetes.protobuf is an experimental feature now.
       --kube-api-qps=50: QPS to use while talking with kubernetes apiserver
@@ -41,4 +73,9 @@ kube-scheduler
       --scheduler-name="default-scheduler": Name of the scheduler, used to select which pods will be processed by this scheduler, based on pod's annotation with key 'scheduler.alpha.kubernetes.io/name'

-###### Auto generated by spf13/cobra on 4-May-2016
+###### Auto generated by spf13/cobra on 21-Apr-2016
+
+
+<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/admin/kube-scheduler.md?pixel)]()
+<!-- END MUNGE: GENERATED_ANALYTICS -->

Generated docs are out of date. Please run hack/update-generated-docs.sh
!!! Error in ./hack/../hack/verify-generated-docs.sh:28
  '"${KUBE_ROOT}/hack/after-build/verify-generated-docs.sh" "$@"' exited with status 1
Call stack:
  1: ./hack/../hack/verify-generated-docs.sh:28 main(...)
Exiting with status 1
FAILED   ./hack/../hack/verify-generated-docs.sh   30s

@davidopp
Member

davidopp commented May 5, 2016

Actually you need to run hack/update-all.sh after rebasing, since rebasing may change other things.

@kevin-wangzefeng
Member Author

hack/verify-generated-docs.sh failed because there are two new flags, --failure-domains and --hard-pod-affinity-symmetric-weight. To fix this failure, docs/admin/kube-scheduler.md needs to be updated like this:

diff --git a/docs/admin/kube-scheduler.md b/docs/admin/kube-scheduler.md
index c11f801..892a8a1 100644
--- a/docs/admin/kube-scheduler.md
+++ b/docs/admin/kube-scheduler.md
@@ -56,7 +56,9 @@ kube-scheduler
``
       --address="0.0.0.0": The IP address to serve on (set to 0.0.0.0 for all interfaces)
       --algorithm-provider="DefaultProvider": The scheduling algorithm provider to use, one of: DefaultProvider
+      --failure-domains="kubernetes.io/hostname,failure-domain.beta.kubernetes.io/zone,failure-domain.beta.kubernetes.io/region": Indicate the "all topologies" set for an empty topologyKey when it's used for PreferredDuringScheduling pod anti-affinity.
       --google-json-key="": The Google Cloud Platform Service Account JSON Key to use for authentication.
+      --hard-pod-affinity-symmetric-weight=1: RequiredDuringScheduling affinity is not symmetric, but there is an implicit PreferredDuringScheduling affinity rule corresponding to every RequiredDuringScheduling affinity rule. --hard-pod-affinity-symmetric-weight represents the weight of implicit PreferredDuringScheduling affinity rule.
       --kube-api-burst=100: Burst to use while talking with kubernetes apiserver
       --kube-api-content-type="": ContentType of requests sent to apiserver. Passing application/vnd.kubernetes.protobuf is an experimental feature now.
       --kube-api-qps=50: QPS to use while talking with kubernetes apiserver
@@ -73,7 +75,7 @@ kube-scheduler
       --scheduler-name="default-scheduler": Name of the scheduler, used to select which pods will be processed by this scheduler, based on pod's annotation with key 'scheduler.alpha.kubernetes.io/name'
``

-###### Auto generated by spf13/cobra on 21-Apr-2016
+###### Auto generated by spf13/cobra on 5-May-2016

 <!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

But the merge-bot will complain that this PR touches the old docs. I think I should update the old doc this time? Otherwise hack/verify-generated-docs.sh will keep failing.

@k8s-github-robot added the kind/api-change and kind/old-docs labels on May 5, 2016
@k8s-bot

k8s-bot commented May 5, 2016

GCE e2e build/test failed for commit 533e71f83f10c6f200f471ab260cf82936de8609.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@davidopp
Member

davidopp commented May 5, 2016

Yes, I think that's fine. I have to manually merge this PR anyway.

@kevin-wangzefeng
Member Author

@k8s-bot test this please, issue: #24620

@k8s-bot

k8s-bot commented May 6, 2016

GCE e2e build/test passed for commit 533e71f83f10c6f200f471ab260cf82936de8609.

@k8s-github-robot

@kevin-wangzefeng PR needs rebase

@k8s-github-robot added the needs-rebase label on May 6, 2016
@k8s-github-robot removed the needs-rebase label on May 6, 2016
@k8s-bot

k8s-bot commented May 6, 2016

GCE e2e build/test passed for commit 82ba4f0.

@kevin-wangzefeng
Member Author

@davidopp rebased, PTAL

@davidopp
Member

davidopp commented May 6, 2016

Tests are passing, manually merging.

@timothysc
Member

YAY!

FYI - @smarterclayton @jeremyeder re: inter-pod anti-affinity for builds...

@davidopp
Member

Before you cheer too much, see #25313 - we had to move these into non-default predicates for now, because the performance is bad on large clusters. We will be fixing the performance in 1.4.

@timothysc
Member

DOH!

@timothysc removed the do-not-merge label on May 13, 2016
Labels: kind/api-change, release-note, size/XXL