Inter-pod topological affinity/anti-affinity proposal #18265
Conversation
Labelling this PR as size/L
GCE e2e build/test failed for commit 43ef32055bd8420e0517f74f47dd05c9f8ae4311.
GCE e2e build/test failed for commit d82a185df71c718d47a26a2a8f4f5908a444c3fe.
HardAffinityExpressions: {{PodSelector: P1, TopologyKey: "node"}},
HardAntiAffinityExpressions: {{PodSelector: P2, TopologyKey: "rack"}},
SoftAffinityExpressions: {{PodSelector: P3, TopologyKey: "zone"}},
SoftAntiAffinityExpressions: {{PodSelector: P4, TopologyKey: "power"}}
Do you think multiple TopologyKeys would also make sense here?
I don't think it's necessary; you should just be able to write two separate expressions with the same PodSelector to get the same result. See, for example, the example "Spread the pods of this service S across nodes and zones" later in the doc.
@bgrant0607 suggested that as a special case, we can make empty TopologyKey mean "all topologies". But if you want multiple specific topologies, you just use multiple rules.
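To make that concrete, here is a rough sketch of the "spread across nodes and zones" case written as two rules with the same PodSelector. The types and the string selector are placeholders invented for illustration, not the proposal's actual API:

```go
package main

import "fmt"

// Placeholder types mirroring the illustrative shorthand in the excerpt
// above; the real API types in the proposal may differ.
type AffinityExpression struct {
	PodSelector string // stand-in for a real label selector
	TopologyKey string
}

type PodAffinity struct {
	SoftAntiAffinityExpressions []AffinityExpression
}

func main() {
	// "Spread the pods of service S across nodes and zones": two soft
	// anti-affinity rules with the same PodSelector but different
	// TopologyKeys, rather than one rule carrying multiple TopologyKeys.
	spread := PodAffinity{
		SoftAntiAffinityExpressions: []AffinityExpression{
			{PodSelector: "service=S", TopologyKey: "node"},
			{PodSelector: "service=S", TopologyKey: "zone"},
		},
	}
	fmt.Printf("%+v\n", spread)
}
```

The same pattern extends to any number of topologies, which is why a single rule with multiple TopologyKeys doesn't buy much.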
GCE e2e test build/test passed for commit 73cdb41ed54d08f75316dc12c155c988d91296cc.
@quinton-hoole this is relevant to both flavors of Ubernetes
When `HardAffinityExpressions` has multiple elements, the requirements are ANDed. Likewise for `HardAntiAffinityExpressions`.
For `SoftAffinityExpressions` and `SoftAntiAffinityExpressions`, nodes that satisfy more of the rules are preferred over those that satisfy fewer.

## A comment on symmetry
In order to craft expressions that are sane, we may want to call for the creation of an analysis tool that can run as if it were the scheduler, e.g. to tell you that you have 0 possible matches.
xref: https://research.cs.wisc.edu/htcondor/HTCondorWeek2013/presentations/KnoellerJ_QAnalyze.pdf
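For illustration, here is a minimal sketch of the kind of dry-run check that could flag the "0 possible matches" case, assuming a drastically simplified node/pod model. The types and the satisfies helper are invented here; a real analyzer would have to consider every node sharing a topology value, not just the node being examined:

```go
package main

import "fmt"

// Placeholder types; not the proposal's API.
type Rule struct {
	PodSelector string // stand-in for a real label selector
	TopologyKey string
}

type Node struct {
	Name string
	Pods []map[string]string // labels of pods already running on this node
}

// satisfies is a deliberately crude check: does some pod on this node match
// the rule's selector?
func satisfies(n Node, r Rule) bool {
	for _, podLabels := range n.Pods {
		if podLabels["service"] == r.PodSelector {
			return true
		}
	}
	return false
}

// feasibleNodes returns the nodes that satisfy every hard rule (hard rules
// are ANDed, as described in the excerpt above).
func feasibleNodes(nodes []Node, hardRules []Rule) []string {
	var out []string
	for _, n := range nodes {
		ok := true
		for _, r := range hardRules {
			if !satisfies(n, r) {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, n.Name)
		}
	}
	return out
}

func main() {
	nodes := []Node{
		{Name: "node-1", Pods: []map[string]string{{"service": "P1"}}},
		{Name: "node-2"},
	}
	rules := []Rule{{PodSelector: "P1", TopologyKey: "node"}}
	if matches := feasibleNodes(nodes, rules); len(matches) == 0 {
		fmt.Println("warning: 0 possible matches for these hard affinity rules")
	} else {
		fmt.Println("feasible nodes:", matches)
	}
}
```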
Comments from meeting with @bgrant0607 today follow. I will update the proposal to reflect them shortly:
BTW one thing that occurred to me is that in theory, Kubelet could enforce node-level hard affinity and node-level hard anti-affinity rules, but I think this would be confusing, since it can't enforce rules at any other level of the topology. Better to just leave this feature to the scheduler.
in this doc later, in the section "Examples."
* Affinity
  * Co-locate the pods from a particular Service or Job in the same availability zone
  * When scheduling a pod that uses a license for a particular piece of software, try
I assume that here you're trying to express a generalization of something like the classic IaaS use case where, e.g., if a Windows VM lands on a physical machine, that physical machine incurs a monthly Windows license fee even if the VM runs for less than a month (e.g. an hour), so subsequent Windows VMs landing on that physical machine within the license period/month can inherit their predecessor's license? If that's the use case, I think that node labelling is more appropriate (specifically because the predecessor pod/VM may no longer be on the machine/node, so pod affinity won't actually work). If you mean some other use case, I'm struggling to imagine a good example.
IMHO you would need dynamic labeling (a.k.a. attributes, or machine_ad) to persist the Windows license. But you may also want it for hot-tracking of images or data, etc.
Yeah my example was wrong. As @quinton-hoole points out, you want the node property to persist even after the pod goes away, so the affinity mechanism I've described in this doc doesn't solve this case. What you need for this use case is a special version of "node preferences" (#18261) -- pod prefers nodes with a "windows" label; node gets a windows label when it starts a pod that has a windows preference; node loses the label 30 days later if there are no pods with windows preference running on the node at that time. I know how you could implement this but I don't think we should do it now.
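Purely for illustration, here is a hypothetical sketch of that label-expiry idea. Nothing like this exists in Kubernetes today, and the names, types, and 30-day constant are made up to match the description above:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical license period after which the label may be dropped.
const licensePeriod = 30 * 24 * time.Hour

type nodeState struct {
	labels        map[string]bool
	labeledSince  time.Time
	hasWindowsPod bool // a pod with a "windows" preference is running here
}

// reconcile applies the sketched rules to one node: add the label when a
// windows-preferring pod starts, drop it once the license period has
// elapsed with no such pod running.
func reconcile(n *nodeState, now time.Time) {
	switch {
	case n.hasWindowsPod && !n.labels["windows"]:
		n.labels["windows"] = true
		n.labeledSince = now
	case n.labels["windows"] && !n.hasWindowsPod && now.Sub(n.labeledSince) >= licensePeriod:
		delete(n.labels, "windows")
	}
}

func main() {
	n := &nodeState{labels: map[string]bool{}, hasWindowsPod: true}
	start := time.Now()
	reconcile(n, start) // pod starts -> label added
	n.hasWindowsPod = false
	reconcile(n, start.Add(31*24*time.Hour)) // 31 days later, no pod -> label dropped
	fmt.Println(n.labels)
}
```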
I think that if we do node affinity as an annotation in 1.2, and things go well with that, it would be OK to do pod affinity as a field in 1.3. I agree if we do pod affinity in 1.2 we should do it as an annotation.
Yeah, I wasn't planning to do the cleanup stuff like squashing commits and running update-generated-docs.sh until the content was LGTMd.
PR changed after LGTM, removing LGTM.
@bgrant0607: Fixed to reflect the new API syntax, and added substantial new sections discussing the corner cases/dangers that were brought up in the review comments (and updated the implementation plan to reflect mitigation strategies for them). PTAL.
GCE e2e test build/test passed for commit e89a094dc7a757ecb59579441a88bc10ec449a9f.
GCE e2e test build/test passed for commit 265e101034b19247a57c1578c95c540c38239454.
GCE e2e test build/test passed for commit 10fd20e2036448707f00437a7d6485aa0df91609.
GCE e2e test build/test passed for commit f5a1fc59a8f7ff48c63eeb86a0316199f860ac0b.
GCE e2e test build/test passed for commit cea5cf4.
// nil list means "this pod's namespace," empty list means "all namespaces"
// The json tag here is not "omitempty" since we need to distinguish nil and empty.
// See https://golang.org/pkg/encoding/json/#Marshal for more details.
Namespaces []api.Namespace `json:"namespaces"`
Should have omitempty. Nil is the default.
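For context, this is how encoding/json treats the two cases; plain string slices stand in for []api.Namespace:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type withoutOmitEmpty struct {
	Namespaces []string `json:"namespaces"`
}

type withOmitEmpty struct {
	Namespaces []string `json:"namespaces,omitempty"`
}

func main() {
	for _, v := range []interface{}{
		withoutOmitEmpty{},                       // nil   -> {"namespaces":null}
		withoutOmitEmpty{Namespaces: []string{}}, // empty -> {"namespaces":[]}
		withOmitEmpty{},                          // nil   -> {}
		withOmitEmpty{Namespaces: []string{}},    // empty -> {}
	} {
		b, _ := json.Marshal(v)
		fmt.Printf("%T: %s\n", v, b)
	}
}
```

Without omitempty the wire format keeps null and [] distinct, which is what the code comment relies on; with omitempty both collapse to an absent field.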
LGTM. In the future, please don't squash until the review is done.
Automatic merge from submit-queue
Auto commit by PR queue bot
Auto commit by PR queue bot
Fixes #14816.
It's helpful to read #18261 and #18263 before this one.
ref #367
@bgrant0607 @timothysc @resouer @ravigadde @kubernetes/rh-cluster-infra @kevin-wangzefeng @alfred-huangjian