Inter pod topological affinity and anti-affinity implementation #22985
Conversation
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
Adding label:do-not-merge because PR changes docs prohibited to auto merge
@googlebot he signed it
We found a Contributor License Agreement for you (the sender of this pull request) and all commit authors, but as best as we can tell these commits were authored by someone else. If that's the case, please add them to this pull request and have them confirm that they're okay with these commits being contributed to Google. If we're mistaken and you did author these commits, just reply here to confirm.
@googlebot confirm
@googlebot I am okay with these commits.
metadata:
  name: with-newlabels
  annotations:
    scheduler.alpha.kubernetes.io/affinity: >
How do we validate that annotations are sane? It seems we would want an analysis tool that could check if your annotation is valid or could ever be met.
There is syntactic validation; see elsewhere in this PR (e.g. validation.go)
Knowing whether an annotation could ever be met is probably not equivalent to the halting problem but may be close. Feel free to file an issue but I think it's outside the scope of a first version of this feature.
The validation verifies that the syntax is ok, not that it can be met. I'll open an issue around an analysis tool.
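For reference, here is a minimal sketch of what a syntactically valid value for that annotation might look like under the types proposed in this PR (the pod name, labels, and topology key are purely illustrative, not taken from the test fixture above):

```yaml
metadata:
  name: example-pod            # hypothetical name, not the fixture above
  annotations:
    scheduler.alpha.kubernetes.io/affinity: >
      {
        "podAffinity": {
          "requiredDuringSchedulingIgnoredDuringExecution": [{
            "labelSelector": {
              "matchLabels": {"app": "store"}
            },
            "topologyKey": "failure-domain.beta.kubernetes.io/zone"
          }]
        }
      }
```

Syntactic validation can confirm that the JSON parses and the expected fields are present, but, as noted above, it cannot tell you whether the constraint can ever be satisfied by the pods actually running in the cluster.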
@googlebot confirm
@googlebot he signed it
// system will try to eventually evict the pod from its node.
// When there are multiple elements, the lists of nodes corresponding to each
// podAffinityTerm are intersected, i.e. all terms must be satisfied.
// RequiredDuringSchedulingRequiredDuringExecution []PodAffinityTerm `json:"requiredDuringSchedulingRequiredDuringExecution,omitempty"`
I know without eviction policy and rescheduling we can't do this part yet, but I fail to see the "practical" use case of IgnoredDuringExecution vs. RequiredDuringExecution.
@timothysc Do you mean you think one of the two variants (requiredDuringSchedulingRequiredDuringExecution or requiredDuringSchedulingIgnoredDuringExecution) is not useful? If so, which one do you think is not useful?
I don't see a practical reason for requiredDuringSchedulingIgnoredDuringExecution.
At this point the main practical reason is so we can roll out this feature before we have time to implement the "RequiredDuringExecution" part (in particular the logic for eviction). I actually tend to agree with you that once we add eviction (i.e. offer requiredDuringSchedulingRequiredDuringExecution), few people will use requiredDuringSchedulingIgnoredDuringExecution. That said, the current semantics of podSpec.nodeSelector that has been in the system from day one are "ignoredDuringExecution" so I guess that general semantic is not completely useless. Also, some people have argued we actually want two flavors of "requiredDuringExecution" (one that is synchronous eviction and one that is asynchronous eviction) so we'd potentially end up with multiple fields (or some alternative way to distinguish which option you want) even if we didn't have ignoredDuringExecution.
We should comment that then.
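To make the distinction concrete, here is a hedged sketch (pod name and labels invented) of the variant this PR does ship; the commented-out RequiredDuringSchedulingRequiredDuringExecution field above would take the same PodAffinityTerm shape, but would additionally ask the system to eventually evict the pod if the rule is violated after scheduling:

```yaml
metadata:
  name: no-two-web-per-node    # hypothetical
  annotations:
    scheduler.alpha.kubernetes.io/affinity: >
      {
        "podAntiAffinity": {
          "requiredDuringSchedulingIgnoredDuringExecution": [{
            "labelSelector": {
              "matchExpressions": [
                {"key": "app", "operator": "In", "values": ["web"]}
              ]
            },
            "topologyKey": "kubernetes.io/hostname"
          }]
        }
      }
```

With IgnoredDuringExecution the rule is only evaluated at scheduling time; if labels later change so that two "web" pods end up on the same node, nothing is evicted, which matches the long-standing nodeSelector semantics mentioned above.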
@googlebot I am confirming that I am okay with these commits being contributed to Google.
// system will try to eventually evict the pod from its node.
// When there are multiple elements, the lists of nodes corresponding to each
// podAffinityTerm are intersected, i.e. all terms must be satisfied.
// RequiredDuringSchedulingRequiredDuringExecution []PodAffinityTerm `json:"requiredDuringSchedulingRequiredDuringExecution,omitempty"`
Same comment.
Test failure is because verify-generate-docs.sh is failing. @kevin-wangzefeng can you rebase and run
Actually you need to run hack/update-all.sh after rebasing, since rebasing may change other things.
diff --git a/docs/admin/kube-scheduler.md b/docs/admin/kube-scheduler.md
index c11f801..892a8a1 100644
--- a/docs/admin/kube-scheduler.md
+++ b/docs/admin/kube-scheduler.md
@@ -56,7 +56,9 @@ kube-scheduler
``
--address="0.0.0.0": The IP address to serve on (set to 0.0.0.0 for all interfaces)
--algorithm-provider="DefaultProvider": The scheduling algorithm provider to use, one of: DefaultProvider
+ --failure-domains="kubernetes.io/hostname,failure-domain.beta.kubernetes.io/zone,failure-domain.beta.kubernetes.io/region": Indicate the "all topologies" set for an empty topologyKey when it's used for PreferredDuringScheduling pod anti-affinity.
--google-json-key="": The Google Cloud Platform Service Account JSON Key to use for authentication.
+ --hard-pod-affinity-symmetric-weight=1: RequiredDuringScheduling affinity is not symmetric, but there is an implicit PreferredDuringScheduling affinity rule corresponding to every RequiredDuringScheduling affinity rule. --hard-pod-affinity-symmetric-weight represents the weight of implicit PreferredDuringScheduling affinity rule.
--kube-api-burst=100: Burst to use while talking with kubernetes apiserver
--kube-api-content-type="": ContentType of requests sent to apiserver. Passing application/vnd.kubernetes.protobuf is an experimental feature now.
--kube-api-qps=50: QPS to use while talking with kubernetes apiserver
@@ -73,7 +75,7 @@ kube-scheduler
--scheduler-name="default-scheduler": Name of the scheduler, used to select which pods will be processed by this scheduler, based on pod's annotation with key 'scheduler.alpha.kubernetes.io/name'
``
-###### Auto generated by spf13/cobra on 21-Apr-2016
+###### Auto generated by spf13/cobra on 5-May-2016
 <!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

But the merge-bot will complain that this PR is touching the old docs. I think I should update the old docs this time? Otherwise
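As an illustration of the rule the new --failure-domains flag is talking about, here is a sketch (pod name and labels invented) of a PreferredDuringScheduling anti-affinity term with an empty topologyKey; with the flag's default value the scheduler would interpret the empty key as "spread across hostname, zone, and region":

```yaml
metadata:
  name: spread-me              # hypothetical
  annotations:
    scheduler.alpha.kubernetes.io/affinity: >
      {
        "podAntiAffinity": {
          "preferredDuringSchedulingIgnoredDuringExecution": [{
            "weight": 10,
            "podAffinityTerm": {
              "labelSelector": {
                "matchLabels": {"app": "cache"}
              },
              "topologyKey": ""
            }
          }]
        }
      }
```

The separate --hard-pod-affinity-symmetric-weight flag is unrelated to this example; per its help text above, it sets the weight of the implicit PreferredDuringScheduling rule that the scheduler derives from other pods' RequiredDuringScheduling affinity.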
Force-pushed dc21a5e to 7f2ad8e
Force-pushed 7f2ad8e to 533e71f
GCE e2e build/test failed for commit 533e71f83f10c6f200f471ab260cf82936de8609. Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.
Yes, I think that's fine. I have to manually merge this PR anyway.
GCE e2e build/test passed for commit 533e71f83f10c6f200f471ab260cf82936de8609.
@kevin-wangzefeng PR needs rebase
Force-pushed 533e71f to 82ba4f0
GCE e2e build/test passed for commit 82ba4f0.
@davidopp rebased, PTAL
Tests are passing, manually merging.
YAY! FYI - @smarterclayton @jeremyeder re: inter-pod anti-affinity for builds...
Before you cheer too much, see #25313 - we had to move these into non-default predicates for now, because the performance is bad on large clusters. We will be fixing the performance in 1.4.
DOH!
This PR implements Phase 1 of inter-pod topological affinity and anti-affinity:
P.S. For step 4, this PR currently only implements the admission controller; "enabling it by default" (changing the deployment scripts) will come in a follow-up PR.
Since this is Phase 1, the API change does not add a new field to podSpec; instead it uses an annotation, the same approach we took for node affinity (see the sketch below). A follow-up PR to add introductory documentation is also planned.
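For readers unfamiliar with the annotation approach, here is a hedged end-to-end sketch of attaching such a rule to a pod (name, labels, and image are placeholders, not part of this PR):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-server             # placeholder
  labels:
    app: web
  annotations:
    scheduler.alpha.kubernetes.io/affinity: >
      {
        "podAffinity": {
          "requiredDuringSchedulingIgnoredDuringExecution": [{
            "labelSelector": {
              "matchLabels": {"app": "cache"}
            },
            "topologyKey": "kubernetes.io/hostname"
          }]
        }
      }
spec:
  containers:
  - name: web                  # placeholder container
    image: nginx               # placeholder image
```

Presumably, once the fields graduate out of alpha, the same content would move into a first-class field on the pod spec, mirroring the plan followed for node affinity.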
@davidopp @kubernetes/sig-scheduling PTAL.