Disable session affinity for internal kubernetes service #56690
Under the following conditions, session affinity leads to a deadlock:

- self-hosted controller-manager, which talks to the API servers via the `kubernetes` service ClusterIP
- the default master-count reconciler is used
- `--apiserver-count` is set to >1, according to the help message
- the number of responsive API servers drops below `--apiserver-count`
- all controller-managers happen to be hashed to API servers which are down.

When that happens, the controller-managers are never able to contact an API server, despite a correctly working API server being available. Less serious outages are also possible for other consumers of the `kubernetes` service, such as operators, kube-dns, flannel & calico, etc. There is always a non-zero chance that a given consumer is hashed to an apiserver which is down.

Revert "give the kubernetes service client ip session affinity"

This reverts commit e21ebbc.
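The failure mode described above can be sketched with a toy model. Everything here is an illustrative assumption (the IPs, the `pinned_backend` helper, and the use of a static hash) rather than kube-proxy's actual implementation; the point is only that a pinned client can never escape a dead backend:

```python
import hashlib

def pinned_backend(client_ip: str, backends: list) -> str:
    """Toy model: with ClientIP session affinity, each source IP is
    effectively pinned to one backend. A stable hash stands in for
    kube-proxy's affinity tracking, for illustration only."""
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return backends[digest % len(backends)]

apiservers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical master IPs
down = {"10.0.0.3"}  # unresponsive, but never removed from Endpoints
                     # by the master-count reconciler

clients = [f"192.168.0.{i}" for i in range(100)]
stuck = [ip for ip in clients if pinned_backend(ip, apiservers) in down]

# Roughly a third of clients are pinned to the dead apiserver and can
# never reach a healthy one, no matter how many times they retry.
print(f"{len(stuck)} of {len(clients)} clients are deadlocked")
```

With `sessionAffinity: None`, a retry has a fresh chance of landing on a healthy backend; with ClientIP affinity and a stale endpoints list, the pinned fraction of clients has no chance of recovery.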
/ok-to-test
/assign @mikedanese

@enisoc fyi, sounds serious for the 1.9 release
I think we don't want this for the reasons mentioned in the original PR. The issue you mention (#22609) should be fixed by #51698, which we are currently transitioning to. cc @rphillips @ncdc
The lease endpoint reconciler is an alpha feature and is not going to be the default reconciler anytime soon; IMHO it cannot be the "solution" to this deadlock until it is the default. The original PR mentions hypothetical issues with ordering, at the cost of a very real deadlock which at least 3 users experienced, spent time debugging and narrowing down to an individual commit, and then raised their concerns in that PR's comments. The original PR was merged within a day without any discussion. In essence, the original PR's answer to the question of whether to fail was given as "yes, it is better to have all requests of 1/3 of users fail", which is clearly the wrong answer: if you fail all requests for 1/3 of users, they have NO CHANCE of recovery, and given that there are enough singleton users of the kubernetes API, this should be obvious. TL;DR: the original PR does more damage than good and should be reverted.
I took a look at the things mentioned in this PR. Some of my comments:
+1
I don't fully understand this original argument. The requests that are served from apiserver cache are:
And I really hope that all opt-ins for that were done carefully.
This has been significantly improved since it was mentioned. So maybe it's no longer that important now?
+1 Personally, I have a slight preference for merging it (until we have the new reconciler), but I don't have that strong an opinion. I would like to hear opinions from others, including @lavalamp, who LGTM-ed the original PR, and @bowei, because I may be missing something.
The PR in question was merged a year and a half ago, so I don't think it can be argued that this is urgent. I'm OK with whatever @wojtek-t and @lavalamp say, since they are the experts. However, this is a significant change that affects all in-cluster clients, and I am concerned about invalidating testing this late in a release cycle.
+1 - that's why I would like to hear other opinions.
The original PR caused an outage for us. We responded by manually overriding session affinity on the apiserver service. The original PR breaks HA apiservers, in my opinion, and should be reverted.
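For reference, an override like the one this commenter describes amounts to forcing the internal Service back to no affinity. A minimal sketch (assuming the default `kubernetes` Service in the `default` namespace; this is an illustration, not the commenter's actual remediation):

```yaml
# Hypothetical illustration: setting the internal kubernetes Service
# back to no session affinity (the default this PR restores).
apiVersion: v1
kind: Service
metadata:
  name: kubernetes
  namespace: default
spec:
  sessionAffinity: None   # was ClientIP after the original PR
```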
Now that KubeCon is over, any chance this can be looked at? :)
@lavalamp - friendly ping
@lavalamp, can you have a look and add your opinion on this, please?
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
The justification I gave for LGTMing the original PR is no longer valid, as consistent reads are on by default now. I think I may have also thought that session affinity was implemented by caching and not by static hashing; I am not sure how it is implemented at the moment. We should stop using the --master-count mechanism ASAP, though, now that we have something better. My apologies that you had to wait for an additional KubeCon before I noticed this :)

/lgtm
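The caching-vs-hashing distinction above matters less than it might seem, because the deadlock needs only two ingredients: stickiness and a stale endpoints list. A toy model (the `ClientIPAffinity` class and timeout value are illustrative assumptions, not kube-proxy code) shows that even timeout-based affinity keeps steering a client to a dead backend as long as that backend is never removed from Endpoints:

```python
class ClientIPAffinity:
    """Toy model of ClientIP affinity with a timeout (assumption:
    an affinity entry pins a client to a backend until it expires
    or the backend leaves the endpoints list)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.table = {}  # client_ip -> (backend, last_used_time)

    def route(self, client_ip, endpoints, now):
        entry = self.table.get(client_ip)
        if entry and entry[0] in endpoints and now - entry[1] < self.timeout_s:
            backend = entry[0]  # sticky: reuse the pinned backend
        else:
            backend = endpoints[hash(client_ip) % len(endpoints)]
        self.table[client_ip] = (backend, now)
        return backend

# With the master-count reconciler, a dead apiserver can stay in the
# endpoints list, so affinity keeps renewing the stale pin forever.
endpoints = ["10.0.0.1", "10.0.0.2"]  # suppose one of these is down
proxy = ClientIPAffinity(timeout_s=10800)
first = proxy.route("172.16.0.5", endpoints, now=0)
later = proxy.route("172.16.0.5", endpoints, now=1000)
assert first == later  # still pinned, even if that backend is unreachable
```

The lease reconciler fixes the other ingredient by removing dead apiservers from Endpoints, at which point the affinity entry becomes invalid and the client re-routes.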
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, redbaron

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/retest

/test pull-kubernetes-e2e-gce

/test all

[submit-queue is verifying that this PR is safe to merge]

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.
…y_for_1_10

Automatic merge from submit-queue.

[backport] Disable session affinity for internal kubernetes service

Under the following conditions, session affinity leads to a deadlock:

- self-hosted controller-manager, which talks to the API servers via the `kubernetes` service ClusterIP
- the default master-count reconciler is used
- `--apiserver-count` is set to >1, according to the help message
- the number of responsive API servers drops below `--apiserver-count`
- all controller-managers happen to be hashed to API servers which are down.

When that happens, the controller-managers are never able to contact an API server, despite a correctly working API server being available. Less serious outages are also possible for other consumers of the `kubernetes` service, such as operators, kube-dns, flannel & calico, etc. There is always a non-zero chance that a given consumer is hashed to an apiserver which is down.

Revert "give the kubernetes service client ip session affinity"

This reverts commit e21ebbc.

**What this PR does / why we need it**: Backporting #56690 to the 1.10 release branch.

**Which issue(s) this PR fixes**: Fixes #23129

**Release note**:

```release-note
Disable session affinity for internal kubernetes service - Backport of #56690 to 1.10 release branch
```
This reverts PR #23129
/sig api-machinery
CCing: