
Disable session affinity for internal kubernetes service #56690

Merged
1 commit merged into kubernetes:master on May 10, 2018

Conversation

@redbaron (Contributor) commented Dec 1, 2017

Under the following conditions, session affinity leads to a deadlock:

  • Self-hosted controller-manager that talks to the API servers
    via the kubernetes service ClusterIP
  • the default master-count reconciler is used
  • --apiserver-count is set to >1, following the help message
  • the number of responsive API servers drops below apiserver-count
  • all controller-managers happen to be hashed to API servers which
    are down.

When this happens, the controller managers are never able to
contact an API server, despite a correctly working API server
being available.

Less serious outages are also possible for other consumers of the
kubernetes service, such as operators, kube-dns, flannel & calico, etc.
There is always a non-zero chance that a given consumer is hashed to an
API server which is down.

This reverts PR #23129
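
For context, what is being toggled here is the sessionAffinity field of the default kubernetes Service maintained by the apiserver. A minimal sketch using the public core/v1 Go types (illustrative only; the actual change lives in the apiserver's master service code and is not reproduced here):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// With #23129 applied, SessionAffinity was ServiceAffinityClientIP, pinning
	// every client IP to a single apiserver endpoint. This PR switches it back
	// to ServiceAffinityNone so requests are spread across all endpoints.
	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "kubernetes", Namespace: metav1.NamespaceDefault},
		Spec: corev1.ServiceSpec{
			Ports:           []corev1.ServicePort{{Name: "https", Port: 443}},
			SessionAffinity: corev1.ServiceAffinityNone, // was: corev1.ServiceAffinityClientIP
		},
	}
	fmt.Println(svc.Name, "sessionAffinity:", svc.Spec.SessionAffinity)
}
```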

/sig api-machinery
CCing:

Release note: NONE


Revert "give the kubernetes service client ip session affinity"
This reverts commit e21ebbc.
@k8s-ci-robot added the sig/api-machinery, do-not-merge/release-note-label-needed, size/M, cncf-cla: yes, and needs-ok-to-test labels on Dec 1, 2017
@k8s-ci-robot added the release-note-none label and removed the do-not-merge/release-note-label-needed label on Dec 1, 2017
@dims (Member) commented Dec 2, 2017

/ok-to-test

/assign @mikedanese
/assign @lavalamp

@enisoc fyi, sounds serious for 1.9 release

@k8s-ci-robot removed the needs-ok-to-test label on Dec 2, 2017
@mikedanese (Member) commented

I think we don't want this for the reasons mentioned in the original PR. The issue you mention (#22609) should be fixed by #51698 which we are currently transitioning to. cc @rphillips @ncdc

@redbaron (Contributor, Author) commented Dec 4, 2017

The lease endpoint reconciler is an alpha feature and is not going to be the default reconciler anytime soon; IMHO it cannot be the "solution" to this deadlock until it is the default.

The original PR addresses hypothetical issues with ordering at the cost of a very real deadlock, which at least 3 users experienced: they spent time debugging, narrowed it down to an individual commit, and then raised their concerns in that PR's comments.

The original PR was merged within a day without any discussion.

The essence of the original PR is that the answer to the following question:

Is having 1/3 of all requests fail for all clients worse than having all requests fail for 1/3 of clients?

was essentially given as "yes, it is better to have all requests of 1/3 of users fail", which is clearly the wrong answer: if you fail all requests for 1/3 of users, they have NO CHANCE of recovery. Given that there are enough singleton users of the kubernetes API, this should be obvious.

TL;DR: The original PR does more damage than good and should be reverted.
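
To put rough numbers on the 1/3 argument above, a back-of-the-envelope sketch (the counts are assumed for illustration only):

```go
package main

import "fmt"

func main() {
	// Assumed example: 3 apiservers behind the kubernetes service, 1 of them down
	// but still listed as an endpoint (the master-count reconciler is slow to drop it).
	apiservers, down := 3.0, 1.0

	// Without affinity, each connection is load-balanced independently, so roughly
	// 1/3 of individual requests fail and a retrying client eventually gets through.
	perRequestFailure := down / apiservers

	// With ClientIP affinity, a client is pinned to one endpoint. If that endpoint
	// is the dead one, every request from that client fails and retries do not help:
	// the client stays stuck until the endpoint list changes.
	stuckClients := down / apiservers

	fmt.Printf("without affinity: ~%.0f%% of requests fail, no client is fully blocked\n", perRequestFailure*100)
	fmt.Printf("with affinity: ~%.0f%% of clients see a 100%% failure rate\n", stuckClients*100)
}
```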

@mikedanese assigned wojtek-t and unassigned pmorie and soltysh on Dec 4, 2017
@wojtek-t (Member) commented Dec 5, 2017

I took a look at the things mentioned in this PR. Some of my comments:

IMHO it cannot be the "solution" to this deadlock until it is the default

+1

Because each apiserver serves from a cache (unless the linearize read flag is passed explicitly) and each apiserver cache could be a different degree behind etcd.

I don't fully understand this original argument. The requests that are served from the apiserver cache are:

  • WATCH (but this is by-design eventual consistency, and I'm not convinced that this matters here)
  • GET, LIST, but ONLY when you explicitly opt in to it (by setting the ResourceVersion param).

And I really hope that all opt-ins for that were done carefully.
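
For reference, the opt-in mentioned above is the client explicitly passing ResourceVersion "0", which allows the apiserver to answer from its watch cache. A minimal client-go sketch (using current client-go method signatures, which take a context and differ from the 2017-era ones):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Default LIST: no ResourceVersion set, served as a consistent read from etcd.
	pods, err := client.CoreV1().Pods(metav1.NamespaceDefault).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// Explicit opt-in: ResourceVersion "0" lets the apiserver serve the list from
	// its watch cache, which may lag behind etcd to a different degree per apiserver.
	cached, err := client.CoreV1().Pods(metav1.NamespaceDefault).List(context.TODO(),
		metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}

	fmt.Println("consistent:", len(pods.Items), "cached:", len(cached.Items))
}
```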

What we should fix is the latency of a bad endpoint being removed from the master service.

This has been significantly improved since it was mentioned. So maybe that's no longer as important now?

if you fail all requests for 1/3 of users, they have NO CHANCE of recovery. Given that there are enough singleton users of the kubernetes API, this should be obvious.

+1

Personally, I have a slight preference for merging it (until we have the new reconciler), but I don't have a strong opinion. I would like to hear opinions from others, including @lavalamp, who LGTM-ed the original PR, and @bowei, because I may be missing something.

@mikedanese (Member) commented

The PR in question was merged a year and a half ago, so I don't think it can be argued that this is urgent. I'm ok with whatever @wojtek-t and @lavalamp say, since they are the experts. However, this is a significant change that affects all in-cluster clients, and I am concerned about invalidating testing this late in a release cycle.

@wojtek-t (Member) commented Dec 5, 2017

The PR in question was merged a year and a half ago, so I don't think it can be argued that this is urgent

+1 - that's why I would like to hear other opinions.
If we decide that it should be reverted, we may cherry-pick it to previous releases if needed.
But that needs to be a conscious decision.

@jsravn (Contributor) commented Dec 5, 2017

The original PR caused an outage for us. We worked around it by manually overriding session affinity on the apiserver service. The original PR breaks HA apiservers, in my opinion, and should be reverted.
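
The comment does not say how the override was applied; one possible way to do it (a sketch, assuming in-cluster credentials with permission to patch Services in the default namespace) is to patch spec.sessionAffinity back to None:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Force session affinity off on the default kubernetes Service. Note that the
	// apiserver itself manages this Service, so a manual override like this may be
	// reverted by the control plane.
	patch := []byte(`{"spec":{"sessionAffinity":"None"}}`)
	svc, err := client.CoreV1().Services(metav1.NamespaceDefault).Patch(
		context.TODO(), "kubernetes", types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("sessionAffinity is now", svc.Spec.SessionAffinity)
}
```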

@redbaron (Contributor, Author) commented

Now that KubeCon is over, any chance this can be looked at? :)

@wojtek-t (Member) commented

@lavalamp - friendly ping

@redbaron (Contributor, Author) commented Feb 6, 2018

@lavalamp, can you have a look and add your opinion on this, please?

@fejta-bot commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on May 7, 2018
@redbaron (Contributor, Author) commented May 7, 2018

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on May 7, 2018
@lavalamp (Member) commented May 7, 2018

The justification I gave for LGTMing the original PR is no longer valid, as consistent reads are on by default now. I think I may have also thought that session affinity was implemented by caching and not by static hashing--I am not sure how it is implemented at the moment.

We should stop using the --master-count mechanism ASAP, though, now that we have something better.

My apologies that you had to wait for an additional kubecon before I noticed this :)

/lgtm
/approve

@k8s-ci-robot added the lgtm label on May 7, 2018
@k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, redbaron

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubernetes deleted a comment from k8s-github-robot on May 7, 2018
@k8s-ci-robot added the approved label on May 7, 2018
@wojtek-t (Member) commented

/retest

@redbaron (Contributor, Author) commented

/test pull-kubernetes-e2e-gce

@k8s-github-robot commented

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot commented

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot merged commit 0ba8002 into kubernetes:master on May 10, 2018
k8s-github-robot pushed a commit that referenced this pull request Jun 22, 2018
…y_for_1_10

Automatic merge from submit-queue.

[backport] Disable session affinity for internal kubernetes service


**What this PR does / why we need it**:
Backporting #56690 to 1.10 release branch.

**Which issue(s) this PR fixes** 
Fixes #23129

**Release note**:
```release-note
Disable session affinity for internal kubernetes service - Backport of #56690 to 1.10 release branch
```
k8s-github-robot pushed a commit that referenced this pull request Jun 27, 2018
…690-upstream-release-1.8

Automatic merge from submit-queue.

Automated cherry pick of #56690: Disable session affinity for internal kubernetes service

Cherry pick of #56690 on release-1.8.

#56690: Disable session affinity for internal kubernetes service
k8s-github-robot pushed a commit that referenced this pull request Jul 14, 2018
…690-upstream-release-1.9

Automatic merge from submit-queue.

Automated cherry pick of #56690: Disable session affinity for internal kubernetes service

Cherry pick of #56690 on release-1.9.

#56690: Disable session affinity for internal kubernetes service