
[Federation] Implement leader election for controller-manager #46090

Merged

merged 2 commits into kubernetes:master on Aug 10, 2017

Conversation

shashidharatd

@shashidharatd shashidharatd commented May 19, 2017

What this PR does / why we need it:

Ref: #44283

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #44490

Special notes for your reviewer:
This PR also fixes issue #44490, which is about a delay in initializing the controller-manager due to unavailability of the api-server.

Release note:

federation: Support for leader-election among federation controller-manager instances introduced.

/cc @kubernetes/sig-federation-pr-reviews

@k8s-ci-robot k8s-ci-robot added sig/federation cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 19, 2017
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 19, 2017
@shashidharatd
Author

/assign @nikhiljindal
/assign @marun

@shashidharatd shashidharatd changed the title [Federation] Impletement leader election for controller-manager [Federation] Implement leader election for controller-manager May 19, 2017
@nikhiljindal
Contributor

Sorry, I haven't had a chance to go through the design yet.
Given that these are the last few days of the release, I don't think I will be able to devote much time to this.

If there is anyone else who feels comfortable reviewing this, please go ahead.

@nikhiljindal nikhiljindal removed their assignment May 26, 2017
@marun marun self-requested a review May 31, 2017 21:29
return
} else if !send {
glog.V(5).Infof("Skipping cluster: %s for namespace: %s reason: cluster selectors do not match: %-v %-v", cluster.Name, namespace, cluster.ObjectMeta.Labels, desiredNamespace.ObjectMeta.Annotations[federationapi.FederationClusterSelectorAnnotation])
continue

This causes you to skip over the switch statement below, and in particular the "case found && !send" so you do not end up deleting the namespace in an underlying cluster when you should.

Author

Oops, a mistake. Will fix. Thanks for catching this.
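The bug being pointed out can be sketched with a stdlib-only toy (all names here are hypothetical; the real controller reconciles federated namespaces across clusters): an early `continue` for the `!send` case bypasses the switch below it, so the `found && !send` deletion branch never runs.

```go
package main

import "fmt"

// clusterState is a hypothetical stand-in for the per-cluster values in the
// reviewed loop: whether the namespace was found in the cluster, and whether
// the cluster selector says it should be sent there.
type clusterState struct {
	name  string
	found bool
	send  bool
}

// reconcile returns the action to take for each cluster. Adding
// `if !c.send { continue }` before the switch (the reviewed bug) would skip
// the "found && !send" case, so a namespace that should be removed from an
// underlying cluster would never be deleted.
func reconcile(clusters []clusterState) []string {
	var actions []string
	for _, c := range clusters {
		switch {
		case c.found && !c.send:
			actions = append(actions, "delete from "+c.name)
		case !c.found && c.send:
			actions = append(actions, "create in "+c.name)
		case c.found && c.send:
			actions = append(actions, "update in "+c.name)
		}
	}
	return actions
}

func main() {
	fmt.Println(reconcile([]clusterState{
		{name: "cluster-a", found: true, send: true},
		{name: "cluster-b", found: true, send: false},
	}))
}
```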

@@ -54,6 +58,13 @@ import (
"k8s.io/client-go/discovery"
)

const (
apiserverWaitTimeout = 30 * 2 * time.Second

nit: Why not just 60 seconds, or one minute?

Author

I kind of forgot why I did it like that. Will change.

@@ -54,6 +58,13 @@ import (
"k8s.io/client-go/discovery"
)

const (
apiserverWaitTimeout = 30 * 2 * time.Second
apiserverRetryInterval = 1 * time.Second

This seems very short, and requires exponential backoff. There are standard libraries to do the right thing here.

Author

We should be initializing the controller-manager as early as possible. The controller-manager depends on the api-server, and both are started independently, so either one may start functioning before the other. If we introduce exponential backoff, I am worried it may induce unnecessary delay in initializing the controller-manager. Also, this is not on the critical path; we are only in the initialization stage.
I would increase apiserverRetryInterval to 2 seconds and apiserverWaitTimeout to 2 minutes. Is this fine?


ok

apiserverWaitTimeout = 30 * 2 * time.Second
apiserverRetryInterval = 1 * time.Second

federationOnlyNamespace = "federation"

This needs to be configurable, and should be an explicitly reserved name. As it stands right now, if the user has a namespace called federation in any of their underlying clusters, these will be automatically deleted (because what you're saying here is that you only want it in the federation, not in any of the underlying clusters).

Author

Agree with your comment. I will make it configurable. Also, what should the default name for such a namespace be? Is federation fine, or should I change it to something like federation-only?


Yes, I think federation-only is fine as a default.

glog.Fatalf("Invalid API configuration: %v", err)
}

if err := createFederationNamespace(federationClientset, federationOnlyNamespace); err != nil {

You should only do this if the user has requested an HA federation control plane, otherwise you have an unnecessary namespace hanging around, for no reason.

Author

Agree with the concern, but even in the case of a non-HA federation control plane it serves one purpose: waiting for the api-server and establishing that it is working. Currently, without this, the federation-controller-manager dies and another pod is brought up. This is documented in issue #44490.

Author

Now creating the federation-only namespace only in HA scenario. PTAL


ok, thanks.

@@ -236,3 +256,33 @@ func hasRequiredResources(serverResources []*metav1.APIResourceList, requiredRes
}
return true
}

func createFederationNamespace(clientset *federationclientset.Clientset, namespace string) error {

This should be called "ensureFederationNamespace" because that's what it does. If the namespace already exists, it does nothing, otherwise it tries to create it.

Author

Agree, will change


Thanks.
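The "ensure" semantics requested in the review can be sketched as follows (stdlib-only; the interface, error value, and fake client are hypothetical stand-ins for the federation clientset and the apierrors.IsAlreadyExists check a real implementation would use):

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the "namespace already exists" error a real
// clientset would return.
var errAlreadyExists = errors.New("already exists")

// nsCreator abstracts the single call the function needs from a clientset.
type nsCreator interface {
	CreateNamespace(name string) error
}

// ensureFederationNamespace creates the namespace if it does not exist and
// treats an already-exists error as success, making the call idempotent.
// This is what the reviewer means by "ensure" rather than "create".
func ensureFederationNamespace(c nsCreator, name string) error {
	if err := c.CreateNamespace(name); err != nil && !errors.Is(err, errAlreadyExists) {
		return err
	}
	return nil
}

// fakeClient is an in-memory stand-in for the federation clientset.
type fakeClient struct{ namespaces map[string]bool }

func (f *fakeClient) CreateNamespace(name string) error {
	if f.namespaces[name] {
		return errAlreadyExists
	}
	f.namespaces[name] = true
	return nil
}

func main() {
	c := &fakeClient{namespaces: map[string]bool{}}
	fmt.Println(ensureFederationNamespace(c, "federation-only")) // creates it
	fmt.Println(ensureFederationNamespace(c, "federation-only")) // no-op, still nil
}
```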


if !s.LeaderElection.LeaderElect {
run(nil)
panic("unreachable")

I don't understand why this is needed. It's already in run()?

Author

Agree, it's overkill. Will remove it.

Callbacks: leaderelection.LeaderCallbacks{
OnStartedLeading: run,
OnStoppedLeading: func() {
glog.Fatalf("leaderelection lost")

This is normal, expected behaviour, right? In that case we should not be logging a fatal error here.


So you die if you lose the leader election here? Are we relying on Kubernetes to automatically restart this pod to try again? This seems to introduce potential delays and also seems brittle. What if Kubernetes decides to stop restarting this pod because it has failed too often? And what if, because this pod is dead, Kubernetes decides to schedule another pod on this node and uses up all the resources? In summary, I don't understand the design for how leader election is supposed to work. Can you point me to the design doc?

Author

If the instance which is the leader loses the leader election, it will die and restart; it has potentially encountered some issue due to which it could not renew the leadership lease. It should also be fine if it restarts, as there are standby instances which can acquire the lease as soon as one becomes available.
@quinton-hoole, there was no design doc for leader election in the controller-manager, as it was thought to be trivial. There are already multiple examples of how to implement it; here is the one for kube-controller-manager.

However, there is a Federation control plane HA design, which we need to revisit in v1.8 and get accepted by the community: https://docs.google.com/document/d/18kB5W401WBhEjUOOXQnBb-nx3bFXK8pOCHbMbdXuBMg/edit?usp=sharing


OK, that makes sense. If the kubelet doesn't restart the pod, then the replicaset controller will, perhaps on a different node. Sorry, I was being dumb.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 23, 2017
@ghost

ghost commented Jul 17, 2017

@shashidharatd Is the plan still to merge this, or can it be closed?

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 18, 2017
@shashidharatd
Author

@quinton-hoole, @marun, I have reworked and addressed the comments. Can you PTAL?

@ghost

ghost commented Aug 3, 2017

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 3, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 3, 2017
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

2 similar comments

@shashidharatd
Author

Looks like an incompatible change is in the submit-queue and compilation is failing in batch mode, so I am removing the lgtm label to allow other PRs to merge; I will rebase and fix the issue later.
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/batch/pull-kubernetes-federation-e2e-gce/18804/?log#log

/lgtm cancel

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

9 similar comments

@shashidharatd
Author

@krzyzacy, pull-kubernetes-verify keeps failing with a build timeout of 60 minutes. As far as I checked, this job is now configured with a timeout of 75 minutes. Any idea why it is still timing out at 60 minutes?

@krzyzacy
Member

krzyzacy commented Aug 9, 2017

/test pull-kubernetes-verify

@krzyzacy
Member

krzyzacy commented Aug 9, 2017

Ah, kubernetes-update-jenkins-jobs has stopped running for whatever reason, so the config was not actually applied...

@krzyzacy
Member

krzyzacy commented Aug 9, 2017

refreshed the config manually, will drop another retest

@krzyzacy
Member

krzyzacy commented Aug 9, 2017

/test pull-kubernetes-verify

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2017
@k8s-github-robot k8s-github-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 10, 2017
@shashidharatd
Author

/test pull-kubernetes-verify

@shashidharatd
Author

@irfanurrehman, can you please lgtm this PR? I just rebased after the previous lgtm from @quinton-hoole, to remove the flag in hack/verify-flags/known-flags.txt.

@irfanurrehman
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 10, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: irfanurrehman, quinton-hoole, shashidharatd

Associated issue: 44631

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit d6c40d6 into kubernetes:master Aug 10, 2017
Successfully merging this pull request may close these issues.

federation: federation-controller-manager is restarted during init