[Federation] Implement leader election for controller-manager #46090
Conversation
/assign @nikhiljindal
Sorry, I haven't had a chance to go through the design yet. If there is anyone else who feels comfortable reviewing this, please go ahead.
		return
	} else if !send {
		glog.V(5).Infof("Skipping cluster: %s for namespace: %s reason: cluster selectors do not match: %-v %-v", cluster.Name, namespace, cluster.ObjectMeta.Labels, desiredNamespace.ObjectMeta.Annotations[federationapi.FederationClusterSelectorAnnotation])
		continue
This causes you to skip over the switch statement below, and in particular the "case found && !send" so you do not end up deleting the namespace in an underlying cluster when you should.
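For illustration, a minimal sketch of the control flow being discussed, assuming the loop decides per cluster based on a (found, send) pair; the names and cases below are assumptions for illustration, not the actual federation namespace controller code:

```go
package main

import "fmt"

// syncAction shows why an early `continue` on !send is wrong: the
// (found && !send) branch is the one that deletes the namespace from an
// underlying cluster, and it is never reached if we skip the switch.
func syncAction(found, send bool) string {
	switch {
	case found && send:
		return "update namespace in cluster"
	case found && !send:
		return "delete namespace from cluster" // skipped if we `continue` before the switch
	case !found && send:
		return "create namespace in cluster"
	default:
		return "nothing to do"
	}
}

func main() {
	// Selector no longer matches a cluster that already has the namespace:
	// the namespace should be deleted there, not skipped.
	fmt.Println(syncAction(true, false))
}
```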
oops, a mistake. will fix. Thanks for catching this.
Force-pushed from 14a9426 to e55d757
@@ -54,6 +58,13 @@ import (
	"k8s.io/client-go/discovery"
)

const (
	apiserverWaitTimeout = 30 * 2 * time.Second
nit: Why not just 60 seconds, or one minute?
I kind of forgot why I did it like that. Will change.
@@ -54,6 +58,13 @@ import (
	"k8s.io/client-go/discovery"
)

const (
	apiserverWaitTimeout   = 30 * 2 * time.Second
	apiserverRetryInterval = 1 * time.Second
This seems very short, and requires exponential backoff. There are standard libraries to do the right thing here.
We should be initializing the controller-manager as early as possible. The controller-manager depends on the API server, and both are started independently, so either one may start functioning before the other. If we introduce exponential backoff, I am worried it may induce unnecessary delay in initializing the controller-manager. Also, this is not on the critical path; we are just in the initialization stage.
I would increase apiserverRetryInterval to 2 seconds and apiserverWaitTimeout to 2 minutes. Is this fine?
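For reference, a minimal sketch of what the suggested values look like with the standard polling helper from k8s.io/apimachinery; this illustrates the suggestion rather than the code in this PR, and waitForAPIServer is a hypothetical helper name:

```go
package startup

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/discovery"
)

const (
	apiserverRetryInterval = 2 * time.Second
	apiserverWaitTimeout   = 2 * time.Minute
)

// waitForAPIServer polls the discovery endpoint until the API server answers
// or the timeout expires. wait.PollImmediate handles the retry loop, so no
// hand-rolled sleep or backoff is needed.
func waitForAPIServer(dc discovery.DiscoveryInterface) error {
	return wait.PollImmediate(apiserverRetryInterval, apiserverWaitTimeout, func() (bool, error) {
		if _, err := dc.ServerVersion(); err != nil {
			return false, nil // not ready yet, keep retrying until the timeout
		}
		return true, nil
	})
}
```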
ok
	apiserverWaitTimeout   = 30 * 2 * time.Second
	apiserverRetryInterval = 1 * time.Second

	federationOnlyNamespace = "federation"
This needs to be configurable, and should be an explicitly reserved name. As it stands right now, if the user has a namespace called federation in any of their underlying clusters, these will be automatically deleted (because what you're saying here is that you only want it in the federation, not in any of the underlying clusters).
Agree with your comment; I will make it configurable. Also, what should the default name for such a namespace be? Is the name federation fine, or should I change it to something like federation-only?
Yes, I think federation-only is fine as a default.
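A sketch of what the configurable option could look like, using spf13/pflag as elsewhere in Kubernetes; the flag name, struct, and default below are assumptions for illustration, not the exact option added by the PR:

```go
package options

import "github.com/spf13/pflag"

// Options holds the hypothetical federation-only namespace setting discussed
// above; the real controller-manager options struct has many more fields.
type Options struct {
	FederationOnlyNamespace string
}

// AddFlags wires the setting to a command-line flag with "federation-only"
// as the reserved default name.
func (o *Options) AddFlags(fs *pflag.FlagSet) {
	fs.StringVar(&o.FederationOnlyNamespace, "federation-only-namespace", "federation-only",
		"Namespace reserved for federation control-plane objects; it exists only in the federation API server and is never propagated to member clusters.")
}
```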
		glog.Fatalf("Invalid API configuration: %v", err)
	}

	if err := createFederationNamespace(federationClientset, federationOnlyNamespace); err != nil {
You should only do this if the user has requested an HA federation control plane, otherwise you have an unnecessary namespace hanging around, for no reason.
Agree with the concern, but even in the case of a non-HA federation control plane it serves one purpose: waiting for the API server and establishing that it is working fine. In the current scenario, without this, the federation-controller-manager dies and another pod is brought up. This is documented in issue #44490.
Now creating the federation-only namespace only in the HA scenario. PTAL.
ok, thanks.
@@ -236,3 +256,33 @@ func hasRequiredResources(serverResources []*metav1.APIResourceList, requiredRes
	}
	return true
}

func createFederationNamespace(clientset *federationclientset.Clientset, namespace string) error {
This should be called "ensureFederationNamespace" because that's what it does. If the namespace already exists, it does nothing, otherwise it tries to create it.
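A sketch of the renamed, idempotent helper; it is written here against a plain client-go clientset and today's API (context plus CreateOptions) rather than the federation clientset used by the PR:

```go
package startup

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensureFederationNamespace creates the namespace if it is missing and
// treats "already exists" as success, which is what the name promises.
func ensureFederationNamespace(ctx context.Context, client kubernetes.Interface, namespace string) error {
	ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: namespace}}
	_, err := client.CoreV1().Namespaces().Create(ctx, ns, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return nil // someone else created it first; that is fine
	}
	return err
}
```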
Agree, will change
Thanks.
	if !s.LeaderElection.LeaderElect {
		run(nil)
		panic("unreachable")
I don't understand why this is needed. It's already in run()?
Agree, it's overkill. Will remove it.
	Callbacks: leaderelection.LeaderCallbacks{
		OnStartedLeading: run,
		OnStoppedLeading: func() {
			glog.Fatalf("leaderelection lost")
This is normal, expected behaviour, right? In that case we should not be logging a fatal error here.
So you die if you lose the leader election here? Are you relying on Kubernetes to automatically restart this pod to try again? This seems to introduce potential delays, and it also seems brittle. What if Kubernetes decides to stop restarting this pod because it has failed too often? And what if, because this pod is dead, Kubernetes decides to schedule another pod on this node and uses up all the resources? In summary, I don't understand the design for how leader election is supposed to work. Can you point me to the design doc?
If the instance which is the leader loses the leader election here, it dies and restarts; presumably it has hit some issue that prevented it from renewing its leadership lease. Restarting should also be fine, since there are standby instances which can acquire the lease as soon as it becomes available.
@quinton-hoole, there was no design doc for leader election in the controller-manager, as it was thought to be trivial; there are already multiple examples of how to implement it, for example the one in kube-controller-manager.
However, there is a Federation control plane HA design which we need to revisit in v1.8 and get community acceptance on: https://docs.google.com/document/d/18kB5W401WBhEjUOOXQnBb-nx3bFXK8pOCHbMbdXuBMg/edit?usp=sharing
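For context, a minimal sketch of the standard controller-manager leader-election pattern referred to above, using today's client-go leaderelection package with a Lease lock; the PR predates this lock type, so the lock name, namespace, and durations are illustrative assumptions, not the merged code:

```go
package startup

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

// runWithLeaderElection blocks until this replica wins the election, runs
// the controllers, and exits the process if the lease is lost; a standby
// replica (restarted by its ReplicaSet if needed) then takes over.
func runWithLeaderElection(ctx context.Context, client kubernetes.Interface, run func(context.Context)) {
	id, _ := os.Hostname()
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "federation-controller-manager", Namespace: "federation-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: run,
			OnStoppedLeading: func() {
				// Dying on lost leadership is intentional: the pod is
				// restarted and rejoins the election as a standby.
				klog.Fatalf("leader election lost")
			},
		},
	})
}
```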
OK, that makes sense. If the kubelet doesn't restart the pod, then the ReplicaSet controller will, perhaps on a different node. Sorry, I was being dumb.
@shashidharatd Is the plan still to merge this, or can it be closed?
Force-pushed from e55d757 to 47f9745
@quinton-hoole, @marun, I have reworked and addressed the comments. Can you PTAL?
/lgtm
/retest
Looks like an incompatible change is in the submit-queue and compilation is failing in batch mode, so I am removing the lgtm label to allow other PRs to merge; I will rebase and fix the issue later. /lgtm cancel
/retest
@krzyzacy, pull-kubernetes-verify keeps failing on a build timeout of 60 minutes. As far as I can see, this job is now configured with a timeout of 75 minutes. Any idea why it is still timing out at 60 minutes?
/test pull-kubernetes-verify
ah
Refreshed the config manually, will drop another retest.
/test pull-kubernetes-verify
Force-pushed from 82e84f8 to e6b54b6
/test pull-kubernetes-verify
@irfanurrehman, can you please lgtm this PR? I just rebased after the previous lgtm from @quinton-hoole for removing the flag in
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: irfanurrehman, quinton-hoole, shashidharatd
Associated issue: 44631
The full list of commands accepted by this bot can be found here.
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue
What this PR does / why we need it:
Ref: #44283
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #44490
Special notes for your reviewer:
This PR also fixes issue #44490, which is about a delay in initializing the controller-manager due to unavailability of the API server.
Release note:
/cc @kubernetes/sig-federation-pr-reviews