NodeController doesn't evict Pods if no Nodes are Ready #25571
Conversation
@@ -63,6 +63,11 @@ type UniqueQueue struct {
 	set sets.String
 }
+
+// Length returns a number of items currently in the queue
+func (q *UniqueQueue) Length() int {
looks unsafe re: locking. Please don't add this, esp don't make it public.
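(For context on the locking concern: a safe accessor would need to take the queue's lock before reading. The sketch below is purely illustrative; the field names lock and queue are assumptions and are not copied from the PR.)

package queuesketch

import "sync"

// uniqueQueue is an illustrative stand-in for the real type; the field
// names here are assumptions made for this sketch.
type uniqueQueue struct {
	lock  sync.Mutex
	queue []string
}

// Length takes the lock before reading, avoiding the data race an
// unguarded len() accessor would introduce.
func (q *uniqueQueue) Length() int {
	q.lock.Lock()
	defer q.lock.Unlock()
	return len(q.queue)
}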
Done.
looks OK to me.
Ping me when you get tests working. :)
@lavalamp tests are passing now
I guess @lavalamp meant the tests I mentioned in my previous comment. There are no e2e tests for this and I didn't have time to write them. I'll hand-test it today and add tests sometime next week.
Branch updated from ac2be30 to 59a83a4.
@lavalamp I ran manual tests and it seems to work now (I had a bug: now I skip the master node when looking for Ready Nodes).
Can you exercise your bugfix in the unit test? Then LGTM.
There is a unit test for this - the last case I added checks exactly this.
@k8s-bot test this issue: #IGNORE
Sorry, maybe I'm blind, but I can't see a test case with a node having a name ending in "-master"?
Oh, sorry - the master thing. I misread your comment.
@lavalamp PTAL
LGTM!
It's pretty high in the queue, but we might need to merge it by hand if it doesn't make it in by tomorrow.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
GCE e2e build/test passed for commit 6d27009.
Automatic merge from submit-queue
This looks like it might also resolve the scenario described in #19124. Will it be backported to 1.2 as well?
@pesho No. But 1.3 should be out relatively soon (less than a month).
I am seeing this error in the kubelet logs:
All pods on the node were rescheduled to other nodes. Is this how it's supposed to behave? Also, did it reschedule without at least doing a few retries?
@manojlds - yes, this is as expected, and the NodeController does a number of retries before it evicts Pods (by default 30, but it depends on how often NodeStatus is reported and on the PodEvictionGracePeriod settings). From what you wrote there were other Nodes that were serving, so the behavior seems correct - or did you expect something else? This log line suggests that the cloud provider became unreachable from some of your Nodes. @yujuhong
@gmarek - thanks. This is the only instance of the error I could see in the log, so I'm not sure if it retried before actually evicting the pods. The behaviour seems fine; my only question is about the retries.
Even if the kubelet does not retry it's fine, as the NodeController requires a lot of failed NodeStatus updates before it actually starts doing something. For the kubelet side of this, @yujuhong should be able to answer or point you to someone who can.
The kubelet should retry up to 5 times by default. On top of that, the kubelet tries to update the node status every 10s (or a custom period passed via flags/config), and the node controller only reacts on the order of minutes. One failed attempt should not cause any serious reaction.
@yujuhong thanks. Will take a look at the flags, and see how best I can use them in my cluster.
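(For reference, the knobs discussed above typically map to the flags below; the values shown are the usual defaults from this era and may differ by release, so treat them as illustrative rather than authoritative.)

kubelet --node-status-update-frequency=10s ...
kube-controller-manager --node-monitor-grace-period=40s --pod-eviction-timeout=5m0s ...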
Fix #13412 #24597
When the NodeController doesn't see any Ready Node it goes into "network segmentation mode". In this mode it cancels all evictions and doesn't evict any Pods.
It leaves network segmentation mode when it sees at least one Ready Node. On leaving, it resets all timers, so each Node has the full grace period to reconnect to the cluster (a rough sketch of this logic follows below).
cc @lavalamp @davidopp @mml @wojtek-t @fgrzadkowski
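(The sketch below is a simplified illustration of the behavior described above, not the actual nodecontroller code; all type, field, and function names are made up for the example.)

package segsketch

import "strings"

// node and nodeController are illustrative stand-ins for the real types.
type node struct {
	Name  string
	Ready bool
}

type nodeController struct {
	inSegmentationMode bool
}

// handleMonitorTick sketches the described behavior: if no non-master Node
// is Ready, enter "network segmentation mode" and stop evicting; when at
// least one Ready Node is seen again, leave the mode and reset timers so
// every Node gets the full grace period to reconnect.
func (nc *nodeController) handleMonitorTick(nodes []node) {
	anyReady := false
	for _, n := range nodes {
		// The master node is skipped when looking for Ready Nodes.
		if strings.HasSuffix(n.Name, "-master") {
			continue
		}
		if n.Ready {
			anyReady = true
			break
		}
	}

	if !anyReady {
		nc.inSegmentationMode = true
		// Cancel all pending evictions here; evict nothing while segmented.
		return
	}

	if nc.inSegmentationMode {
		nc.inSegmentationMode = false
		// Reset all eviction timers so each Node has the full grace period.
	}
	// Normal eviction handling would continue from here.
}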