NodeController shouldn't evict Pods when all Nodes are NotReady #24597
This is a (very) rough approximation of the rule "don't evict things that have no place to go". Implementing that rule completely is obviously super complicated, but there are some obvious conditions that we can catch, like this one. (The motivation here is to avoid having the node controller do a bunch of evictions when it and the apiserver are segmented from the nodes.)
No reason, except that I'm not going to start working on it for the next few weeks. If someone wants to jump in, I don't want to discourage her/him. I'd rather keep it separate from #20979; those are orthogonal changes.
@gmarek Will you be able to do this for 1.3, or should we reassign?
Depends on the priority. If we want to have a chance for controllerRef for 1.3, then I won't be able to do this. If we're OK with having controllerRef a bit later, then I can.
To clarify -- this issue is a P0 release blocker for 1.3, because it has caused at least one user to lose data.
Automatic merge from submit-queue. NodeController doesn't evict Pods if no Nodes are Ready. Fix #13412 #24597. When the NodeController doesn't see any Ready Node, it goes into "network segmentation mode". In this mode it cancels all evictions and doesn't evict any Pods. It leaves network segmentation mode when it sees at least one Ready Node. When leaving, it resets all timers, so each Node has a full grace period to reconnect to the cluster. cc @lavalamp @davidopp @mml @wojtek-t @fgrzadkowski
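For anyone reading along, the behaviour described in that merge message can be sketched roughly as below. This is only an illustration and not the actual NodeController code: `evictionTracker`, `observe`, `demoScenario`, and the 5-minute grace period are all hypothetical names and values made up for the example, and the real controller tracks far more state.

```go
package main

import (
	"sort"
	"time"
)

// evictionTracker is a hypothetical, simplified stand-in for the node
// controller's eviction bookkeeping. It is NOT the real Kubernetes type.
type evictionTracker struct {
	gracePeriod   time.Duration        // how long a node may stay NotReady
	segmented     bool                 // true while in "network segmentation mode"
	notReadySince map[string]time.Time // per-node NotReady timers
}

func newEvictionTracker(grace time.Duration) *evictionTracker {
	return &evictionTracker{gracePeriod: grace, notReadySince: map[string]time.Time{}}
}

// observe takes a snapshot of node readiness at time now and returns the
// sorted names of nodes whose pods should be evicted.
func (t *evictionTracker) observe(ready map[string]bool, now time.Time) []string {
	anyReady := false
	for _, ok := range ready {
		if ok {
			anyReady = true
			break
		}
	}
	if !anyReady {
		// No Ready node at all: enter segmentation mode and cancel every
		// eviction by dropping all timers.
		t.segmented = true
		t.notReadySince = map[string]time.Time{}
		return nil
	}
	if t.segmented {
		// Leaving segmentation mode: reset all timers so each node gets a
		// full grace period to reconnect.
		t.segmented = false
		t.notReadySince = map[string]time.Time{}
	}
	var evict []string
	for name, ok := range ready {
		if ok {
			delete(t.notReadySince, name)
			continue
		}
		since, seen := t.notReadySince[name]
		if !seen {
			t.notReadySince[name] = now // start the timer for this node
			continue
		}
		if now.Sub(since) >= t.gracePeriod {
			evict = append(evict, name)
		}
	}
	sort.Strings(evict)
	return evict
}

// demoScenario replays five observations of a two-node cluster and returns
// the eviction list produced by each observation.
func demoScenario() [][]string {
	t0 := time.Unix(0, 0)
	tr := newEvictionTracker(5 * time.Minute) // assumed grace period
	steps := []struct {
		offset time.Duration
		ready  map[string]bool
	}{
		{0, map[string]bool{"a": false, "b": true}},                // timer for a starts
		{6 * time.Minute, map[string]bool{"a": false, "b": true}},  // a exceeds grace
		{7 * time.Minute, map[string]bool{"a": false, "b": false}}, // segmentation mode
		{20 * time.Minute, map[string]bool{"a": false, "b": true}}, // timers reset
		{26 * time.Minute, map[string]bool{"a": false, "b": true}}, // a exceeds grace again
	}
	var out [][]string
	for _, s := range steps {
		out = append(out, tr.observe(s.ready, t0.Add(s.offset)))
	}
	return out
}
```

In `demoScenario`, node `a` is only marked for eviction in the second and fifth observations. The third observation (no Ready nodes) evicts nothing, and the fourth gives `a` a fresh grace period even though it has been NotReady the whole time.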
Looks like this was fixed by #25571.
cc @davidopp @lavalamp @mml @roberthbailey @cjcullen @fgrzadkowski