NodeController shouldn't evict Pods when all Nodes are NotReady #24597

Closed
gmarek opened this issue Apr 21, 2016 · 7 comments
Labels: area/nodecontroller, priority/critical-urgent

@gmarek
Contributor

gmarek commented Apr 21, 2016

cc @davidopp @lavalamp @mml @roberthbailey @cjcullen @fgrzadkowski

@gmarek added the priority/important-soon, area/nodecontroller, and team/control-plane labels on Apr 21, 2016
@gmarek added this to the v1.3 milestone on Apr 21, 2016
@lavalamp
Member

This is a (very) rough approximation of the rule "don't evict things that have no place to go". It's obviously super complicated to implement that rule completely, but there are some obvious conditions that we can catch, like this one.

(The motivation here is to avoid having node controller do a bunch of evictions when it and apiserver are segmented from the nodes.)

@davidopp
Member

@gmarek Is there some reason you didn't assign this to yourself?

Also, I assume you're going to integrate this into #20979?

@gmarek
Contributor Author

gmarek commented Apr 22, 2016

No reason, except that I'm not going to start working on it for the next few weeks. If someone wants to jump in, I don't want to discourage her/him.

I'd rather keep it separate from #20979 - those are orthogonal changes.

@davidopp
Member

@gmarek Will you be able to do this for 1.3, or should we reassign?

@gmarek
Contributor Author

gmarek commented May 11, 2016

Depends on the priority. If we want to have a chance of getting controllerRef into 1.3, then I won't be able to do this. If we're OK with having controllerRef a bit later, then I can.

@davidopp added the priority/critical-urgent label and removed the priority/important-soon label on May 13, 2016
@davidopp
Member

To clarify -- this issue is a P0 release blocker for 1.3, because it has caused at least one user to lose data.

k8s-github-robot pushed a commit that referenced this issue May 20, 2016
Automatic merge from submit-queue

NodeController doesn't evict Pods if no Nodes are Ready

Fix #13412 #24597

When the NodeController doesn't see any Ready Node, it goes into "network segmentation mode". In this mode it cancels all evictions and doesn't evict any Pods.

It leaves network segmentation mode when it sees at least one Ready Node. When leaving, it resets all timers, so each Node has a full grace period to reconnect to the cluster.

cc @lavalamp @davidopp @mml @wojtek-t @fgrzadkowski
@roberthbailey
Contributor

Looks like this was fixed by #25571.
