
Expose flags for new NodeEviction logic in NodeController #30138

Merged: 1 commit merged into kubernetes:master on Aug 18, 2016

Conversation

@gmarek (Contributor) commented Aug 5, 2016

Fix #28832
Last PR from the NodeController NodeEviction logic series.

cc @davidopp @lavalamp @mml



@gmarek added the release-note-none label Aug 5, 2016
@gmarek added this to the v1.4 milestone Aug 5, 2016
@k8s-github-robot added the kind/api-change and size/L labels Aug 5, 2016
@gmarek assigned davidopp and unassigned wojtek-t Aug 5, 2016
@@ -173,5 +174,10 @@ func (s *CMServer) AddFlags(fs *pflag.FlagSet) {
fs.DurationVar(&s.ControllerStartInterval.Duration, "controller-start-interval", s.ControllerStartInterval.Duration, "Interval between starting controller managers.")
fs.BoolVar(&s.EnableGarbageCollector, "enable-garbage-collector", s.EnableGarbageCollector, "Enables the generic garbage collector. MUST be synced with the corresponding flag of the kube-apiserver. WARNING: the generic garbage collector is an alpha feature.")
fs.Int32Var(&s.ConcurrentGCSyncs, "concurrent-gc-syncs", s.ConcurrentGCSyncs, "The number of garbage collector workers that are allowed to sync concurrently.")
fs.Float32Var(&s.NodeEvictionRate, "node-eviction-rate", 0.1, "Number of nodes per second on which pods are deleted in case of node failure when a zone is healthy.")
Review comment (Member):
s/healthy./healthy (see --unhealthy-zone-threshold for definition of healthy/unhealthy). Zone refers to entire cluster in non-multizone clusters./

@gmarek (Contributor, Author) replied:
Done.

@gmarek (Contributor, Author) commented Aug 8, 2016

@davidopp PTAL

@gmarek (Contributor, Author) commented Aug 8, 2016

@k8s-bot unit test this issue: #28713

@gmarek (Contributor, Author) commented Aug 8, 2016

ping @davidopp

@k8s-github-robot added the needs-rebase label Aug 9, 2016
@davidopp (Member) commented Aug 9, 2016

Super-busy right now, will review within the next couple of days.

@k8s-github-robot removed the needs-rebase label Aug 10, 2016
@gmarek force-pushed the flags branch 2 times, most recently from d8892bb to f94b2fa on August 10, 2016 09:58
@k8s-github-robot added the needs-rebase label Aug 12, 2016
@@ -173,5 +174,10 @@ func (s *CMServer) AddFlags(fs *pflag.FlagSet) {
fs.DurationVar(&s.ControllerStartInterval.Duration, "controller-start-interval", s.ControllerStartInterval.Duration, "Interval between starting controller managers.")
fs.BoolVar(&s.EnableGarbageCollector, "enable-garbage-collector", s.EnableGarbageCollector, "Enables the generic garbage collector. MUST be synced with the corresponding flag of the kube-apiserver. WARNING: the generic garbage collector is an alpha feature.")
fs.Int32Var(&s.ConcurrentGCSyncs, "concurrent-gc-syncs", s.ConcurrentGCSyncs, "The number of garbage collector workers that are allowed to sync concurrently.")
fs.Float32Var(&s.NodeEvictionRate, "node-eviction-rate", 0.1, "Number of nodes per second on which pods are deleted in case of node failure when a zone is healthy (see --unhealthy-zone-threshold for definition of healthy/unhealthy). Zone refers to entire cluster in non-multizone clusters.")
fs.Float32Var(&s.SecondaryNodeEvictionRate, "secondary-node-eviction-rate", 0.01, "Number of nodes per second on which pods are deleted in case of node failure when a zone is unhealthy (see --unhealthy-zone-threshold for definition of healthy/unhealthy). Zone refers to entire cluster in non-multizone clusters. This value is implicitly overridden to 0 if the cluster size is smaller than --large-cluster-size-threshold.")
Review comment (Member):
I agree with Daniel that 0.01 is probably too large, especially if --large-cluster-size-threshold is 20.

I think maybe we should change the default for --unhealthy-zone-threshold to 0.55 (a rough approximation of "can't reschedule anywhere" if the pods are evenly spread), and change --secondary-node-eviction-rate to 0.002.

We should also mention this new default behavior very prominently in the release notes, so we don't get a ton of questions from people about why their pods aren't being evicted.

Actually, it would be great to add some kind of NodeCondition (generated by the node controller) that indicates when evictions from that node are being rate-limited due to this new feature. Borg has something like this and it's really helpful for understandability. I would say it's not critical if you can't finish it by Friday (and I certainly would not suggest putting it in this PR).

@gmarek (Contributor, Author) replied:
0.002 means that we'll evict ~7 Nodes an hour (0.002 nodes/s × 3600 s/h ≈ 7.2 nodes/h) - I think it's too slow.

@gmarek (Contributor, Author) replied:
For 20 Nodes at 1 Node per 100 s it will take over 30 minutes (20 × 100 s ≈ 33 min) to evict the whole cluster. If you think that's too short, I'd rather increase large-cluster-size-threshold to 40 (which gives an hour for a full eviction) than decrease the rate drastically. This is actually the reason why I wanted 'time-to-drain' as a second flag here... @davidopp PTAL.

Reply (Member):
I don't know... I mean, you probably never really want to evict the whole cluster, so 30 minutes is probably too short... I'm trying to think in terms of an ops team with a pager, what is a reasonable SLO for them to respond and stop the bleeding... But yeah, if you want to change large-cluster-threshold to 40 (or probably 50 would be better) instead of changing the rate, I think it's OK.

@gmarek (Contributor, Author) replied:
Made it 50.
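
For context on the numbers in this thread, each rate translates directly into pacing: at most one node per 1/rate seconds per zone. Below is a minimal sketch of that pacing in Go, using golang.org/x/time/rate as a stand-in for the controller's actual rate-limited eviction queue; the evictionPacer type and all names here are illustrative, not the PR's implementation.

```go
package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// evictionPacer is an illustrative stand-in for the node controller's
// rate-limited eviction queue: a token bucket refilled at nodesPerSecond
// tokens/sec with a burst of one admits at most one node per 1/rate seconds.
type evictionPacer struct {
	limiter *rate.Limiter
}

func newEvictionPacer(nodesPerSecond float64) *evictionPacer {
	return &evictionPacer{limiter: rate.NewLimiter(rate.Limit(nodesPerSecond), 1)}
}

// tryEvict reports whether pods on the given node may be deleted now;
// when it returns false the controller would re-queue the node and retry.
func (p *evictionPacer) tryEvict(node string) bool {
	return p.limiter.Allow()
}

func main() {
	// --node-eviction-rate=0.1            -> one node every 10 s in a healthy zone.
	// --secondary-node-eviction-rate=0.01 -> one node every 100 s in an unhealthy
	// zone, so draining 50 nodes takes 50 × 100 s ≈ 83 minutes.
	pacer := newEvictionPacer(0.01)
	for _, node := range []string{"node-1", "node-2", "node-3"} {
		for !pacer.tryEvict(node) {
			time.Sleep(time.Second) // bucket empty: wait for the next token
		}
		fmt.Println("evicting pods from", node)
	}
}
```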

@davidopp (Member) commented:
I made a few small comments. Please add LGTM after you've made the changes.

Do you think the tests are adequate or should there be more?

Please file an issue and assign to yourself for the node condition I suggested in the comments. It would be great to get that in 1.4 but not the end of the world if you can't.
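
The node condition suggested here never landed in this PR. Purely as a sketch of the idea, a condition that the node controller could publish while the limiter is holding evictions back might look like the following; the EvictionsRateLimited type, the reason string, and the local struct are invented for illustration and are not an existing Kubernetes API.

```go
package main

import (
	"fmt"
	"time"
)

// NodeCondition mirrors the shape of the Kubernetes node condition type
// just enough for illustration; only the hypothetical condition matters.
type NodeCondition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	LastTransitionTime time.Time
	Reason             string
	Message            string
}

// rateLimitedCondition is what the node controller could publish while the
// per-zone eviction limiter is delaying evictions from a node.
func rateLimitedCondition(zone string) NodeCondition {
	return NodeCondition{
		Type:               "EvictionsRateLimited", // hypothetical condition type
		Status:             "True",
		LastTransitionTime: time.Now(),
		Reason:             "ZoneEvictionRateLimit", // hypothetical reason
		Message: fmt.Sprintf(
			"pod evictions are delayed by the eviction rate limit for zone %q", zone),
	}
}

func main() {
	fmt.Printf("%+v\n", rateLimitedCondition("us-central1-a"))
}
```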

@k8s-github-robot removed the needs-rebase label Aug 16, 2016
@k8s-github-robot added and then removed the needs-rebase label Aug 16, 2016
@davidopp (Member) commented:
BTW what is the PodStatus/ContainerStatus for pods/containers on nodes that are NotReady but not being evicted? (I didn't review your other PRs for this and am too lazy or busy, depending on one's perspective, at the moment to check.)

@gmarek (Contributor, Author) commented Aug 17, 2016
I didn't change the logic there, so Pods are marked as NotReady, but containers are left as-is (it has been this way for as long as I remember). I can send another PR that changes this behavior, but I don't want to do it in this one.

@davidopp (Member) replied:
That behavior sounds fine, actually.
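
To make the behavior described above concrete: marking a pod NotReady amounts to flipping the pod's Ready condition while leaving container statuses untouched. A rough sketch with local stand-in types; the real controller does this against the API server, and the helper below is invented for illustration.

```go
package main

import "fmt"

// Local stand-ins for the pod status types; in the real controller this
// update is written back through the API server (not shown here).
type PodCondition struct {
	Type   string
	Status string
}

type PodStatus struct {
	Conditions        []PodCondition
	ContainerStatuses []string // left untouched by the node controller
}

// markPodNotReady sets the pod's Ready condition to False and leaves
// container statuses as they were, matching the behavior described above.
func markPodNotReady(status *PodStatus) {
	for i := range status.Conditions {
		if status.Conditions[i].Type == "Ready" {
			status.Conditions[i].Status = "False"
			return
		}
	}
	status.Conditions = append(status.Conditions,
		PodCondition{Type: "Ready", Status: "False"})
}

func main() {
	s := PodStatus{
		Conditions:        []PodCondition{{Type: "Ready", Status: "True"}},
		ContainerStatuses: []string{"running"},
	}
	markPodNotReady(&s)
	fmt.Printf("%+v\n", s) // Ready=False, container statuses unchanged
}
```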

@gmarek (Contributor, Author) commented Aug 17, 2016

I'm trying to check why the integration test failed...

@gmarek (Contributor, Author) commented Aug 17, 2016

@k8s-bot unit test this issue: #30754

@wojtek-t (Member) commented:
LGTM

@gmarek added the lgtm label Aug 17, 2016
@gmarek (Contributor, Author) commented Aug 17, 2016

Self-applying the LGTM label per #30138 (comment).

@davidopp (Member) commented:
@k8s-bot unit test this issue: #30788

@gmarek added the priority/backlog label Aug 18, 2016
@k8s-bot commented Aug 18, 2016

GCE e2e build/test passed for commit 4cf698e.

@k8s-github-robot commented:
Automatic merge from submit-queue

@k8s-github-robot merged commit f9190ed into kubernetes:master Aug 18, 2016
@gmarek deleted the flags branch August 30, 2016 09:48
Labels
kind/api-change, lgtm, priority/backlog, release-note-none, size/L