Controllers can scale expectations ttl #22619
I would like to try to do this; if I have any questions, I will leave them here.
See also #22061
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close
Currently the ttl is set to: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/controller_utils.go#L44
In theory it can be infinite, but if by some chance (user error/bug) a pod gets orphaned, the controller will wait 5m before giving up. It could give up much sooner in most situations. One way around this would be to set the timeout to something like:
numPods/(watches per second) + padding
Since we currently don't have an accurate way to track watch latency, we need to calculate these values empirically. The comment indicates that watches per second is 10, but it should be set based on the theoretical limit of pods in the cluster, which depends on the number of nodes and the allowed pods per node.
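For illustration, here is a minimal Go sketch of that formula. This is not code from controller_utils.go; the names scaledExpectationsTTL, watchesPerSecond, and padding are made up, and the constants are placeholders that would need to be tuned empirically.

```go
// Hypothetical sketch of a scaled expectations TTL computed as
// numPods/(watches per second) + padding. None of these names exist in
// controller_utils.go; the constants are placeholders.
package main

import (
	"fmt"
	"time"
)

const (
	// Assumed watch throughput; in practice it would be measured for
	// cluster-size buckets (100, 500, 1000 nodes, ...).
	watchesPerSecond = 10
	// Placeholder safety padding on top of the estimated drain time.
	padding = 1 * time.Second
)

// scaledExpectationsTTL estimates how long a controller should wait for
// expected pod creations/deletions, given the theoretical maximum number
// of pods in the cluster (nodes * allowed pods per node).
func scaledExpectationsTTL(maxPods int) time.Duration {
	drain := time.Duration(maxPods/watchesPerSecond) * time.Second
	return drain + padding
}

func main() {
	// Example: a small 3-node cluster with ~30 pods per node.
	fmt.Println(scaledExpectationsTTL(3 * 30)) // 9s + 1s padding = 10s
}
```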
Setting the timeout this way probably involves opening up a watch on all nodes in the RC manager and only paying attention to adds/deletes (since node updates are really frequent). It also involves determining watches per second for buckets like 100 nodes, 500 nodes, 1000 nodes, etc.
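A hedged sketch of what that node watch could look like using a client-go shared informer follows; the wiring is an assumption for illustration, not the actual RC manager code.

```go
// Hedged sketch: tracking the node count with a client-go shared informer,
// handling only node add/delete events and ignoring the frequent node updates.
// Illustrative wiring only, not the replication controller manager's code.
package nodesketch

import (
	"sync/atomic"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchNodeCount keeps a running node count from add/delete events; the
// count could feed a formula like scaledExpectationsTTL above.
func watchNodeCount(clientset kubernetes.Interface, stopCh <-chan struct{}) *int64 {
	var nodeCount int64
	factory := informers.NewSharedInformerFactory(clientset, 0)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { atomic.AddInt64(&nodeCount, 1) },
		DeleteFunc: func(obj interface{}) { atomic.AddInt64(&nodeCount, -1) },
		// No UpdateFunc: node status updates are very frequent and irrelevant here.
	})
	factory.Start(stopCh)
	cache.WaitForCacheSync(stopCh, nodeInformer.HasSynced)
	return &nodeCount
}
```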
If we did this, we'd notice a wedged RC in under 10s in the simple case of 3 nodes and a few orphaned pods.
This would be nice to have for 1.2, I'll try to get to it but anyone with cycles should jump in.