Controllers can scale expectations ttl #22619

Closed
bprashanth opened this issue Mar 7, 2016 · 5 comments
Labels
area/controller-manager lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/apps Categorizes an issue or PR as relevant to SIG Apps.

Comments

@bprashanth
Contributor

Currently the ttl is set to: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/controller_utils.go#L44

In theory it can be infinite, but if by some chance (user error/bug) a pod gets orphaned, the controller will wait 5m before giving up. It can give up much sooner in most situations. One way around this would be to set the timeout to something like: numPods/(watches per second) + padding.

Since we currently don't have an accurate way to track watch latency, we need to calculate these values empirically. The comment indicates that watches per second is 10, but it should be set based on the theoretical limit of pods in the cluster, which depends on the number of nodes and the allowed pods per node.

This probably involves opening up a watch on all nodes in the rc manager and only paying attention to adds/deletes (since node updates are really frequent). It also involves determining watches per second for buckets like 100 nodes, 500 nodes, 1000 nodes etc.
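The bucket lookup could look something like the sketch below. The type and function names are hypothetical, and the rates for the 500 and 1000 node buckets are placeholders, not measurements; only the figure of 10 comes from the existing comment. Real values would have to be determined empirically.

```go
package main

import "fmt"

// rateBucket is a hypothetical type pairing a cluster-size ceiling with an
// empirically determined watch rate.
type rateBucket struct {
	maxNodes         int
	watchesPerSecond float64
}

// buckets must be sorted by maxNodes. The 8 and 5 are PLACEHOLDER values,
// not measured results.
var buckets = []rateBucket{
	{maxNodes: 100, watchesPerSecond: 10},
	{maxNodes: 500, watchesPerSecond: 8},
	{maxNodes: 1000, watchesPerSecond: 5},
}

// watchRateFor returns the rate of the smallest bucket that fits the given
// node count, or the last bucket's rate for clusters larger than any bucket.
func watchRateFor(nodes int) float64 {
	for _, b := range buckets {
		if nodes <= b.maxNodes {
			return b.watchesPerSecond
		}
	}
	return buckets[len(buckets)-1].watchesPerSecond
}

func main() {
	fmt.Println(watchRateFor(3), watchRateFor(750))
}
```

The node count fed into `watchRateFor` would come from the node watch, updated only on adds and deletes.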

If we did this, we'd notice a wedged RC in under 10s in the simple case of 3 nodes and a few orphaned pods.

This would be nice to have for 1.2, I'll try to get to it but anyone with cycles should jump in.

@adohe-zz

adohe-zz commented Mar 7, 2016

I'll try to do this. If I have any questions, I'll leave them here.

@bgrant0607 bgrant0607 added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Mar 10, 2016
@bgrant0607
Member

See also #22061

@bgrant0607 bgrant0607 added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed team/control-plane (deprecated - do not use) labels Mar 8, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 21, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 20, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/close
