
Master/node clock skew will affect deployment's pod availability check #29229

Closed

janetkuo opened this issue Jul 19, 2016 · 17 comments
Labels
area/app-lifecycle
area/workload-api/daemonset
area/workload-api/replicaset
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/apps: Categorizes an issue or PR as relevant to SIG Apps.

Comments

@janetkuo
Member

janetkuo commented Jul 19, 2016

Found this in: #26834 (comment)

When the deployment controller checks whether a pod is available, it compares the master's time (time.Now()) against the node's time (pod.Status.Conditions[].LastTransitionTime + deployment.spec.minReadySeconds). If the master's and the node's clocks are off by more than a few seconds, the pod availability check won't work as expected.
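
For illustration, here is a minimal Go sketch of the comparison just described. The podCondition type and podAvailable function are simplified stand-ins made up for this sketch, not the actual Kubernetes types or controller code.

```go
// Minimal sketch of the check described above; podCondition and podAvailable
// are simplified stand-ins, not the real Kubernetes types or controller code.
package main

import (
	"fmt"
	"time"
)

type podCondition struct {
	Type               string
	Status             string
	LastTransitionTime time.Time // written by the kubelet, i.e. the node's clock
}

// podAvailable mirrors the comparison: master-side time.Now() vs.
// node-side LastTransitionTime + minReadySeconds.
func podAvailable(ready podCondition, minReadySeconds int32, now time.Time) bool {
	if ready.Status != "True" {
		return false
	}
	minReady := time.Duration(minReadySeconds) * time.Second
	// If the node's clock runs ahead of the master's, this stays false longer
	// than intended; if it runs behind, the pod looks available too early.
	return ready.LastTransitionTime.Add(minReady).Before(now)
}

func main() {
	cond := podCondition{
		Type:               "Ready",
		Status:             "True",
		LastTransitionTime: time.Now().Add(-5 * time.Second),
	}
	// Only 5s of the 10s minReadySeconds have elapsed, so this prints false.
	fmt.Println(podAvailable(cond, 10, time.Now()))
}
```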

@kubernetes/deployment

@adohe-zz

Real clocks or physical time are not perfectly accurate even when time synchronization is on. Some kind of virtual time could be better.

@0xmichalis
Contributor

cc @smarterclayton

@smarterclayton
Contributor

The node should have access to the clock skew of the master via the Date header returned on responses. It's likely that in the future we'd want a story to deal with node/master clock skew consistently. We have said you must run your cluster with low time skew.

I think in the short term the deployment controller should create a virtual clock (like the node controller and namespace controller do) for dealing with timestamps. The deployment controller can at least bracket between the pod's creation time and the time the kubelet accepts the pod, and start the clock when it observes the pod. I think we want to remove any time.Now() comparisons and replace them with observed clock skew from the master's reference frame.
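
As a rough illustration of this idea (not the actual controller code), a virtual clock could record the controller's own observation time for a pod's Ready transition and measure minReadySeconds against that, so only the master's clock is ever compared against itself. The names below (virtualClock, observeReady, availableSince) are invented for the sketch.

```go
// Hedged sketch of a per-controller "virtual clock": minReadySeconds is
// measured from the time this controller first observed the pod as Ready,
// not from the node-reported LastTransitionTime.
package main

import (
	"fmt"
	"sync"
	"time"
)

type virtualClock struct {
	mu       sync.Mutex
	observed map[string]time.Time // pod key -> first time we saw Ready=True
}

func newVirtualClock() *virtualClock {
	return &virtualClock{observed: map[string]time.Time{}}
}

// observeReady would be called from the controller's informer handlers whenever
// a pod is seen with Ready=True; the first observation starts the clock.
func (v *virtualClock) observeReady(podKey string, now time.Time) {
	v.mu.Lock()
	defer v.mu.Unlock()
	if _, ok := v.observed[podKey]; !ok {
		v.observed[podKey] = now
	}
}

// availableSince reports whether minReadySeconds have elapsed since the first
// observation, using only the controller's (master-side) clock.
func (v *virtualClock) availableSince(podKey string, minReadySeconds int32, now time.Time) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	seen, ok := v.observed[podKey]
	if !ok {
		return false
	}
	return now.Sub(seen) >= time.Duration(minReadySeconds)*time.Second
}

func main() {
	vc := newVirtualClock()
	vc.observeReady("default/web-1", time.Now())
	fmt.Println(vc.availableSince("default/web-1", 0, time.Now())) // true: 0s required
}
```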

@pwittrock added the priority/important-soon label Jul 21, 2016
@dchen1107
Member

xref: #6159

@0xmichalis
Contributor

An alternative to virtual clocks is minReadySeconds on the pod level (@bgrant0607 has already asked for it).

@bgrant0607
Member

Note that the node controller already had to deal with this. IIRC, it didn't actually use the timestamps in the condition, but started its own timers when observing transitions.

@smarterclayton
Contributor

The kubelet, namespace controller, and node controller also do that for graceful deletion: starting timers when they observe the state change, but not treating the timestamp in the object as meaningful.


@0xmichalis
Contributor

0xmichalis commented Nov 3, 2016

We need a virtual clock in the deployment controller for deployment conditions as well. See #35691 (comment) for more context.

@erictune
Member

erictune commented Nov 3, 2016

I like the option of a private timer in the deployment controller.

We think we have seen this in the wild. Too late to fix in 1.5. Targeting 1.6.

@0xmichalis
Contributor

FWIW, the availability check "moved" to the replica set controller, meaning we should add the virtual clock there if we want to solve this with a virtual clock. I guess all controllers that deal with Pods should use virtual clocks in the long term.
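
A hedged sketch of what an injectable clock in the replica set controller could buy: the availability window is measured on a single clock, and tests can fake that clock. The Clock interface, fakeClock, and availableFor below are illustrative only, not the real controller or any existing Kubernetes API.

```go
// Illustrative only: an injectable clock lets the availability check use one
// reference frame in production and a fake, deterministic clock in tests.
package main

import (
	"fmt"
	"time"
)

// Clock is the controller's only source of "now".
type Clock interface{ Now() time.Time }

type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

// fakeClock lets tests advance time explicitly.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time       { return f.t }
func (f *fakeClock) Step(d time.Duration) { f.t = f.t.Add(d) }

// availableFor answers "has minReadySeconds elapsed since readySeen?",
// where readySeen was recorded with the same clock.
func availableFor(c Clock, readySeen time.Time, minReadySeconds int32) bool {
	return c.Now().Sub(readySeen) >= time.Duration(minReadySeconds)*time.Second
}

func main() {
	fc := &fakeClock{t: time.Unix(0, 0)}
	readySeen := fc.Now()
	fmt.Println(availableFor(fc, readySeen, 30)) // false: no time has passed yet
	fc.Step(31 * time.Second)
	fmt.Println(availableFor(fc, readySeen, 30)) // true: 31s elapsed on the same clock
}
```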

@smarterclayton
Contributor

smarterclayton commented Dec 9, 2016 via email

@0xmichalis
Contributor

An alternative to virtual clocks is minReadySeconds on the pod level

I've got kubernetes/community#194 up for review (a proposal for adding minReadySeconds to pods), and I have a preliminary (still buggy) working implementation of it locally. However, the concerns raised by @thockin regarding the Ready condition being ambiguous have already got me thinking of following the simplest solution possible, i.e. using a virtual clock in the replica set controller.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Dec 24, 2017
@erictune
Member

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jan 16, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Apr 17, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label May 17, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
