
Master/node clock skew will affect deployment's pod availability check #29229

Closed

janetkuo opened this issue Jul 19, 2016 · 17 comments
Labels
area/app-lifecycle
area/workload-api/daemonset
area/workload-api/replicaset
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/apps: Categorizes an issue or PR as relevant to SIG Apps.

Comments

@janetkuo
Member

janetkuo commented Jul 19, 2016

Found this in: #26834 (comment)

When the deployment controller checks whether a pod is available, it compares the master's time (time.Now()) against the node's time (pod.Status.Conditions[].LastTransitionTime + deployment.spec.minReadySeconds). If the master's and the node's clocks are off by more than a few seconds, the pod availability check won't work as expected.
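
For illustration, here is a minimal Go sketch of the comparison just described. The podCondition type and podAvailable function are simplified stand-ins made up for this sketch, not the actual Kubernetes types or controller code.

```go
// Minimal sketch of the check described above; podCondition and podAvailable
// are simplified stand-ins, not the real Kubernetes types or controller code.
package main

import (
	"fmt"
	"time"
)

type podCondition struct {
	Type               string
	Status             string
	LastTransitionTime time.Time // written by the kubelet, i.e. the node's clock
}

// podAvailable mirrors the comparison: master-side time.Now() vs.
// node-side LastTransitionTime + minReadySeconds.
func podAvailable(ready podCondition, minReadySeconds int32, now time.Time) bool {
	if ready.Status != "True" {
		return false
	}
	minReady := time.Duration(minReadySeconds) * time.Second
	// If the node's clock runs ahead of the master's, this stays false longer
	// than intended; if it runs behind, the pod looks available too early.
	return ready.LastTransitionTime.Add(minReady).Before(now)
}

func main() {
	cond := podCondition{
		Type:               "Ready",
		Status:             "True",
		LastTransitionTime: time.Now().Add(-5 * time.Second),
	}
	// Only 5s of the 10s minReadySeconds have elapsed, so this prints false.
	fmt.Println(podAvailable(cond, 10, time.Now()))
}
```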

@kubernetes/deployment

@adohe-zz

Real clocks or physical time are not perfectly accurate even when time synchronization is on. Some kind of virtual time could be better.

@0xmichalis
Contributor

cc @smarterclayton

@smarterclayton
Contributor

The node should have access to the clock skew of the master via the Date header returned on responses. It's likely that in the future we'd want a story to deal with node/master clock skew consistently. We have said you must run your cluster with low time skew.

I think in the short term the deployment controller should create a virtual clock (like the node controller and namespace controller do) for dealing with timestamps. The deployment controller can at least bracket between the pod's creation time and the time the kubelet accepts the pod, and start the clock when it observes the pod. I think we want to remove any time.Now() comparisons and replace them with observed clock skew from the master's reference frame.
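
As a rough illustration of this idea (not the actual controller code), a virtual clock could record the controller's own observation time for a pod's Ready transition and measure minReadySeconds against that, so only the master's clock is ever compared against itself. The names below (virtualClock, observeReady, availableSince) are invented for the sketch.

```go
// Hedged sketch of a per-controller "virtual clock": minReadySeconds is
// measured from the time this controller first observed the pod as Ready,
// not from the node-reported LastTransitionTime.
package main

import (
	"fmt"
	"sync"
	"time"
)

type virtualClock struct {
	mu       sync.Mutex
	observed map[string]time.Time // pod key -> first time we saw Ready=True
}

func newVirtualClock() *virtualClock {
	return &virtualClock{observed: map[string]time.Time{}}
}

// observeReady would be called from the controller's informer handlers whenever
// a pod is seen with Ready=True; the first observation starts the clock.
func (v *virtualClock) observeReady(podKey string, now time.Time) {
	v.mu.Lock()
	defer v.mu.Unlock()
	if _, ok := v.observed[podKey]; !ok {
		v.observed[podKey] = now
	}
}

// availableSince reports whether minReadySeconds have elapsed since the first
// observation, using only the controller's (master-side) clock.
func (v *virtualClock) availableSince(podKey string, minReadySeconds int32, now time.Time) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	seen, ok := v.observed[podKey]
	if !ok {
		return false
	}
	return now.Sub(seen) >= time.Duration(minReadySeconds)*time.Second
}

func main() {
	vc := newVirtualClock()
	vc.observeReady("default/web-1", time.Now())
	fmt.Println(vc.availableSince("default/web-1", 0, time.Now())) // true: 0s required
}
```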

@pwittrock added the priority/important-soon label Jul 21, 2016
@dchen1107
Member

xref: #6159

@0xmichalis
Contributor

An alternative to virtual clocks is minReadySeconds on the pod level (@bgrant0607 has already asked for it).

@bgrant0607
Member

Note that the node controller already had to deal with this. IIRC, it didn't actually use the timestamps in the condition, but started its own timers when observing transitions.

@smarterclayton
Contributor

The kubelet, namespace controller, and node controller also do that for graceful deletion: starting timers when they observe the state change, but not treating the timestamp in the object as meaningful.


@0xmichalis
Contributor

0xmichalis commented Nov 3, 2016

We need a virtual clock in the deployment controller for deployment conditions as well. See #35691 (comment) for more context.

@erictune
Member

erictune commented Nov 3, 2016

I like the option of a private timer in the deployment controller.

We think we have seen this in the wild. Too late to fix in 1.5. Targeting 1.6.

@0xmichalis
Contributor

FWIW, the availability check "moved" to the replica set controller, meaning we should add the virtual clock there if we want to solve this with a virtual clock. I guess all controllers that deal with Pods should use virtual clocks in the long term.
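
A hedged sketch of what an injectable clock in the replica set controller could buy: the availability window is measured on a single clock, and tests can fake that clock. The Clock interface, fakeClock, and availableFor below are illustrative only, not the real controller or any existing Kubernetes API.

```go
// Illustrative only: an injectable clock lets the availability check use one
// reference frame in production and a fake, deterministic clock in tests.
package main

import (
	"fmt"
	"time"
)

// Clock is the controller's only source of "now".
type Clock interface{ Now() time.Time }

type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

// fakeClock lets tests advance time explicitly.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time       { return f.t }
func (f *fakeClock) Step(d time.Duration) { f.t = f.t.Add(d) }

// availableFor answers "has minReadySeconds elapsed since readySeen?",
// where readySeen was recorded with the same clock.
func availableFor(c Clock, readySeen time.Time, minReadySeconds int32) bool {
	return c.Now().Sub(readySeen) >= time.Duration(minReadySeconds)*time.Second
}

func main() {
	fc := &fakeClock{t: time.Unix(0, 0)}
	readySeen := fc.Now()
	fmt.Println(availableFor(fc, readySeen, 30)) // false: no time has passed yet
	fc.Step(31 * time.Second)
	fmt.Println(availableFor(fc, readySeen, 30)) // true: 31s elapsed on the same clock
}
```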

@smarterclayton
Contributor

smarterclayton commented Dec 9, 2016 via email

@0xmichalis
Contributor

An alternative to virtual clocks is minReadySeconds on the pod level

I've got kubernetes/community#194 up for review (a proposal for adding minReadySeconds to pods), and I have a preliminary (still buggy) working implementation of it locally. However, the concerns raised by @thockin regarding the Ready condition being ambiguous have already got me thinking of following the simplest solution possible, i.e. using a virtual clock in the replica set controller.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Dec 24, 2017
@erictune
Member

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jan 16, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Apr 17, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label May 17, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
