
Tune down initialDelaySeconds for readinessProbe. #33146

Merged (1 commit, Sep 27, 2016)

Conversation

@MrHohn (Member) commented Sep 21, 2016

Fixes #33053.

Tuned down initialDelaySeconds (originally 30s) for the readiness probe to 3 seconds, and periodSeconds (default 10s) to 5 seconds, to shorten the initial window before a DNS server pod is exposed. This configuration passed the DNS e2e tests and did not hit a single readiness failure (for kube-dns) on a 4-node GCE cluster during the experiments.

When scaling out kube-dns servers, it took less than 10s for new servers to be exposed after they appeared as Running, which is much faster than the original 30+s.

failureThreshold is left at its default (3), and exceeding it would not lead to a restart, because the readiness probe's status only affects whether the pod's endpoints are exposed in the Service (from the DNS service's point of view). According to the prober implementation, the number of retries for the readiness probe is unbounded, so there is no lasting effect if the readiness probe fails several times at startup.

The prober's state machine is illustrated in the figure below:

(figure: prober state machine diagram)
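Concretely, the tuning described above would look roughly like this (a sketch based on the values in this description; the httpGet path and port are illustrative assumptions, not taken from this PR):

```yaml
# Sketch of the proposed kube-dns readinessProbe.
readinessProbe:
  httpGet:
    path: /readiness      # assumed endpoint; see the actual kube-dns manifest
    port: 8081
    scheme: HTTP
  initialDelaySeconds: 3  # tuned down from 30
  periodSeconds: 5        # tuned down from the default of 10
  timeoutSeconds: 5
  failureThreshold: 3     # left at default; readiness failures never restart the pod
```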

I want to see the e2e result of this PR for further evaluation.

Tune down initialDelaySeconds for readinessProbe.

@thockin @bprashanth



@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Sep 21, 2016
@MrHohn (Member, Author) commented Sep 21, 2016

@k8s-bot kubemark test this issue: #32440

@MrHohn (Member, Author) commented Sep 21, 2016

It seems the readiness probe succeeded on the first attempt:
I0921 01:17:45.553582 3702 prober.go:115] Readiness probe for "kube-dns-v19-ihzyg_kube-system(ff2c59c7-7f98-11e6-9525-42010af00002):kubedns" succeeded

Endpoints were then detected and exposed about 40ms later:
I0921 01:17:45.591768 7 proxier.go:573] Setting endpoints for "kube-system/kube-dns:dns" to [10.180.3.5:53]
I0921 01:17:45.591824 7 proxier.go:573] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [10.180.3.5:53]

The total time is ~25s from when kubelet received this pod:
I0921 01:17:20.543855 3702 config.go:397] Receiving a new pod "kube-dns-v19-ihzyg_kube-system(ff2c59c7-7f98-11e6-9525-42010af00002)"

It took ~22s to start all three containers (initialDelaySeconds starts after this):
I0921 01:17:36.493080 3702 server.go:611] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-dns-v19-ihzyg", UID:"ff2c59c7-7f98-11e6-9525-42010af00002", APIVersion:"v1", ResourceVersion:"319", FieldPath:"spec.containers{kubedns}"}): type: 'Normal' reason: 'Started' Started container with docker id d67587996649
...
I0921 01:17:39.660014 3702 server.go:611] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-dns-v19-ihzyg", UID:"ff2c59c7-7f98-11e6-9525-42010af00002", APIVersion:"v1", ResourceVersion:"319", FieldPath:"spec.containers{dnsmasq}"}): type: 'Normal' reason: 'Started' Started container with docker id 0e55cc671e63
...
I0921 01:17:42.124746 3702 server.go:611] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-dns-v19-ihzyg", UID:"ff2c59c7-7f98-11e6-9525-42010af00002", APIVersion:"v1", ResourceVersion:"319", FieldPath:"spec.containers{healthz}"}): type: 'Normal' reason: 'Started' Started container with docker id 9551d5e8bf63

I think this setup for readinessProbe should work properly.
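The durations quoted above can be checked directly from the log timestamps (a quick sanity check on the numbers, not part of the PR itself):

```python
from datetime import datetime

# Timestamps taken from the kubelet / kube-proxy log lines above.
fmt = "%H:%M:%S.%f"
received  = datetime.strptime("01:17:20.543855", fmt)  # kubelet received the pod
started   = datetime.strptime("01:17:42.124746", fmt)  # last container started
ready     = datetime.strptime("01:17:45.553582", fmt)  # readiness probe succeeded
endpoints = datetime.strptime("01:17:45.591768", fmt)  # kube-proxy set endpoints

print((started - received).total_seconds())    # ~22s to start all three containers
print((endpoints - ready).total_seconds())     # ~40ms from ready to endpoints
print((endpoints - received).total_seconds())  # ~25s total
```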

@thockin (Member) left a comment:

LGTM

timeoutSeconds: 5
periodSeconds: 5
@thockin (Member) commented on these lines:

Once it becomes ready, is there anything that would make it un-ready? Maybe this doesn't need to be small.

@MrHohn (Member, Author) replied Sep 23, 2016:

That's true. The readiness probe would not fail as long as kubedns doesn't fail, and if kubedns fails, the liveness probe will catch it.
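The division of labor described here can be sketched as follows (a hypothetical container spec; the endpoint paths, ports, and liveness values are illustrative, not taken from this PR):

```yaml
# How the two probes complement each other for kube-dns.
containers:
- name: kubedns
  readinessProbe:            # only gates endpoint exposure in the Service
    httpGet:
      path: /readiness       # assumed endpoint
      port: 8081
    initialDelaySeconds: 3
    timeoutSeconds: 5
  livenessProbe:             # restarts the container on persistent failure
    httpGet:
      path: /healthz         # assumed endpoint
      port: 8080
    initialDelaySeconds: 60
    timeoutSeconds: 5
    failureThreshold: 5
```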

@MrHohn (Member, Author) commented Sep 23, 2016

BTW, this PR may conflict with #32406. We should wait for that to merge, and then I will rebase.

@k8s-ci-robot (Contributor) commented:

Jenkins GKE smoke e2e failed for commit 55db762. Full PR test history.

The magic incantation to run this job again is @k8s-bot gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@MrHohn (Member, Author) commented Sep 25, 2016

The new commit keeps periodSeconds at its default.
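After this revision, the probe keeps periodSeconds at the 10s default simply by omitting the field (a sketch of the final shape; the httpGet path and port are assumptions, not taken from this PR):

```yaml
# Final form: periodSeconds omitted, so the 10s default applies.
readinessProbe:
  httpGet:
    path: /readiness   # assumed endpoint
    port: 8081
  initialDelaySeconds: 3
  timeoutSeconds: 5
```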

@thockin thockin added release-note-none Denotes a PR that doesn't merit a release note. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed release-note-label-needed labels Sep 26, 2016
@thockin (Member) commented Sep 26, 2016

@k8s-bot test this: github issue #IGNORE

@k8s-ci-robot (Contributor) commented:

Jenkins unit/integration failed for commit 55db762. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@thockin (Member) commented Sep 26, 2016

@k8s-bot unit test this

@k8s-github-robot commented:

Automatic merge from submit-queue

@k8s-cherrypick-bot commented:

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@thockin (Member) commented Oct 7, 2016

Send a cherry pick for 1.4.1.


k8s-github-robot pushed a commit that referenced this pull request Oct 7, 2016
…32422-#32406-#33146-#33774-upstream-release-1.4

Automatic merge from submit-queue

Automated cherry pick of #31894 #32422 #32406 #33146 #33774

Cherry pick of #31894 #32422 #32406 #33146 #33774 on release-1.4.

#31894: Support graceful termination in kube-dns
#32422: Added --log-facility flag to enhance dnsmasq logging
#32406: Split dns healthcheck into two different urls
#33146: Tune down initialDelaySeconds for readinessProbe
#33774: Bump up addon kube-dns to v20 for graceful termination
@jessfraz jessfraz added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Oct 7, 2016
@jessfraz jessfraz removed the release-note-none Denotes a PR that doesn't merit a release note. label Oct 7, 2016
shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
…-of-#31894-kubernetes#32422-kubernetes#32406-kubernetes#33146-kubernetes#33774-upstream-release-1.4

Automatic merge from submit-queue: same automated cherry pick of kubernetes#31894 kubernetes#32422 kubernetes#32406 kubernetes#33146 kubernetes#33774 on release-1.4 as above.
@MrHohn MrHohn deleted the kubedns-readiness branch May 16, 2017 23:55
Labels
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.
8 participants