
Tune down initialDelaySeconds for readinessProbe. #33146

Merged (1 commit, Sep 27, 2016)

Conversation

@MrHohn (Member) commented Sep 21, 2016

Fixes #33053.

Tuned down initialDelaySeconds (originally 30s) for the readiness probe to 3 seconds, and periodSeconds (default 10s) to 5 seconds, to shorten the initial window before a DNS server pod is exposed. This configuration passed the DNS e2e tests and did not hit a single readiness failure (for kube-dns) on a 4-node GCE cluster during the experiments.

When scaling out kube-dns servers, it took less than 10s for new servers to be exposed after they appeared as Running, which is much faster than the original 30+s.

failureThreshold is left at its default (3), and exceeding it would not lead to a restart, because the readiness probe's status only affects whether the pod's endpoints are exposed in the Service (from the DNS service's point of view). According to the prober implementation, the number of retries for the readiness probe is unbounded, so there is no lasting effect if the readiness probe fails several times at startup.

The prober's state machine is illustrated in the figure below:

(figure: prober state machine diagram)
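Concretely, the tuning described above would look roughly like this (a sketch based on the values in this description; the httpGet path and port are illustrative assumptions, not taken from this PR):

```yaml
# Sketch of the proposed kube-dns readinessProbe.
readinessProbe:
  httpGet:
    path: /readiness      # assumed endpoint; see the actual kube-dns manifest
    port: 8081
    scheme: HTTP
  initialDelaySeconds: 3  # tuned down from 30
  periodSeconds: 5        # tuned down from the default of 10
  timeoutSeconds: 5
  failureThreshold: 3     # left at default; readiness failures never restart the pod
```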

I want to see the e2e result of this PR for further evaluation.

Tune down initialDelaySeconds for readinessProbe.

@thockin @bprashanth



@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Sep 21, 2016
@MrHohn (Member, Author) commented Sep 21, 2016

@k8s-bot kubemark test this issue: #32440

@MrHohn (Member, Author) commented Sep 21, 2016

It seems the readiness probe succeeded on the first attempt:
I0921 01:17:45.553582 3702 prober.go:115] Readiness probe for "kube-dns-v19-ihzyg_kube-system(ff2c59c7-7f98-11e6-9525-42010af00002):kubedns" succeeded

Endpoints were then detected and exposed about 40ms later:
I0921 01:17:45.591768 7 proxier.go:573] Setting endpoints for "kube-system/kube-dns:dns" to [10.180.3.5:53]
I0921 01:17:45.591824 7 proxier.go:573] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [10.180.3.5:53]

The total time is ~25s from when kubelet received this pod:
I0921 01:17:20.543855 3702 config.go:397] Receiving a new pod "kube-dns-v19-ihzyg_kube-system(ff2c59c7-7f98-11e6-9525-42010af00002)"

It took ~22s to start all three containers (initialDelaySeconds starts after this):
I0921 01:17:36.493080 3702 server.go:611] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-dns-v19-ihzyg", UID:"ff2c59c7-7f98-11e6-9525-42010af00002", APIVersion:"v1", ResourceVersion:"319", FieldPath:"spec.containers{kubedns}"}): type: 'Normal' reason: 'Started' Started container with docker id d67587996649
...
I0921 01:17:39.660014 3702 server.go:611] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-dns-v19-ihzyg", UID:"ff2c59c7-7f98-11e6-9525-42010af00002", APIVersion:"v1", ResourceVersion:"319", FieldPath:"spec.containers{dnsmasq}"}): type: 'Normal' reason: 'Started' Started container with docker id 0e55cc671e63
...
I0921 01:17:42.124746 3702 server.go:611] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-dns-v19-ihzyg", UID:"ff2c59c7-7f98-11e6-9525-42010af00002", APIVersion:"v1", ResourceVersion:"319", FieldPath:"spec.containers{healthz}"}): type: 'Normal' reason: 'Started' Started container with docker id 9551d5e8bf63

I think this setup for readinessProbe should work properly.
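The durations quoted above can be checked directly from the log timestamps (a quick sanity check on the numbers, not part of the PR itself):

```python
from datetime import datetime

# Timestamps taken from the kubelet / kube-proxy log lines above.
fmt = "%H:%M:%S.%f"
received  = datetime.strptime("01:17:20.543855", fmt)  # kubelet received the pod
started   = datetime.strptime("01:17:42.124746", fmt)  # last container started
ready     = datetime.strptime("01:17:45.553582", fmt)  # readiness probe succeeded
endpoints = datetime.strptime("01:17:45.591768", fmt)  # kube-proxy set endpoints

print((started - received).total_seconds())    # ~22s to start all three containers
print((endpoints - ready).total_seconds())     # ~40ms from ready to endpoints
print((endpoints - received).total_seconds())  # ~25s total
```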

@thockin (Member) left a comment:

LGTM

timeoutSeconds: 5
periodSeconds: 5
@thockin (Member) commented on these lines:

Once it becomes ready, is there anything that would make it un-ready? Maybe this doesn't need to be small.

@MrHohn (Member, Author) replied Sep 23, 2016:

That's true. The readiness probe would not fail as long as kubedns doesn't fail, and if kubedns fails, the liveness probe will catch it.
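The division of labor described here can be sketched as follows (a hypothetical container spec; the endpoint paths, ports, and liveness values are illustrative, not taken from this PR):

```yaml
# How the two probes complement each other for kube-dns.
containers:
- name: kubedns
  readinessProbe:            # only gates endpoint exposure in the Service
    httpGet:
      path: /readiness       # assumed endpoint
      port: 8081
    initialDelaySeconds: 3
    timeoutSeconds: 5
  livenessProbe:             # restarts the container on persistent failure
    httpGet:
      path: /healthz         # assumed endpoint
      port: 8080
    initialDelaySeconds: 60
    timeoutSeconds: 5
    failureThreshold: 5
```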

@MrHohn (Member, Author) commented Sep 23, 2016

BTW, this PR may conflict with #32406. We should wait for that to merge, and then I will rebase.

@k8s-ci-robot (Contributor) commented:

Jenkins GKE smoke e2e failed for commit 55db762. Full PR test history.

The magic incantation to run this job again is @k8s-bot gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@MrHohn (Member, Author) commented Sep 25, 2016

The new commit keeps periodSeconds at its default.
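After this revision, the probe keeps periodSeconds at the 10s default simply by omitting the field (a sketch of the final shape; the httpGet path and port are assumptions, not taken from this PR):

```yaml
# Final form: periodSeconds omitted, so the 10s default applies.
readinessProbe:
  httpGet:
    path: /readiness   # assumed endpoint
    port: 8081
  initialDelaySeconds: 3
  timeoutSeconds: 5
```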

@thockin thockin added release-note-none Denotes a PR that doesn't merit a release note. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed release-note-label-needed labels Sep 26, 2016
@thockin (Member) commented Sep 26, 2016

@k8s-bot test this: github issue #IGNORE

@k8s-ci-robot (Contributor) commented:

Jenkins unit/integration failed for commit 55db762. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@thockin (Member) commented Sep 26, 2016

@k8s-bot unit test this

@k8s-github-robot commented:

Automatic merge from submit-queue

@k8s-cherrypick-bot commented:

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@thockin (Member) commented Oct 7, 2016

Send a cherry pick for 1.4.1.


k8s-github-robot pushed a commit that referenced this pull request Oct 7, 2016
…32422-#32406-#33146-#33774-upstream-release-1.4

Automatic merge from submit-queue

Automated cherry pick of #31894 #32422 #32406 #33146 #33774

Cherry pick of #31894 #32422 #32406 #33146 #33774 on release-1.4.

#31894: Support graceful termination in kube-dns
#32422: Added --log-facility flag to enhance dnsmasq logging
#32406: Split dns healthcheck into two different urls
#33146: Tune down initialDelaySeconds for readinessProbe
#33774: Bump up addon kube-dns to v20 for graceful termination
@jessfraz jessfraz added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Oct 7, 2016
@jessfraz jessfraz removed the release-note-none Denotes a PR that doesn't merit a release note. label Oct 7, 2016
shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
…-of-#31894-kubernetes#32422-kubernetes#32406-kubernetes#33146-kubernetes#33774-upstream-release-1.4

Automatic merge from submit-queue: same automated cherry pick of kubernetes#31894 kubernetes#32422 kubernetes#32406 kubernetes#33146 kubernetes#33774 on release-1.4 as above.
@MrHohn MrHohn deleted the kubedns-readiness branch May 16, 2017 23:55
Labels
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.
8 participants