
e2e: increase the container probing test timeout #18220

Merged
merged 1 commit into kubernetes:master on Dec 7, 2015

Conversation

yujuhong
Contributor

@yujuhong yujuhong commented Dec 4, 2015

This test runs in the parallel suite and should allow a longer timeout.

@yujuhong
Contributor Author

yujuhong commented Dec 4, 2015

This addresses #17656 partially.

@wojtek-t @timstclair

@yujuhong
Contributor Author

yujuhong commented Dec 4, 2015

cc @kubernetes/goog-testing: tests that require pulling images in a parallel environment may need a longer timeout.

@k8s-github-robot

Labelling this PR as size/XS

@k8s-github-robot k8s-github-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Dec 4, 2015
@@ -75,7 +75,7 @@ var _ = Describe("Probing container", func() {
 		p, err := podClient.Create(makePodSpec(probe.withFailing().build(), nil))
 		expectNoError(err)
 
-		err = wait.Poll(poll, 90*time.Second, func() (bool, error) {
+		err = wait.Poll(poll, 120*time.Second, func() (bool, error) {


Since this test needs to run the entire poll time to pass, maybe it would be better to add a preliminary phase that waits for the pod to be Running.

Contributor Author

Done. The test doesn't correctly test the initial delay at all given that it also includes the pod startup time.

@k8s-bot

k8s-bot commented Dec 4, 2015

GCE e2e test build/test passed for commit 2cc445c85e0b3895926c84d7354cb4f005971418.

@k8s-github-robot

Labelling this PR as size/M

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 4, 2015
@k8s-bot

k8s-bot commented Dec 4, 2015

GCE e2e test build/test passed for commit 3191191ac26e08ba067b7019add35e0335d6a732.

@ixdy ixdy assigned timstclair and unassigned ixdy Dec 4, 2015
return false, nil
}
return true, nil
})).NotTo(HaveOccurred(), "pod never became running")


Extract this to a helper function?

@k8s-github-robot k8s-github-robot removed the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 4, 2015
@k8s-github-robot

Labelling this PR as size/S

@k8s-github-robot k8s-github-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 4, 2015
@k8s-bot

k8s-bot commented Dec 4, 2015

GCE e2e test build/test passed for commit c0643d31d001af9cfd78dc4d2c8f1cbb10f234fd.

@yujuhong yujuhong force-pushed the increase_timeout branch 3 times, most recently from 3695522 to c03d664 Compare December 5, 2015 00:46
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 5, 2015
@k8s-github-robot

Labelling this PR as size/M

This change compares the timestamps in the pod status to accurately determine
whether the initial delay has passed when the pod (container) becomes ready.
It also increases the test timeout to account for the parallel testing environment.
@yujuhong
Contributor Author

yujuhong commented Dec 5, 2015

@timstclair, the test passed locally; the PR is ready for review. Note that it takes 1 minute after the container starts for the ready pod status to be reported, due to #18240.

@k8s-bot

k8s-bot commented Dec 5, 2015

GCE e2e test build/test passed for commit c03d6645ec89908f88300984d6249f6d76106ab9.

@timstclair

LGTM, thanks for fixing the test!

@timstclair timstclair added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 5, 2015
@yujuhong
Contributor Author

yujuhong commented Dec 5, 2015

@k8s-bot unit test this

@k8s-bot

k8s-bot commented Dec 5, 2015

GCE e2e test build/test passed for commit 0ae1d54.

@yujuhong
Contributor Author

yujuhong commented Dec 5, 2015

Here is a brief summary of the test:

We assume the timeout is large enough for the pod to become running and ready. If it is not, the test fails, and the pod state and readiness are easy to see in the periodic logging. A timeout could be caused by 1) the kubelet being slow to start the pod; 2) the kubelet being slow to send out the status (due to the QPS limit or other factors); or 3) the kubelet failing to probe the container in time. Only the third is probing-related, but either way the test should fail.

If a pod becomes ready within the timeout, we check the timestamps in the pod status to make sure that the pod becomes ready at least 30s (initial delay) after the container starts.

We could also add a new condition to fail the test if the latency between container starting and pod becoming ready is too long.

@k8s-github-robot

@k8s-bot test this

Tests are more than 48 hours old. Re-running tests.

@k8s-bot

k8s-bot commented Dec 7, 2015

GCE e2e test build/test passed for commit 0ae1d54.

@wojtek-t
Member

wojtek-t commented Dec 7, 2015

I'm fine with that - although I would make timeouts even bigger.

@yujuhong
Contributor Author

yujuhong commented Dec 7, 2015

@fabioy (on-call), can we manually merge this PR and #18240? These two should fix the flaky container test. Thanks!

@fabioy
Contributor

fabioy commented Dec 7, 2015

Good timing, gce-parallel just went red/yellow. Does this PR have to be merged with #18240 or can they go in separately? I'd like to try putting this one in first and see if Jenkins goes green.

@yujuhong
Contributor Author

yujuhong commented Dec 7, 2015

@fabioy, they can go in separately but #18240 should be merged before this one.

@fabioy
Contributor

fabioy commented Dec 7, 2015

Well, it's still early in the day. Merging #18240 and this to help test flakiness.

fabioy added a commit that referenced this pull request Dec 7, 2015
e2e: increase the container probing test timeout
@fabioy fabioy merged commit b7dc117 into kubernetes:master Dec 7, 2015
@yujuhong yujuhong deleted the increase_timeout branch December 8, 2015 17:24
Labels
area/test lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
9 participants