-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
e2e: increase the container probing test timeout #18220
Conversation
This addresses #17656 partially. |
cc @kubernetes/goog-testing, tests that require pulling images in a parallel environment may need a longer timeout. |
Labelling this PR as size/XS |
@@ -75,7 +75,7 @@ var _ = Describe("Probing container", func() { | |||
p, err := podClient.Create(makePodSpec(probe.withFailing().build(), nil)) | |||
expectNoError(err) | |||
|
|||
err = wait.Poll(poll, 90*time.Second, func() (bool, error) { | |||
err = wait.Poll(poll, 120*time.Second, func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this test needs to run the entire poll time to pass, maybe it would be better to add a preliminary phase that waits for the pod to be Running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The test doesn't correctly test the initial delay at all given that it also includes the pod startup time.
GCE e2e test build/test passed for commit 2cc445c85e0b3895926c84d7354cb4f005971418. |
2cc445c
to
3191191
Compare
Labelling this PR as size/M |
GCE e2e test build/test passed for commit 3191191ac26e08ba067b7019add35e0335d6a732. |
return false, nil | ||
} | ||
return true, nil | ||
})).NotTo(HaveOccurred(), "pod never became running") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract this to a helper function?
3191191
to
c0643d3
Compare
Labelling this PR as size/S |
GCE e2e test build/test passed for commit c0643d31d001af9cfd78dc4d2c8f1cbb10f234fd. |
3695522
to
c03d664
Compare
Labelling this PR as size/M |
This change compare the timestamps in pod status to accurately determine whether the initial delay has passed when the pod (container) becomes ready. Also increase the test timeout to account for the parallel testing environment.
c03d664
to
0ae1d54
Compare
@timstclair, test passed locally. PR ready for review. Note that it'd take 1 minute after the container started to report the ready pod status due to #18240. |
GCE e2e test build/test passed for commit c03d6645ec89908f88300984d6249f6d76106ab9. |
LGTM, thanks for fixing the test! |
@k8s-bot unit test this |
GCE e2e test build/test passed for commit 0ae1d54. |
Here is a brief summary of the test: We assume the timeout is large enough for the pod to become running and ready. If this is not true, the test will fail, and we can easily see the pod state and readiness from the periodic logging. This timeout could be caused by 1) kubelet slow to start the pod; 2) kubelet slow to send out the status (due to QPS limit or other factors; 3) kubelet fails to probe the container in time. Only the third one is probing-related, but either way the test should fail. If a pod becomes ready within the timeout, we check the timestamps in the pod status to make sure that the pod becomes ready at least 30s (initial delay) after the container starts. We could also add a new condition to fail the test if the latency between container starting and pod becoming ready is too long. |
@k8s-bot test this Tests are more than 48 hours old. Re-running tests. |
GCE e2e test build/test passed for commit 0ae1d54. |
I'm fine with that - although I would make timeouts even bigger. |
Good timing, gce-parallel just went red/yellow. Does this PR have to be merged with #18240 or can they go in separately? I'd like to try putting this one in first and see if Jenkins goes green. |
Well, it's still early in the day. Merging #18240 and this to help test flakiness. |
e2e: increase the container probing test timeout
This test runs in the parallel suite and should allow a longer timeout.