-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
in e2e test, when kubectl exec fails to find the container to run a command, it should retry #26100
Conversation
LGTM Thanks @rootfs |
|
@@ -477,6 +477,13 @@ func kubectlExecWithRetry(namespace string, podName, containerName string, args | |||
continue | |||
} | |||
} | |||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a check for a non-nil error right before this. I would move the strings.Contains check you're adding into that block instead of adding another one, assuming we decide to keep this.
From @ncdc on
This indicates that the pod is being torn down right after This fix will mask the issue by retrying until the pod with the same name is recreated. But it does indicate that #26076 destabilized PD mount/unmount code. To unblock the submit queue, I'll LGTM this. I am working on kubelet volume manager that will fix the actual races, and along with that change, I will revert this change and verify the races are fixed and modify the tests further as needed. |
@k8s-oncall Could you please manually merge this |
if strings.Contains(strings.ToLower(string(stdErrBytes)), "container not found") { | ||
// Retry on "container not found" errors | ||
Logf("Warning: kubectl exec encountered container not found.\nerr=%v\nstdout=%v\nstderr=%v)", err, string(stdOutBytes), string(stdErrBytes)) | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be a sleep before trying again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there should
…ould retry Signed-off-by: Huamin Chen <hchen@redhat.com>
@saad-ali feedback addressed. Have seen pd test running 10 times for 3 hours without any failure |
if strings.Contains(strings.ToLower(string(stdErrBytes)), "container not found") { | ||
// Retry on "container not found" errors | ||
Logf("Warning: kubectl exec encountered container not found.\nerr=%v\nstdout=%v\nstderr=%v)", err, string(stdOutBytes), string(stdErrBytes)) | ||
time.Sleep(2 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sleep should happen for both "i/o timeout" and "container not found", no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen any i/o timeout yet. If it happens, I am not sure timeout helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
LGTM |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
GCE e2e build/test passed for commit 56719f8. |
Automatic merge from submit-queue |
fix #26076
Without retrying upon "container not found" error,
Pod Disks
test failed on the following error:Now I've run this fix on e2e pd test 5 times and no longer see any failure