e2e flake: Kubectl client Simple pod [It] should support exec through an HTTP proxy #19997
Comments
I can look at this.
This sounds very similar to one I recently added logs to. Looking...
My logs were added a few hours before @erictune made this bug. The jenkins link doesn't work anymore, though-- will dig and see if I can find the actual logs. (@erictune-- I would have seen this earlier if you'd added the kind/flake label!)
@erictune do you recall what PR this flaked on?
Looking at test history this doesn't seem to be particularly flaky. I could find only one failure in the last 100 or so runs, and that appeared to be coincidental. I'm running it in a loop now to see if I can get a repro, but no failures so far.
@jlowdermilk That's why I added logs, and why I'd like to see the log messages I added in @erictune's reported flake.
Could not repro the flake in ~300 runs. Unless it shows up on pr-builder we can probably close this.
I think this test may be hanging our e2e runs occasionally - see #13485 (comment). I'm in well over my head here, but my best guess is that the [...]. cc @ashcrow for good measure.
GCS permalinks for two recent failures I suspect: https://storage.cloud.google.com/kubernetes-jenkins/logs/kubernetes-e2e-gce/10591 and https://storage.cloud.google.com/kubernetes-jenkins/pr-logs/pull/19503/kubernetes-pull-build-test-e2e-gce/25721
Thanks for the links @ixdy. I'm still catching up on new issues and test flakes (buildcop), but I will look at the above.
After looking at the above logs and digging some more, it looks like this flake has the same root cause as #19466, namely:
Hit again.
#20444 has merged. Hopefully that fixes this.
Logs from the above run: kubernetes-pull-build-test-e2e-gce/27742. Failed test log:
|
The root cause of the new failure is different. Before #20444 there was a bug in spdystream (sadly introduced by me 😦) where, if a goaway frame was received, instead of breaking out of the loop that waits for new incoming frames, the connection just sat there waiting for new frames forever. With spdystream fixed and the Godep updated in Kubernetes, I started seeing these new failures:
If the server sends the goaway frame before it sends the stream reply frame, you'll get the error above. In #20444 I added code to ensure that the stream reply frame is sent before the goaway frame. I will have to do some more digging to see why this is happening now.
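For context on the failure mode described above, here is a minimal sketch in Go of a frame-handling read loop with and without the fix. This is not the actual spdystream source; the `frame` type, `frameGoAway` constant, and channel-based loop are hypothetical stand-ins, used only to illustrate why ignoring a goaway frame leaves the connection waiting for new frames forever.

```go
// Minimal sketch only -- not the real spdystream code. All types here are
// hypothetical stand-ins for illustration.
package main

import "fmt"

type frameType int

const (
	frameData frameType = iota
	frameSynReply
	frameGoAway
)

type frame struct{ typ frameType }

// serveFrames drains incoming frames. The pre-fix behavior corresponds to
// handling frameGoAway without returning: the loop then blocks on the channel
// forever once the peer stops sending. The fix is the explicit return.
func serveFrames(incoming <-chan frame) {
	for f := range incoming {
		switch f.typ {
		case frameGoAway:
			fmt.Println("received goaway, shutting down read loop")
			return // without this return, the connection would sit waiting for frames forever
		default:
			fmt.Println("handling frame of type", f.typ)
		}
	}
}

func main() {
	frames := make(chan frame, 2)
	frames <- frame{typ: frameSynReply}
	frames <- frame{typ: frameGoAway}
	serveFrames(frames)
}
```

The essential part is the explicit return on goaway: without it, the loop blocks indefinitely once the peer stops sending, which matches the hang described above.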
I looked at the big log above and extracted this info from the goproxy pod's log:
The apiserver also shows this:
This same "connection reset" error shows up several times in the log file, but I also see it in clean e2e runs, so it may be a red herring, or not...
This flake is currently blocking merges.
@mikedanese my availability is limited for the next few days (personal reasons). I hesitate to suggest this, but you could temporarily skip this test until I have more time to research it.
@mikedanese the link you just posted is a different root cause:
I think this issue has been repurposed a few times... Should we create individual issues for each root cause?
Is it possible the testbed was under so much load that the TLS handshake from kubectl (running in the netexec pod) through the goproxy pod to the apiserver didn't complete within the 10-second handshake timeout?
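To make that question concrete, here is a minimal sketch (assuming Go's standard `net/http` client, with placeholder proxy and apiserver addresses) of a request routed through an HTTP proxy whose TLS handshake must complete within a 10-second `TLSHandshakeTimeout`. It is not the test's actual code path, just an illustration of the kind of deadline a heavily loaded testbed could push the handshake past.

```go
// Minimal sketch only -- not the actual e2e test code. Proxy and apiserver
// addresses are placeholders.
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

func main() {
	// Route requests through an HTTP proxy, like the goproxy pod in the test.
	proxyURL, err := url.Parse("http://goproxy.example:8080") // placeholder
	if err != nil {
		panic(err)
	}

	client := &http.Client{
		Transport: &http.Transport{
			Proxy: http.ProxyURL(proxyURL),
			// The TLS handshake with the target must finish within this window;
			// on an overloaded testbed it may not.
			TLSHandshakeTimeout: 10 * time.Second,
		},
	}

	resp, err := client.Get("https://apiserver.example:443/healthz") // placeholder target
	if err != nil {
		fmt.Println("request failed (e.g. TLS handshake timeout under load):", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

Under heavy load, the CONNECT through the proxy plus the handshake with the apiserver all have to fit inside that window, so a slow testbed would surface as a handshake timeout rather than a functional failure.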
Hit it again.
Found new conclusive evidence of this test occasionally hanging. Forked off to #22671.
Closing in favor of #22671.
Hit this twice since yesterday afternoon.
Example:
http://kubekins.dls.corp.google.com:8081/job/kubernetes-pull-build-test-e2e-gce/24671/#
Previous closed instances of this flake:
#19500 #17523 #15787 #15713