kubernetes-test-go: Build timed out #24285
Comments
@spxtr Can you take a look? I think the GKE build has the same problem.
Looks like they've gotten significantly slower since the last time this came up. I'll try to figure out why.
This happened again and is continuing to block the merge queue. @spxtr can you make a bandaid that extends the timeout?
Done, but I need to stress that we need to know why the duration is climbing. I'm afraid that in a few months I'll get another issue that says "kubernetes-test-go is timing out".
Thanks. Yes, we can leave this issue open until we figure it out.
@spxtr maybe another bandaid for our build? http://kubekins.dls.corp.google.com/view/Critical%20Builds/job/kubernetes-build/9413/
Actually, it was taking 14-17 mins, so a 30-minute timeout should hopefully be enough. Hm.
The build is the result of a PR going in that does affect the build. It had to download a new build image, and it also needs to build more tarballs. We might need to legitimately bump that one's timeout.
It looks like it's about to time out again.
Barely passed. The build got slower by a factor of two after #23931 went in. This was somewhat expected. I'll bump the timeout.
Maybe we should exclude those platforms in our testing? We don't need ppc or arm binaries. Should we roll back #23931?
I don't think we should roll it back; we want those cross-compiled binaries. However, if the PR Jenkins e2e job starts timing out, then we should probably change it to do a quick-release instead of the full release. cc @luxas
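For readers unfamiliar with the two build modes being contrasted above, here is a minimal sketch of the difference. These are the standard make targets in the kubernetes/kubernetes tree; the exact behavior of each target may have shifted since this thread.

```sh
# Full release: cross-compiles client, server, and test binaries for every
# supported platform and packages all of the release tarballs. This is the
# path that roughly doubled in duration once #23931 added more architectures.
make release

# Quick release: builds only for the host platform (linux/amd64 on the CI
# VMs) and skips most of the cross-compilation work, so it finishes much
# faster. This is the fallback suggested for the PR Jenkins e2e job.
make quick-release
```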
Dropped the priority now that the bleeding has stopped. I'll try to figure out why the test-go job got slower.
@luxas this is a fairly significant increase in build and release times with #23931. Is there anything we can do to mitigate the increase? I see many packages downloaded during the build. Are we maybe downloading more than we need to? Can we cache anything somehow/somewhere? Can we parallelize package updating and/or building?
The problem is not all the things that get downloaded during the build. There are a few steps we could take to decrease the time.
@spxtr We should do a full build on every CI run for all arches, so we can detect regressions in the code.
Whoops, I meant to make a PR to do that a while ago. cc @caesarxuchao
@spxtr thanks for letting me know. Do you want me to send the PR or will you?
I think we definitely want to build test targets for at least darwin, since plenty of people develop on a Mac. It might be worth dropping tests for arm and windows.
sgtm
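As a rough illustration of the change being agreed to here: the platform lists for the build live in hack/lib/golang.sh. The sketch below assumes KUBE_TEST_PLATFORMS is the array that drives the cross-compiled test targets; treat the variable name and entries as illustrative rather than the exact diff.

```sh
# Illustrative only: trim the test-target platforms so darwin stays
# (many contributors develop on Macs) while arm and windows are dropped
# from the cross-compiled *test* binaries.
readonly KUBE_TEST_PLATFORMS=(
  linux/amd64
  darwin/amd64
  # windows/amd64  # dropped: nobody runs the test binaries on Windows in CI
  # linux/arm      # dropped: cross-compiling the test targets is slow
)
```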
Automatic merge from submit-queue: Move cmd/linkcheck to test targets. #24285 (comment)
All of our kubernetes-build and kubernetes-test-go jobs are running on a single n1-highmem-32 instance, which is hitting 100% CPU usage fairly often. This is most likely why our test-go times are so inconsistent lately.
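The mitigation mentioned in the merge commit further down (adding another Jenkins slave) boils down to provisioning a second GCE VM of the same shape and attaching it as a build agent. A hedged sketch with placeholder names; the real agents are managed elsewhere:

```sh
# Hypothetical example: the instance name, zone, and image are placeholders,
# not the real CI configuration. It only illustrates adding a second
# n1-highmem-32 machine so the two jobs stop competing for one VM's CPUs.
gcloud compute instances create jenkins-agent-2 \
  --machine-type=n1-highmem-32 \
  --zone=us-central1-f \
  --image-family=ubuntu-1404-lts \
  --image-project=ubuntu-os-cloud
```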
Is there anything I can help with? (I don't have access to your servers.)
@luxas I think I can handle the test-go problems. Feel free to work on your other suggestions; I think they're good. Thanks, though :)
The timeout is too aggressive: http://kubekins.dls.corp.google.com/job/kubernetes-test-go/buildTimeTrend shows a passing run at 75 minutes (11026), and the average passing run is in the mid-50-minute range. We need at least a 100-minute timeout (2x the ~50-minute average runtime) rather than 80. Or to put it another way: to run this job reliably, I want the timeout to be twice the average runtime, not just a couple of minutes above it.
@fejta I'm wary of a 100-minute timeout. Last time this came up the job was taking ~35 minutes (#23127), so an 80-minute timeout was fine. Some time last week it started taking 60-80 minutes and occasionally timing out. I'd like to know why. In the meantime, to fix the submit queue let's bump the timeout. I'm also starting to think we should run verify-*.sh, the unit/integration tests, and test-cmd.sh in separate Jenkins jobs. kubernetes-test-go blocks the queue for over an hour now.
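A minimal sketch of what that split might look like, assuming the job today runs the repo's own hack/ scripts back to back (script names are the ones in the tree around this time; the actual Jenkins job definition may differ):

```sh
# Today (roughly): one kubernetes-test-go job runs everything serially,
# so the merge queue waits on the sum of all four stages.
hack/verify-gofmt.sh        # plus the rest of the verify-*.sh checks
hack/test-go.sh             # Go unit tests
hack/test-integration.sh    # integration tests
hack/test-cmd.sh            # kubectl command-line tests

# Proposed: give each stage its own Jenkins job so they run in parallel
# and one slow stage no longer holds the queue for over an hour.
```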
Automatic merge from submit-queue: Bump kubernetes-test-go timeout. It looks like the run times got more inconsistent because of load on the VM. Adding another Jenkins slave improved things so we're not constantly timing out, but it still gets a little close to timing out at times. Average runtime is ~45 minutes, so I went with a 100-minute timeout. Fixes #24285
Remove patch for sa public key configuration Origin-commit: 7631cfb6243de670c962eeb677f47bb3338c5924
http://kubekins.dls.corp.google.com/view/Critical%20Builds/job/kubernetes-test-go/10930/
https://console.cloud.google.com/storage/kubernetes-jenkins/logs/kubernetes-test-go/10930/
Happened twice in a row.