
kubernetes-test-go: Build timed out #24285

Closed
lavalamp opened this issue Apr 14, 2016 · 24 comments · Fixed by #24480
Labels
area/test-infra · kind/flake · priority/critical-urgent

Comments

@lavalamp
Member

@lavalamp added the priority/critical-urgent, area/test-infra, and kind/flake labels Apr 14, 2016
@lavalamp
Member Author

@spxtr Can you take a look? I think the gke build has the same problem.

@spxtr
Contributor

spxtr commented Apr 14, 2016

Looks like they've gotten significantly slower since the last time this came up. I'll try to figure out why.

@lavalamp
Member Author

This happened again and is continuing to block the merge queue. @spxtr can you make a bandaid that extends the timeout?

@spxtr
Contributor

spxtr commented Apr 14, 2016

Done, but I need to stress that we need to know why the duration is climbing. I'm afraid in a few months I'll get another issue that says "kubernetes-test-go is timing out".

@lavalamp
Member Author

Thanks. Yes, we can leave this issue open until we figure it out.

@lavalamp
Member Author

Actually, it was taking 14-17 minutes, so a 30-minute timeout should hopefully be enough. Hm.

@spxtr
Contributor

spxtr commented Apr 14, 2016

That build is the result of a PR going in that affects the build: it had to download a new build image, and it also needs to build more tarballs. We might need to legitimately bump that one's timeout.

@lavalamp
Member Author

It looks like it's about to time out again.

@spxtr
Contributor

spxtr commented Apr 14, 2016

Barely passed. The build got slower by a factor of two after #23931 went in. This was somewhat expected. I'll bump the timeout.

@lavalamp
Member Author

Maybe we should exclude those platforms from our testing? We don't need ppc or arm binaries. Should we roll back #23931?

@spxtr
Contributor

spxtr commented Apr 14, 2016

I don't think we should roll it back. We want kubernetes-build to do a full release build, which now includes building for those architectures.

However, if the PR Jenkins e2e job starts timing out, then we should probably change it to do a quick-release instead of the full release.

cc @luxas
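
For context, a sketch of the two build modes under discussion; `make release` and `make quick-release` are the targets the Kubernetes repo exposes for this, but treat the exact invocations here as assumptions rather than the job's literal configuration:

```bash
# Sketch: the two build modes discussed above (exact targets/flags are
# assumptions; the real invocations live in the Jenkins job config).

# Full release: cross-compile server/client/test targets for every
# supported platform and package all tarballs. This is what the
# kubernetes-build job wants to keep doing.
make release

# Quick release: build for the host platform (linux/amd64) only, skipping
# the extra architectures. Much faster, at the cost of cross-compile coverage.
make quick-release
```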

@spxtr added the priority/backlog label and removed the priority/critical-urgent label Apr 14, 2016
@spxtr
Contributor

spxtr commented Apr 14, 2016

Dropped the priority now that the bleeding has stopped. I'll try to figure out why the test-go job got slower.

@spxtr removed the kind/flake label Apr 14, 2016
@david-mcmahon
Contributor

@luxas this is a fairly significant increase in build and release times with #23931. Is there anything we can do to mitigate the increase? I see many packages downloaded during the build. Are we maybe downloading more than we need to? Can we cache anything somehow/somewhere? Can we parallelize package updating and/or building?

@luxas
Member

luxas commented Apr 15, 2016

The problem is not all the things that get downloaded into kube-cross; that only happens once.
It's the build time that's the problem. Building with go1.5+ can be ~2x slower, and we've now both upgraded to go1.6 and added more server platforms, so the increase in build time is expected. Sorry for not notifying you beforehand, though.

Steps we could take to decrease the time:

  • Move cmd/linkcheck to test targets. I don't know why that one is considered a server target.
  • Build test targets only for linux/amd64 (now linux/amd64, windows/amd64, darwin/amd64, linux/arm); see the sketch after this list
  • Remove addon images: don't ship kube-registry-proxy and pause images in tars. #23605
  • Consider dropping support for */386 for kubectl
  • Remove cmd/kubemark for arm, arm64, and ppc64le. I don't think it's required in official builds.
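
As a sketch of what the test-target trim could look like, modeled on the platform lists kept in hack/lib/golang.sh (the variable name and current contents shown here are assumptions):

```bash
# Hypothetical sketch: narrow the platforms that test targets are
# cross-compiled for. The real list lives in hack/lib/golang.sh; the
# variable name below is assumed.

# Current: test targets are built for all four platforms.
KUBE_TEST_PLATFORMS=(
  linux/amd64
  darwin/amd64
  windows/amd64
  linux/arm
)

# Proposed: keep linux/amd64 (and probably darwin/amd64, per the
# discussion below), dropping windows/amd64 and linux/arm.
KUBE_TEST_PLATFORMS=(
  linux/amd64
  darwin/amd64
)
```

Dropping two of the four platforms would roughly halve the test-target cross-compile work.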

@luxas
Member

luxas commented Apr 15, 2016

@spxtr We should still do a full build for all arches on every CI run, so we can detect regressions in the code.

@spxtr
Contributor

spxtr commented Apr 15, 2016

Move cmd/linkcheck to test targets. I don't know why that one is considered a server target.

Whoops, I meant to make a PR to do that a while ago. cc @caesarxuchao

@caesarxuchao
Member

@spxtr thanks for letting me know. Do you want me to send the PR or will you?

@spxtr
Contributor

spxtr commented Apr 15, 2016

Build test targets only for linux/amd64 (now linux/amd64, windows/amd64, darwin/amd64, linux/arm)

I think we definitely want to build test targets for at least darwin, since plenty of people develop on a Mac. It might be worth dropping tests for arm and windows.

Remove cmd/kubemark for arm, arm64 and ppc64le. I don't think it's required from official builds.

sgtm

k8s-github-robot pushed a commit that referenced this issue Apr 16, 2016
Automatic merge from submit-queue

Move cmd/linkcheck to test targets.

#24285 (comment)
@spxtr added the priority/critical-urgent label Apr 16, 2016
@spxtr added the kind/flake label Apr 16, 2016
@spxtr
Contributor

spxtr commented Apr 18, 2016

All of our kubernetes-build and kubernetes-test-go jobs are running on a single n1-highmem-32 instance, which is hitting 100% CPU usage fairly often. This is most likely why our test-go times are so inconsistent lately.

@luxas
Member

luxas commented Apr 18, 2016

Is there anything I can help with? (I don't have access to your servers.)

@spxtr
Contributor

spxtr commented Apr 18, 2016

@luxas I think I can handle the test-go problems. Feel free to work on your other suggestions; I think they're good. Thanks, though :)

@fejta removed the priority/backlog label Apr 19, 2016
@fejta
Contributor

fejta commented Apr 19, 2016

The timeout is too aggressive:

http://kubekins.dls.corp.google.com/job/kubernetes-test-go/buildTimeTrend shows a passing run at 75 minutes (build 11026), and the average passing run is in the mid-50-minute range. We need at least a 100-minute timeout (2x the ~50-minute average runtime) rather than 80.

Or to put it another way: to run this job reliably, I want the timeout to be twice the average runtime, not just a couple of minutes above it.
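
As back-of-the-envelope arithmetic, the sizing rule works out like this (a sketch using only the numbers quoted above):

```bash
# timeout = 2 x average passing runtime, per the rule proposed here.
avg_runtime_min=50                    # average passing run: mid-50 minutes
timeout_min=$((2 * avg_runtime_min))  # => 100 minutes, vs. the current 80
echo "kubernetes-test-go timeout should be at least ${timeout_min}m"
```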

@spxtr
Contributor

spxtr commented Apr 19, 2016

@fejta I'm wary of a 100-minute timeout. Last time this came up, the job was taking ~35 minutes (#23127), so an 80-minute timeout was fine. Some time last week it started taking 60-80 minutes and occasionally timing out. I'd like to know why. In the meantime, to fix the submit queue, let's bump the timeout.

I'm also starting to think we should run verify-*.sh, unit/integration tests, and test-cmd.sh in separate Jenkins jobs. kubernetes-test-go blocks the queue for over an hour now.
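
A rough sketch of that split; the script names are taken from the comment, while the aggregate verify runner and exact invocations are assumptions:

```bash
# Hypothetical decomposition of kubernetes-test-go into independent
# Jenkins jobs, one per phase, so no single phase blocks the queue
# for the whole hour. Entry points below are assumed, not confirmed.
hack/verify-all.sh     # job 1: the verify-*.sh style/codegen checks
make test              # job 2: unit tests
make test-integration  # job 2: integration tests (could pair with unit)
hack/test-cmd.sh       # job 3: kubectl command-line tests
```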

k8s-github-robot pushed a commit that referenced this issue Apr 19, 2016
Automatic merge from submit-queue

Bump kubernetes-test-go timeout.

It looks like the run times got more inconsistent because of load on the VM. Adding another Jenkins slave improved things so we're no longer constantly timing out, but runs still come close to the limit at times.

Average runtime is ~45 minutes, so I went with a 100-minute timeout.

Fixes #24285
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this issue Jan 2, 2020
Remove patch for sa public key configuration

Origin-commit: 7631cfb6243de670c962eeb677f47bb3338c5924