Jenkins is regularly getting stuck never completing #22435

Closed
eparis opened this issue Mar 3, 2016 · 21 comments
Labels
area/test-infra priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@eparis
Contributor

eparis commented Mar 3, 2016

... so PRs hang 'pending' and will never get into the submit queue, or will get kicked out after they wait an hour for Jenkins to return.

@eparis eparis added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Mar 3, 2016
@eparis eparis added this to the v1.2 milestone Mar 3, 2016
@eparis
Contributor Author

eparis commented Mar 3, 2016

@bgrant0607
Member

cc @fejta

@eparis
Contributor Author

eparis commented Mar 3, 2016

I'm compiling a list of all PRs that look to be stuck on Jenkins....

@bgrant0607
Member

cc @wojtek-t

@eparis
Contributor Author

eparis commented Mar 3, 2016

All PRs with ${BUILD_NUMBER} in the Jenkins status URL that have been pending for > 30 minutes. This seems to be a good indicator of everything stuck....

#17048
#17305
#17688
#17893
#18016
#18134
#18464
#18824
#19313
#19424
#19599
#19682
#19706
#19827
#19872
#19905
#19955
#20110
#20120
#21263
#21308
#21617
#21741
#22206
#22231
#22253
#22394
#22418
#22419
#22429
#22430
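
The filter described above (literal ${BUILD_NUMBER} in the status URL, pending for more than 30 minutes) could be sketched roughly as follows. This is only an illustration: the `status` dict shape with `state`, `target_url`, and `updated_at` keys is an assumption for the example, and `looks_stuck` is a hypothetical helper, not the script actually used to build the list.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import unquote

# Threshold taken from the comment above: pending for more than 30 minutes.
STUCK_AFTER = timedelta(minutes=30)


def looks_stuck(status, now):
    """Return True if a status looks like one of the stuck Jenkins runs.

    `status` is an assumed dict with 'state', 'target_url', and
    'updated_at' (a timezone-aware datetime); this shape is hypothetical.
    """
    # The broken links carry the placeholder percent-encoded, so decode first.
    url = unquote(status["target_url"])
    is_pending = status["state"] == "pending"
    is_old = now - status["updated_at"] > STUCK_AFTER
    return is_pending and is_old and "${BUILD_NUMBER}" in url


now = datetime(2016, 3, 3, 12, 0, tzinfo=timezone.utc)
example = {
    "state": "pending",
    "target_url": "http://pr-test.k8s.io/$%7BghprbPullId%7D/"
                  "kubernetes-pull-build-test-e2e-gce/$%7BBUILD_NUMBER%7D/",
    "updated_at": now - timedelta(minutes=45),
}
print(looks_stuck(example, now))
```

Decoding before matching means the check works whether the URL is stored raw or percent-encoded.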

@justinsb
Member

justinsb commented Mar 3, 2016

#21907 also. That one needed a squash anyway, so I squashed, repushed and re-added LGTM. Will be interesting to see what happens...

@eparis
Contributor Author

eparis commented Mar 3, 2016

Looking at the Jenkins 'status' of #21907, it looks like it is going to be ok. The busted ones link to something like:

http://pr-test.k8s.io/$%7BghprbPullId%7D/kubernetes-pull-build-test-e2e-gce/$%7BBUILD_NUMBER%7D/

And a working PR looks like:

http://pr-test.k8s.io/21907/kubernetes-pull-build-test-e2e-gce/31499/

So none of that $%7BghprbPullId%7D or $%7BBUILD_NUMBER%7D crap in the status link...

@gmarek
Contributor

gmarek commented Mar 3, 2016

%7B and %7D are the URL-encoded forms of '{' and '}' - looks like a change from " " to ' ' in some bash script, which would stop the variables from being expanded...
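
To make the encoding point concrete, here is a small sketch that percent-decodes a status URL and lists any variables that were never substituted. The helper name `unexpanded_variables` and the regular expression are assumptions made for this illustration; only `urllib.parse.unquote` and `re` from the standard library are used.

```python
import re
from urllib.parse import unquote

# Matches shell-style placeholders such as ${BUILD_NUMBER} after decoding.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def unexpanded_variables(url):
    """Return the names of ${...} placeholders left in a status URL."""
    # unquote turns %7B into '{' and %7D into '}'.
    return PLACEHOLDER.findall(unquote(url))


busted = ("http://pr-test.k8s.io/$%7BghprbPullId%7D/"
          "kubernetes-pull-build-test-e2e-gce/$%7BBUILD_NUMBER%7D/")
working = "http://pr-test.k8s.io/21907/kubernetes-pull-build-test-e2e-gce/31499/"

print(unexpanded_variables(busted))
print(unexpanded_variables(working))
```

A non-empty result means the link was published before the variables were filled in, whatever the underlying cause turns out to be.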

@eparis
Contributor Author

eparis commented Mar 3, 2016

@gmarek but only 'sometimes' :-( which makes me think it might have to do with which builder picks up the job, maybe?

@gmarek
Contributor

gmarek commented Mar 3, 2016

If it happens only for some builds (I thought that we have problems with all of them), then it's certainly one of the builders. @ixdy is probably the best person to ask.

@eparis
Contributor Author

eparis commented Mar 3, 2016

It's a LOT of them, but not all :-(

@ixdy
Member

ixdy commented Mar 3, 2016

Looking. I'm not really sure what's happening. I upgraded the GitHub PR plugin last night to see if it fixed any bugs that may have prevented those few PRs from scheduling, but it seems to have made things worse. I'm currently suspicious of jenkinsci/ghprb-plugin/pull/273, though still looking through recent commits.

@ixdy
Member

ixdy commented Mar 3, 2016

For added fun, PR Jenkins also just fell over. Shaping up to be a great Thursday so far.

@ixdy
Member

ixdy commented Mar 3, 2016

I think it might be jenkinsci/ghprb-plugin#273. It looked neat (it is supposed to cancel builds if a PR is updated, I think), but it looks like it's fundamentally broken:

WARNING: org.jenkinsci.plugins.ghprb.GhprbTrigger.run() failed for hudson.model.FreeStyleProject@7c2c4fa2[kubernetes-pull-test-unit-integration]
java.lang.NullPointerException
        at org.jenkinsci.plugins.ghprb.extensions.build.GhprbCancelBuildsOnUpdate.cancelCurrentBuilds(GhprbCancelBuildsOnUpdate.java:44)
        at org.jenkinsci.plugins.ghprb.extensions.build.GhprbCancelBuildsOnUpdate.onScheduleBuild(GhprbCancelBuildsOnUpdate.java:97)
        at org.jenkinsci.plugins.ghprb.GhprbTrigger.scheduleBuild(GhprbTrigger.java:296)
        at org.jenkinsci.plugins.ghprb.GhprbBuilds.build(GhprbBuilds.java:80)
        at org.jenkinsci.plugins.ghprb.GhprbPullRequest.build(GhprbPullRequest.java:325)
        at org.jenkinsci.plugins.ghprb.GhprbPullRequest.tryBuild(GhprbPullRequest.java:318)
        at org.jenkinsci.plugins.ghprb.GhprbPullRequest.check(GhprbPullRequest.java:161)
        at org.jenkinsci.plugins.ghprb.GhprbRepository.check(GhprbRepository.java:174)
        at org.jenkinsci.plugins.ghprb.GhprbRepository.check(GhprbRepository.java:154)
        at org.jenkinsci.plugins.ghprb.GhprbTrigger.run(GhprbTrigger.java:287)
        at hudson.triggers.Trigger.checkTriggers(Trigger.java:272)
        at hudson.triggers.Trigger$Cron.doRun(Trigger.java:221)
        at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

@ixdy
Member

ixdy commented Mar 3, 2016

I disabled that feature ("Cancel build on update") and triggered #21907 successfully, so we might be OK. Probably need to explicitly trigger everything with a pending status, though (at least the LG'd ones).

@ixdy
Member

ixdy commented Mar 3, 2016

I re-queued all LGTM'd priority/P1 pending-status PRs. There were no priority/P0 PRs that I could find.

We probably need to re-queue the rest of the LGTM'd PRs at some point.

@ixdy
Member

ixdy commented Mar 3, 2016

fyi @kubernetes/goog-testing (should've CC'd earlier)

@eparis
Contributor Author

eparis commented Mar 3, 2016

I just hit up any PR with the 1.2 milestone that was hanging.

@ixdy
Member

ixdy commented Mar 3, 2016

Jenkins seems to be working now, so closing.

@ixdy ixdy closed this as completed Mar 3, 2016
@imkin

imkin commented Mar 3, 2016

#20851 is still not getting any love :-(

@eparis
Contributor Author

eparis commented Mar 3, 2016

I will get it retriggered.


6 participants