git checkout fails with "reference is not a tree" #27462

Closed
ikehz opened this issue Jun 15, 2016 · 24 comments
Labels: area/test-infra, kind/flake, priority/important-soon

Comments

@ikehz
Contributor

ikehz commented Jun 15, 2016

I see the following in Jenkins verification of #27453:

hudson.plugins.git.GitException: Command "/usr/bin/git checkout -f 6676629a96bf8631fbe106ff5ad147527359cbcc" returned status code 128:
stdout: 
stderr: fatal: reference is not a tree: 6676629a96bf8631fbe106ff5ad147527359cbcc

Artifacts at https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/27453/kubernetes-pull-verify-all/140

@ikehz added the area/test-infra and kind/flake labels Jun 15, 2016
@fejta
Contributor

fejta commented Jun 15, 2016

@mikedanese can you triage? Is this related to the new job somehow?

@mikedanese
Member

mikedanese commented Jun 15, 2016

OK, it's also happening on the unit/integration job for that PR:

https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/27453/kubernetes-pull-test-unit-integration/30794

Could be something that only happens with cherrypicks. It's strange because the unit/integration tests did not change at all.

@spxtr
Contributor

spxtr commented Jun 16, 2016

edit: What I describe below is true and can happen, but most of our cases are caused by a race condition in the Jenkins plugin and aren't caused by anyone force-pushing.

Oh! I think I remember this one. I think what happens is this:

  1. You open a PR whose head has sha abcde123.
  2. Jenkins queues up a build with that sha, but doesn't run it yet.
  3. You force-push over that commit, so the head of your PR is now something else.
  4. Jenkins tries to check out the old sha, which no longer exists.

It might not be exactly that, since we actually try to check out the merge commit, but I think that's the idea.

@wojtek-t
Member

@spxtr - what you wrote sounds reasonable and it seems that we can't do much with it. Can we close this one?

@spxtr
Contributor

spxtr commented Jun 17, 2016

I think so, or maybe remove the kind/flake label and call it a P3 to try to detect this and fail with a more reasonable message.
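
For anyone picking up the "fail with a more reasonable message" idea, here is a minimal sketch of what the detection could look like. It only pattern-matches the stderr text quoted at the top of this issue. The function name and the wording of the friendlier message are hypothetical, and nothing here is taken from the Jenkins plugins.

```python
import re

# Matches git's error text as it appears in the Jenkins output quoted in
# this issue. Assumption: the message format is stable across git versions.
_NOT_A_TREE = re.compile(r"fatal: reference is not a tree: ([0-9a-f]{7,40})")


def explain_checkout_failure(stderr):
    """Return a friendlier message for 'reference is not a tree', else None.

    Hypothetical helper for illustration; the wording is not from any tool.
    """
    match = _NOT_A_TREE.search(stderr)
    if match is None:
        return None
    sha = match.group(1)
    return (
        f"commit {sha} is not present in the local clone; "
        "it may have been force-pushed away or never fetched"
    )


stderr = "fatal: reference is not a tree: 6676629a96bf8631fbe106ff5ad147527359cbcc"
print(explain_checkout_failure(stderr))
```

A wrapper around the checkout step could call this on the captured stderr and fall back to the raw error when it returns None.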

@yujuhong
Contributor

There was no push in #27913, but it still encountered the same problem. Reopening the issue.

@yujuhong yujuhong reopened this Jun 23, 2016
@fejta
Contributor

fejta commented Jun 24, 2016

Check out #28021. I am pretty sure at least part of what's happening here is that the plugin isn't designed to handle multiple pulls at the same time.

So PR 28015 actually winds up trying to pull the merge for 28018.

https://test-dot-k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/28015/kubernetes-pull-build-test-e2e-gce/46566/

Why are we trying to pull origin/pr/28018/merge for PR 28015???

cc @ixdy

@ixdy
Member

ixdy commented Jul 5, 2016

So it looks pretty clear that the GitHub PR builder plugin doesn't handle multiple executors very well. We try to check out origin/pr/${ghprbPullId}/merge, but increasingly frequently this is set to a different PR that we're testing at the same time.

There was a rework of some of the envvar code recently, and I tried upgrading the plugin to see if that fixed the issue, but I then ran into a different, as-yet-undiagnosed issue where the plugin was failing to schedule any jobs at all.

I'm noticing that jenkinsci/ghprb-plugin@9c46eab merged since then, and could very likely explain the more recent issue. So I might try upgrading the plugin again.

@ixdy
Member

ixdy commented Jul 6, 2016

Yeah, jenkinsci/ghprb-plugin#366 is almost certainly the more recent issue.

@ixdy
Member

ixdy commented Jul 6, 2016

Upgraded the plugin, and tests are running again. Let's see if we keep having issues with the checkouts.

@ixdy
Member

ixdy commented Jul 6, 2016

Latest version of the plugin (1.32.8) is broken in a new and fun way:

$ while true; do curl -slL https://api.github.com/rate_limit?access_token=[elided] | jq .resources.core.remaining ; sleep 1; done
592
591
589
587
586
584
582
575
564
553
545
535
529
519
509
500
491
483
478
476
473

Looking at the logs, it seems to be querying PRs over and over without pause; I see lines like

Jul 05, 2016 10:05:04 PM INFO org.jenkinsci.plugins.ghprb.GhprbPullRequest updatePR
Pull request #16,062 was updated on repo kubernetes/kubernetes but there aren't any new comments nor commits; that may mean that commit status was updated.
Jul 05, 2016 10:05:04 PM INFO org.jenkinsci.plugins.ghprb.GhprbPullRequest updatePR
Pull request #13,925 was updated on repo kubernetes/kubernetes but there aren't any new comments nor commits; that may mean that commit status was updated.
Jul 05, 2016 10:05:04 PM INFO org.jenkinsci.plugins.ghprb.GhprbPullRequest updatePR
Pull request #13,216 was updated on repo kubernetes/kubernetes but there aren't any new comments nor commits; that may mean that commit status was updated.

repeating a few times per PR per minute.

Downgrading back to 1.31.4 again...
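
To put a number on the rate-limit drain sampled above, here is a rough sketch that turns the printed values into an average consumption rate. It assumes the samples were exactly one second apart, as `sleep 1` suggests. The real gap also includes the time curl takes, so the true rate is somewhat lower than this estimate. The function name is made up for illustration.

```python
def burn_rate(samples, interval_seconds=1.0):
    """Average API requests consumed per second across the samples.

    Assumes evenly spaced samples and no rate-limit reset in between.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = (len(samples) - 1) * interval_seconds
    return (samples[0] - samples[-1]) / elapsed


# The 21 values printed by the curl loop in this comment.
remaining = [592, 591, 589, 587, 586, 584, 582, 575, 564, 553, 545,
             535, 529, 519, 509, 500, 491, 483, 478, 476, 473]

rate = burn_rate(remaining)
print(f"{rate:.2f} requests/second")
print(f"about {remaining[-1] / rate:.0f} seconds until the quota is gone")
```

Under that one-second assumption the plugin was using roughly 6 requests per second, which would burn through the remaining quota in under a minute and a half.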

@spxtr
Contributor

spxtr commented Aug 26, 2016

BTW this is a problem with the Git plugin, not the GitHub PR Builder plugin.

@ixdy
Member

ixdy commented Aug 26, 2016

What makes you say that? From a recent run:

GitHub pull request #31064 of commit 12c248315f8191396574085591e1d749ab0fe670, no merge conflicts.
...
 > /usr/bin/git -c core.askpass=true fetch --tags --progress https://github.com/kubernetes/kubernetes +refs/pull/31064/merge:refs/remotes/origin/pr/31064/merge # timeout=20
...
Checking out Revision da5feb577274a5f7b4bb31e9f4bd655b0f143e64 (origin/pr/31373/merge)
 > /usr/bin/git config core.sparsecheckout # timeout=10
 > /usr/bin/git checkout -f da5feb577274a5f7b4bb31e9f4bd655b0f143e64 # timeout=20
FATAL: Could not checkout da5feb577274a5f7b4bb31e9f4bd655b0f143e64

Note how the branch being checked out suddenly changed. I assume that is the GitHub PR Builder plugin's fault, not git.
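
A small sketch for spotting this in a console log automatically: it compares the PR number on the trigger line with the PR number in the ref that was actually checked out. The regular expressions are modelled only on the lines quoted in this comment, so other Jenkins or plugin versions may format them differently. The function name is hypothetical.

```python
import re

# Patterns modelled on the console output quoted above (an assumption about
# the log format, not something taken from plugin documentation).
_TRIGGER = re.compile(r"GitHub pull request #(\d+) of commit")
_CHECKOUT = re.compile(
    r"Checking out Revision [0-9a-f]+ \(origin/pr/(\d+)/merge\)"
)


def find_pr_mismatch(console_log):
    """Return (triggered_pr, checked_out_pr) if they differ, else None."""
    triggered = _TRIGGER.search(console_log)
    checked_out = _CHECKOUT.search(console_log)
    if triggered is None or checked_out is None:
        return None  # not enough information in this log
    wanted, got = int(triggered.group(1)), int(checked_out.group(1))
    return (wanted, got) if wanted != got else None


log = """\
GitHub pull request #31064 of commit 12c248315f8191396574085591e1d749ab0fe670, no merge conflicts.
Checking out Revision da5feb577274a5f7b4bb31e9f4bd655b0f143e64 (origin/pr/31373/merge)
"""
print(find_pr_mismatch(log))  # (31064, 31373)
```

Running something like this over archived build logs could give a count of how often a build checks out another PR's merge ref.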

@spxtr
Contributor

spxtr commented Aug 26, 2016

I hit this issue in a test job that didn't use the PR builder plugin, only the Git plugin. I deleted the test job and now I can't find the logs.

The PR builder plugin just triggers the jobs with the appropriate parameters. It doesn't actually do the checkout.

@spxtr changed the title from 'Verify fails with "reference is not a tree"' to 'git checkout fails with "reference is not a tree"' on Aug 26, 2016
@k8s-github-robot

[FLAKE-PING] @ixdy @spxtr

This flaky-test issue would love to have more attention.

@ixdy
Member

ixdy commented Sep 1, 2016

@spxtr maybe it's a bug in Jenkins' parameter handling, then?

@ixdy
Member

ixdy commented Sep 1, 2016

anyway, once we get rid of the GHPRB plugin, we can update Jenkins and see if the issue goes away.

@spxtr
Contributor

spxtr commented Sep 2, 2016

Agreed. If not then Erick is working on getting rid of the Git plugin.

@k8s-github-robot

[FLAKE-PING] @ixdy @spxtr

This flaky-test issue would love to have more attention.

@ixdy
Member

ixdy commented Sep 6, 2016

@rmmh
Contributor

rmmh commented Sep 9, 2016

It's the Git plugin. I've patched it and installed the fixed version on pr-jenkins last night, and haven't observed any failures since then. Normally there are ~20 failures per day.

See JENKINS-26290 for the full details.
