Jenkins GCE Node e2e failed for no kubernetes dir #23873

Closed
adohe-zz opened this issue Apr 5, 2016 · 17 comments
Assignees
Labels
area/test-infra priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@adohe-zz

adohe-zz commented Apr 5, 2016

FYI, I just noticed this while working on several PRs (#23400, #23521, #23473). I think this needs to be fixed ASAP.

@caesarxuchao
Member

Is this a flake? If so, could you please add "e2e flake: ..." to the title? That would help people find it.

Also, could you please paste the relevant snippet of the log and a link to the failed tests? Thank you.

@ixdy
Member

ixdy commented Apr 5, 2016

@pwittrock has anything changed with the PR node e2e job recently?

@ixdy
Member

ixdy commented Apr 5, 2016

@kubernetes/goog-testing as fyi

@pmorie
Member

pmorie commented Apr 5, 2016

I don't think it's a flake -- it seems to be happening in all PRs. Example: #23435

@ixdy
Member

ixdy commented Apr 5, 2016

In a build yesterday, the first few lines in the build log showed

Cloning the remote Git repository
Cloning repository https://github.com/kubernetes/kubernetes
 > /usr/bin/git init /var/lib/jenkins/workspace/node-pull-build-e2e-test@2/go/src/k8s.io/kubernetes # timeout=10

Today, it's instead showing

Cloning repository https://github.com/kubernetes/kubernetes
 > /usr/bin/git init /var/lib/jenkins/workspace/node-pull-build-e2e-test # timeout=10
Fetching upstream changes from https://github.com/kubernetes/kubernetes

I'm not sure where the go/src/k8s.io/kubernetes went.
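
A quick way to spot this regression from a build log is to look at the directory passed to git init. The following is a minimal sketch, assuming the log line format shown above; the helper names (init_dir, checked_out_under_gopath) are hypothetical and not part of any Jenkins tooling.

```python
import re

# Sub-directory the job is expected to check out into (from the log above).
EXPECTED_SUFFIX = "go/src/k8s.io/kubernetes"

# Matches the directory argument of the "git init" line in the build log.
GIT_INIT = re.compile(r"/usr/bin/git init (\S+)")


def init_dir(log_line):
    """Return the directory passed to 'git init', or None if the line does not match."""
    match = GIT_INIT.search(log_line)
    return match.group(1) if match else None


def checked_out_under_gopath(log_line):
    """True if the checkout directory ends with the expected sub-directory."""
    path = init_dir(log_line)
    return path is not None and path.rstrip("/").endswith("/" + EXPECTED_SUFFIX)


yesterday = " > /usr/bin/git init /var/lib/jenkins/workspace/node-pull-build-e2e-test@2/go/src/k8s.io/kubernetes # timeout=10"
today = " > /usr/bin/git init /var/lib/jenkins/workspace/node-pull-build-e2e-test # timeout=10"

print(checked_out_under_gopath(yesterday))
print(checked_out_under_gopath(today))
```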

@pwittrock
Member

Looks like the Jenkins config line to check out to a subdirectory was deleted. I added it back and things seem to be getting farther. We need to get the config into JJB so we have a reasonable way to track changes to the system.

@pmorie
Member

pmorie commented Apr 5, 2016

+100 for JJB

@ixdy
Member

ixdy commented Apr 5, 2016

I think I've traced the root cause. Spoiler: it's another bug in Jenkins (or one of its plugins)!

Basically, for this job, we've added the "Check out to a sub-directory" option to the Git SCM config, specifying go/src/k8s.io/kubernetes as the subdir. Saving the config seems to update the in-memory configuration for the job, but the config.xml file for the job on disk has no record of this setting.

Coincidentally, the PR Jenkins VM restarted for some reason around 7:30pm PDT yesterday. (No idea why, but it's sadly not uncommon.) When Jenkins came back, it lost the configuration parameter, since it wasn't stored in the disk-backed config.

We can expect this to break basically any time Jenkins restarts, now. Adding the config to JJB will help a bit (it should keep the in-memory version correct), though there may still be a short period where it'll fail between Jenkins starting and the config updater job running.
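
One way to catch this before the next restart is to check whether the on-disk config.xml actually records the setting. This is a minimal sketch, assuming the setting is stored as a RelativeTargetDirectory extension under the job's scm element; the function name checkout_subdir is hypothetical, and the sample XML is a trimmed illustration, not a full job config.

```python
import xml.etree.ElementTree as ET

# Element name assumed for the git plugin's sub-directory extension.
EXTENSION_TAG = "hudson.plugins.git.extensions.impl.RelativeTargetDirectory"


def checkout_subdir(config_xml):
    """Return the checkout sub-directory recorded in config.xml text, or None."""
    root = ET.fromstring(config_xml)
    for extension in root.iter(EXTENSION_TAG):
        node = extension.find("relativeTargetDir")
        if node is not None and (node.text or "").strip():
            return node.text.strip()
    return None


# Trimmed, illustrative job configs (assumed shape, not copied from a real job).
with_setting = (
    '<project><scm class="hudson.plugins.git.GitSCM"><extensions>'
    "<hudson.plugins.git.extensions.impl.RelativeTargetDirectory>"
    "<relativeTargetDir>go/src/k8s.io/kubernetes</relativeTargetDir>"
    "</hudson.plugins.git.extensions.impl.RelativeTargetDirectory>"
    "</extensions></scm></project>"
)
without_setting = '<project><scm class="hudson.plugins.git.GitSCM"><extensions/></scm></project>'

print(checkout_subdir(with_setting))
print(checkout_subdir(without_setting))
```

Running a check like this against each job's config.xml after saving would show whether a setting exists only in memory.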

@ixdy
Member

ixdy commented Apr 5, 2016

Aha - from https://wiki.jenkins-ci.org/display/JENKINS/Git+Plugin:

Version 2.4.4 (Mar 24, 2016)

We're currently running version 2.4.3 of the plugin, which explains the data loss. (Everything is terrible.)

@ixdy ixdy self-assigned this Apr 5, 2016
@ixdy ixdy added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. area/test-infra labels Apr 5, 2016
@spiffxp
Member

spiffxp commented Apr 5, 2016

/cc @kubernetes/sig-testing

@ixdy
Member

ixdy commented Apr 5, 2016

Upgraded PR Jenkins to version 2.4.4 of the git plugin. The job's config.xml now contains the correct settings, too.

@ixdy
Member

ixdy commented Apr 8, 2016

A bunch of the post-commit node e2e jobs have started failing because I restarted the Jenkins VM, and bad versions of these configs had been written with version 2.4.3 of the git plugin.

Unfortunately, the Jenkins job updater did not fix the problem. It seems that the XML it's producing looks like

  <scm class="hudson.plugins.git.GitSCM">
  ...
    <submoduleCfg class="list"/>
    <relativeTargetDir>go/src/k8s.io/kubernetes</relativeTargetDir>
    <reference/>

whereas the expected XML (looking at the Jenkins VM) looks like

  <scm class="hudson.plugins.git.GitSCM" plugin="git@2.4.4">
  ...
    <extensions>
      <hudson.plugins.git.extensions.impl.RelativeTargetDirectory>
        <relativeTargetDir>go/src/k8s.io/kubernetes</relativeTargetDir>
      </hudson.plugins.git.extensions.impl.RelativeTargetDirectory>
    </extensions>

We're using a pretty old version of the Jenkins job builder (1.3.1 - current is 1.4.1). However, it appears that the newest version still produces incorrect output, so we probably need to fix this.
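
As a stopgap, the generated XML could be post-processed into the expected shape. The following is a rough sketch of that transformation, not how JJB itself does or should do it: the function name move_target_dir_into_extension is hypothetical, the target layout is taken from the expected XML above, and whether Jenkins accepts the result without the plugin attribute is an assumption.

```python
import xml.etree.ElementTree as ET

# Element name taken from the expected XML above.
EXTENSION_TAG = "hudson.plugins.git.extensions.impl.RelativeTargetDirectory"


def move_target_dir_into_extension(scm):
    """Move a top-level <relativeTargetDir> under <scm> into the extension form.

    Returns True if the element was changed, False if there was nothing to move.
    """
    legacy = scm.find("relativeTargetDir")  # direct children of <scm> only
    if legacy is None:
        return False
    extensions = scm.find("extensions")
    if extensions is None:
        extensions = ET.SubElement(scm, "extensions")
    wrapper = ET.SubElement(extensions, EXTENSION_TAG)
    ET.SubElement(wrapper, "relativeTargetDir").text = legacy.text
    scm.remove(legacy)
    return True


# Trimmed version of the XML the job updater currently produces.
scm = ET.fromstring(
    '<scm class="hudson.plugins.git.GitSCM">'
    '<submoduleCfg class="list"/>'
    "<relativeTargetDir>go/src/k8s.io/kubernetes</relativeTargetDir>"
    "<reference/>"
    "</scm>"
)
move_target_dir_into_extension(scm)
print(ET.tostring(scm, encoding="unicode"))
```

Because only direct children of scm are inspected, running the function a second time leaves an already-converted config untouched.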

@spxtr

@ixdy
Member

ixdy commented Apr 8, 2016

I manually fixed kubelet-gce-e2e-ci in the Jenkins UI (hopefully it doesn't get reverted by the updater). I haven't touched the others yet.

@spxtr
Contributor

spxtr commented Apr 8, 2016

Raw XML time :(

@gmarek
Contributor

gmarek commented Apr 8, 2016

Is this the reason why we're seeing 111s from gcloud calls all over the place?

@ixdy
Member

ixdy commented Apr 8, 2016

I think the 111s are unrelated, due to the Jenkins VM being overloaded.

8 participants