Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not throw creation errors for containers that fail immediately after being started #23894

Merged
merged 2 commits into from
Apr 15, 2016

Conversation

vishh
Copy link
Contributor

@vishh vishh commented Apr 5, 2016

Fixes (hopefully) #23607

cc @dchen1107

@vishh
Copy link
Contributor Author

vishh commented Apr 5, 2016

@dchen1107 The node e2e test that I added should validate this fix. If you think this is worth fixing in v1.2, kindly add the necessary labels.

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Apr 6, 2016
@k8s-bot
Copy link

k8s-bot commented Apr 6, 2016

GCE e2e build/test passed for commit 2703b934caba019b52808ea5e896b64b85f32269.

@vishh vishh force-pushed the oom-score-adj-error branch from 2703b93 to 502f438 Compare April 6, 2016 23:50
@vishh vishh changed the title Handle immediately failing containers in kubelet Do not throw creation errors for containers that fail immediately after being started Apr 6, 2016
@vishh vishh added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Apr 6, 2016
@vishh vishh assigned timstclair and dchen1107 and unassigned brendandburns and timstclair Apr 6, 2016
@dchen1107
Copy link
Member

Why do you include commit of adding an option to focus or skip tests in the node e2e makefile rule to this pr?

@k8s-bot
Copy link

k8s-bot commented Apr 7, 2016

GCE e2e build/test passed for commit 502f43884a68ff16955591e34d9c747326f9191f.

@vishh vishh force-pushed the oom-score-adj-error branch from 502f438 to 7937d29 Compare April 7, 2016 17:22
@vishh
Copy link
Contributor Author

vishh commented Apr 7, 2016

@dchen1107: The commit with make file changes has been reverted.

@vishh vishh force-pushed the oom-score-adj-error branch from 7937d29 to 9590050 Compare April 7, 2016 17:52
@k8s-bot
Copy link

k8s-bot commented Apr 7, 2016

GCE e2e build/test passed for commit 7937d29a7c7a5f9f6c7dc15a63247f5dcf4c9fe7.

@k8s-bot
Copy link

k8s-bot commented Apr 7, 2016

GCE e2e build/test passed for commit 9590050823054dced293f0407597d774bb28a9f7.

@ncdc
Copy link
Member

ncdc commented Apr 8, 2016

@vishh need to do the same os.PathError check here:

if e, ok := err.(*os.SyscallError); ok && os.IsNotExist(e) {

Although I'm curious - why check the error type at all? Why not just call os.IsNotExist?

@vishh vishh force-pushed the oom-score-adj-error branch from 9590050 to b1da4ec Compare April 8, 2016 22:28
@vishh
Copy link
Contributor Author

vishh commented Apr 8, 2016

@ncdc Good suggestion. Updated the code.

@vishh vishh assigned ncdc and unassigned dchen1107 Apr 8, 2016
@k8s-bot
Copy link

k8s-bot commented Apr 8, 2016

GCE e2e build/test passed for commit b1da4ecf4a2887e230afa60a08e809e0f0bff505.

err = fmt.Errorf("failed to apply oom-score-adj to pid %d (%v)", pid, err)
continue
}
if _, err := f.Write([]byte(value)); err != nil {
if syscallNotExists(err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for the Open() to succeed but the Write to fail with a "file not found" error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is proc fs I'm not sure what the semantics are. I can look it up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a simple test on bash seems to be throwing ESRCH. Good catch...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catchin ESRCH requires string parsing. Given that we will most likely retry if the write fails, which will trigger Open(..) subsequently, I don't think we need to worry about write errors.
There is still a chance that we might not have enough retries, but the likelihood is less and I'd avoid spending more brain power on this.

@k8s-bot
Copy link

k8s-bot commented Apr 11, 2016

GCE e2e build/test passed for commit d4db40e.

@ncdc
Copy link
Member

ncdc commented Apr 11, 2016

@vishh

Failure [120.042 seconds]
Container Conformance Test
/var/lib/jenkins/workspace/node-pull-build-e2e-test/go/src/k8s.io/kubernetes/test/e2e_node/conformance_test.go:166
  container conformance blackbox test
  /var/lib/jenkins/workspace/node-pull-build-e2e-test/go/src/k8s.io/kubernetes/test/e2e_node/conformance_test.go:165
    when testing images that exist
    /var/lib/jenkins/workspace/node-pull-build-e2e-test/go/src/k8s.io/kubernetes/test/e2e_node/conformance_test.go:81
      it should remove successfully [Conformance] [It]
      /var/lib/jenkins/workspace/node-pull-build-e2e-test/go/src/k8s.io/kubernetes/test/e2e_node/conformance_test.go:80

      Expected error:
          <*errors.errorString | 0xc8201508a0>: {
              s: "Error response from daemon: Conflict, cannot delete b175bcb79023 because the container 7ac70f4b7777 is using it, use -f to force",
          }
          Error response from daemon: Conflict, cannot delete b175bcb79023 because the container 7ac70f4b7777 is using it, use -f to force
      not to have occurred

      /var/lib/jenkins/workspace/node-pull-build-e2e-test/go/src/k8s.io/kubernetes/test/e2e_node/conformance_test.go:76

@vishh
Copy link
Contributor Author

vishh commented Apr 11, 2016

@ncdc container conformance blackbox test is incorrect. It is using a shared image :( I need to fix the test separately first.

@vishh
Copy link
Contributor Author

vishh commented Apr 11, 2016

Posted #24110 to fix the e2e flakiness.

@ncdc
Copy link
Member

ncdc commented Apr 12, 2016

@k8s-bot test this github issue: #24110

@vishh
Copy link
Contributor Author

vishh commented Apr 12, 2016

@k8s-bot test node e2e experimental

@k8s-bot
Copy link

k8s-bot commented Apr 12, 2016

GCE e2e build/test passed for commit d4db40e.

@ncdc ncdc added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 12, 2016
@k8s-github-robot
Copy link

@k8s-bot test this

Tests are more than 48 hours old. Re-running tests.

@ncdc ncdc added this to the v1.2 milestone Apr 14, 2016
@k8s-bot
Copy link

k8s-bot commented Apr 14, 2016

GCE e2e build/test failed for commit d4db40e.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@ncdc
Copy link
Member

ncdc commented Apr 15, 2016

e2e failed because GCE cluster failed to start:

Creating new network: e2e-gce-builder-2-2
WARNING: You are creating a legacy network. Using --mode=legacy will be required in future releases.
ERROR: (gcloud.compute.networks.create) Some requests did not succeed:
 - The resource 'projects/kubernetes-jenkins-pull/global/networks/e2e-gce-builder-2-2' already exists
2016/04/14 13:09:28 e2e.go:200: Error running up: exit status 1

@k8s-bot e2e test this github issue: #IGNORE

@k8s-bot
Copy link

k8s-bot commented Apr 15, 2016

GCE e2e build/test passed for commit d4db40e.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Apr 15, 2016

GCE e2e build/test passed for commit d4db40e.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 2382fec into kubernetes:master Apr 15, 2016
@vishh vishh added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 15, 2016
@roberthbailey roberthbailey added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Apr 19, 2016
@roberthbailey
Copy link
Contributor

@vishh Please create a cherry pick pull for this PR.

@vishh
Copy link
Contributor Author

vishh commented Apr 19, 2016

#24485

On Tue, Apr 19, 2016 at 7:44 AM, Robert Bailey notifications@github.com
wrote:

@vishh https://github.com/vishh Please create a cherry pick pull for
this PR.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#23894 (comment)

zmerlynn added a commit that referenced this pull request Apr 20, 2016
…upstream-release-1.2

Automated cherry pick of #23894 upstream release 1.2
@zmerlynn zmerlynn added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. and removed cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cherrypick-candidate labels Apr 20, 2016
shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
…of-#23894-upstream-release-1.2

Automated cherry pick of kubernetes#23894 upstream release 1.2
shouhong pushed a commit to shouhong/kubernetes that referenced this pull request Feb 14, 2017
…of-#23894-upstream-release-1.2

Automated cherry pick of kubernetes#23894 upstream release 1.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.