Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubelet: Switch to Docker engine-api and add timeout for all docker calls #23563

Closed
21 of 22 tasks
Random-Liu opened this issue Mar 28, 2016 · 7 comments
Closed
21 of 22 tasks
Assignees
Labels
area/docker area/kubelet priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node.
Milestone

Comments

@Random-Liu
Copy link
Member

Random-Liu commented Mar 28, 2016

From the first release of Kubernetes, we have been using go-dockerclient. It is reliable and updated frequently. However, we still met some problems when using it:

Docker engine-api is the official Golang client for the Docker remote api. It has cancellable operations(All the functions use Context), official support(The Docker CLI itself is relying on it) and the same feature set with Docker CLI. So, we decided to switch to docker engine-api.

DockerInterface is an abstract interface containing a subset functions of go-dockerclient. Kubelet uses it whenever talking to docker. To switch to engine-api, we must refactor all the functions in DockerInterface which will touch a lot of code.
To avoid a huge refactoring in one PR, we'll split the refactoring into several steps:

/cc @kubernetes/sig-node

@Random-Liu Random-Liu self-assigned this Mar 28, 2016
@Random-Liu Random-Liu added area/docker area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Mar 28, 2016
@Random-Liu Random-Liu added this to the next-candidate milestone Mar 28, 2016
@Random-Liu Random-Liu changed the title Kubelet: Switch to Docker engine-api Kubelet: Switch to Docker engine-api and add timeout for all docker calls Mar 28, 2016
@dchen1107 dchen1107 modified the milestones: v1.3, next-candidate Mar 29, 2016
@resouer
Copy link
Contributor

resouer commented Apr 1, 2016

Nice to have this! 👍

k8s-github-robot pushed a commit that referenced this issue Apr 2, 2016
Automatic merge from submit-queue

Kubelet: Start using the official docker engine-api

For #23563.

This is the **first step** in the roadmap of switching to docker [engine-api](https://github.com/docker/engine-api).
In this PR, I keep the old `DockerInterface` and implement it with the new engine-api.
With this approach, we could switch to engine-api with minimum change, so that we could:
* Test the engine-api without huge refactoring.
* Send following PRs to refactor functions in `DockerInterface` separately so as to avoid a huge change in one PR.

I've tested this PR locally, it passed all the node conformance test:
```
make test_e2e_node

Ran 19 of 19 Specs in 823.395 seconds
SUCCESS! -- 19 Passed | 0 Failed | 0 Pending | 0 Skipped PASS

Ginkgo ran 1 suite in 13m49.429979585s
Test Suite Passed
```
And it also passed the jenkins gce e2e test:
```
go run hack/e2e.go -test -v --test_args="--ginkgo.skip=\[Slow\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]"

Ran 161 of 268 Specs in 4570.214 seconds
SUCCESS! -- 161 Passed | 0 Failed | 0 Pending | 107 Skipped PASS

Ginkgo ran 1 suite in 1h16m16.325934558s
Test Suite Passed
2016/03/25 15:12:42 e2e.go:196: Step 'Ginkgo tests' finished in 1h16m18.918754301s
```

I'm writing the design document, and will post the switching roadmap in an umbrella issue soon.

@kubernetes/sig-node
@resouer
Copy link
Contributor

resouer commented Apr 7, 2016

Image related is ready in #23800, except PullImage

k8s-github-robot pushed a commit that referenced this issue Apr 17, 2016
Automatic merge from submit-queue

Kubelet: Refactor container related functions in DockerInterface

For #23563.
Based on #23506, will rebase after #23506 is merged.

The last 4 commits of this PR are new.
This PR refactors all container lifecycle related functions in DockerInterface, including:
* ListContainers
* InspectContainer
* CreateContainer
* StartContainer
* StopContainer
* RemoveContainer

@kubernetes/sig-node
k8s-github-robot pushed a commit that referenced this issue Apr 18, 2016
Automatic merge from submit-queue

Fix PullImage and add corresponding node e2e test

Fixes #24101. This is a bug introduced by #23506, since ref #23563.

The root cause of #24101 is described [here](#24101 (comment)).

This PR
1) Fixes #24101 by decoding the messages returned during pulling image, and return error if any of the messages contains error.
2) Add the node e2e test to detect this kind of failure.
3) Get present check out of `ConformanceImage.Remove()` and `ConformanceImage.Pull()`. Because sometimes we may expect error to occur in `PullImage()` and `RemoveImage()`, but even that doesn't happen, the `Present()` check will still return error and let the test pass.

@yujuhong @freehan @liangchenye 

Also /cc @resouer, because he is doing the image related functions refactoring.
@MHBauer
Copy link
Contributor

MHBauer commented Apr 20, 2016

I'd like to help with this effort. It looks like most of the work is done for the various endpoints, but I will look at the PRs and help review them.

@Random-Liu
Copy link
Member Author

@MHBauer Thanks a lot! Really appreciate your help~ :)

@derekwaynecarr
Copy link
Member

@Random-Liu - I think we need to set ShmSize now when using docker engine-api for create container. see: #24588 , let me know thoughts and I will send a PR.

k8s-github-robot pushed a commit that referenced this issue Apr 22, 2016
…ions

Automatic merge from submit-queue

Kubelet: Refactor all but image related functions in DockerInterface

For #23563.
Based on #23699 and #23844.

Only last 3 commits are new. This PR refactored all functions except image related functions, including:
* CreateExec
* StartExec
* InspectExec
* AttachToContainer
* Logs
* Info
* Version

@kubernetes/sig-node
k8s-github-robot pushed a commit that referenced this issue Apr 24, 2016
Automatic merge from submit-queue

Refactor image related functions to use docker engine-api

ref #23563 

Hopes can do some help, cc @Random-Liu 

If it's ok, will add more work here.
k8s-github-robot pushed a commit that referenced this issue May 6, 2016
Automatic merge from submit-queue

Kubelet: Cleanup with new engine api

Finish step 2 of #23563

This PR:
1) Cleanup go-dockerclient reference in the code.
2) Bump up the engine-api version.
3) Cleanup the code with new engine-api.

Fixes #24076.
Fixes #23809.

/cc @yujuhong
k8s-github-robot pushed a commit that referenced this issue May 10, 2016
Automatic merge from submit-queue

Kubelet: Add docker operation timeout

For #23563.
Based on #24748, only the last 2 commits are new.

This PR:
1) Add timeout for all docker operations.
2) Add docker operation timeout metrics
3) Cleanup kubelet stats and add runtime operation error and timeout rate monitoring.
4) Monitor runtime operation error and timeout rate in kubelet perf.

@yujuhong 
/cc @gmarek Because of the metrics change.
/cc @kubernetes/sig-node
@dchen1107
Copy link
Member

I think this is done.

@dchen1107 dchen1107 added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label May 17, 2016
@Random-Liu
Copy link
Member Author

@dchen1107 Yeah, the only thing left is "Add kubelet flag making the image pulling timeout configurable." @wojtek-t Do we still need that? If so, I can add it soon. :)

k8s-github-robot pushed a commit that referenced this issue Jun 21, 2016
…-flag

Automatic merge from submit-queue

Add runtime-request-timeout kubelet flag.

XRef #23563.

Addresses #27388 (comment).

Add a new kubelet flag `runtime-request-timeout`, and set to 2 minutes by default.
Now the flag only affects dockertools, rkt may also want to set request timeout according to the flag. @yifan-gu 

This PR also removed the timeout for all long running operations to avoid issues like #27588 and #26122.

@yujuhong @rrati 
/cc @kubernetes/sig-node 

[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
k8s-github-robot pushed a commit that referenced this issue Jul 20, 2016
Automatic merge from submit-queue

Kubelet: Set PruneChildren when removing image.

This is a bug introduced during switching to engine-api. #23563.

When removing image, there is an option `noprune`:
```
If prune is true, ancestor images will each attempt to be deleted quietly.
```

In go-dockerclient, the default value of the option is ["noprune=false"](https://github.com/fsouza/go-dockerclient/blob/master/image.go#L171), which means that ancestor images should be also removed. This is the expected behaviour.

However in engine-api, the option is changed to `PruneChildren`, and the default value is `PruneChildren=false`, which means that ancestor images won't be removed.
This makes `ImageRemove` only remove the first layer of the image, which causes the image garbage collection not working as expected.

This should be fixed in 1.3.
And thanks to @ronnielai for finding the bug! :)

/cc @kubernetes/sig-node
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docker area/kubelet priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

No branches or pull requests

5 participants