-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kubelet: Switch to Docker engine-api and add timeout for all docker calls #23563
Comments
Nice to have this! 👍 |
Automatic merge from submit-queue Kubelet: Start using the official docker engine-api For #23563. This is the **first step** in the roadmap of switching to docker [engine-api](https://github.com/docker/engine-api). In this PR, I keep the old `DockerInterface` and implement it with the new engine-api. With this approach, we could switch to engine-api with minimum change, so that we could: * Test the engine-api without huge refactoring. * Send following PRs to refactor functions in `DockerInterface` separately so as to avoid a huge change in one PR. I've tested this PR locally, it passed all the node conformance test: ``` make test_e2e_node Ran 19 of 19 Specs in 823.395 seconds SUCCESS! -- 19 Passed | 0 Failed | 0 Pending | 0 Skipped PASS Ginkgo ran 1 suite in 13m49.429979585s Test Suite Passed ``` And it also passed the jenkins gce e2e test: ``` go run hack/e2e.go -test -v --test_args="--ginkgo.skip=\[Slow\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]" Ran 161 of 268 Specs in 4570.214 seconds SUCCESS! -- 161 Passed | 0 Failed | 0 Pending | 107 Skipped PASS Ginkgo ran 1 suite in 1h16m16.325934558s Test Suite Passed 2016/03/25 15:12:42 e2e.go:196: Step 'Ginkgo tests' finished in 1h16m18.918754301s ``` I'm writing the design document, and will post the switching roadmap in an umbrella issue soon. @kubernetes/sig-node
Image related is ready in #23800, except |
Automatic merge from submit-queue Kubelet: Refactor container related functions in DockerInterface For #23563. Based on #23506, will rebase after #23506 is merged. The last 4 commits of this PR are new. This PR refactors all container lifecycle related functions in DockerInterface, including: * ListContainers * InspectContainer * CreateContainer * StartContainer * StopContainer * RemoveContainer @kubernetes/sig-node
Automatic merge from submit-queue Fix PullImage and add corresponding node e2e test Fixes #24101. This is a bug introduced by #23506, since ref #23563. The root cause of #24101 is described [here](#24101 (comment)). This PR 1) Fixes #24101 by decoding the messages returned during pulling image, and return error if any of the messages contains error. 2) Add the node e2e test to detect this kind of failure. 3) Get present check out of `ConformanceImage.Remove()` and `ConformanceImage.Pull()`. Because sometimes we may expect error to occur in `PullImage()` and `RemoveImage()`, but even that doesn't happen, the `Present()` check will still return error and let the test pass. @yujuhong @freehan @liangchenye Also /cc @resouer, because he is doing the image related functions refactoring.
I'd like to help with this effort. It looks like most of the work is done for the various endpoints, but I will look at the PRs and help review them. |
@MHBauer Thanks a lot! Really appreciate your help~ :) |
@Random-Liu - I think we need to set ShmSize now when using docker engine-api for create container. see: #24588 , let me know thoughts and I will send a PR. |
…ions Automatic merge from submit-queue Kubelet: Refactor all but image related functions in DockerInterface For #23563. Based on #23699 and #23844. Only last 3 commits are new. This PR refactored all functions except image related functions, including: * CreateExec * StartExec * InspectExec * AttachToContainer * Logs * Info * Version @kubernetes/sig-node
Automatic merge from submit-queue Refactor image related functions to use docker engine-api ref #23563 Hopes can do some help, cc @Random-Liu If it's ok, will add more work here.
Automatic merge from submit-queue Kubelet: Add docker operation timeout For #23563. Based on #24748, only the last 2 commits are new. This PR: 1) Add timeout for all docker operations. 2) Add docker operation timeout metrics 3) Cleanup kubelet stats and add runtime operation error and timeout rate monitoring. 4) Monitor runtime operation error and timeout rate in kubelet perf. @yujuhong /cc @gmarek Because of the metrics change. /cc @kubernetes/sig-node
I think this is done. |
@dchen1107 Yeah, the only thing left is "Add kubelet flag making the image pulling timeout configurable." @wojtek-t Do we still need that? If so, I can add it soon. :) |
…-flag Automatic merge from submit-queue Add runtime-request-timeout kubelet flag. XRef #23563. Addresses #27388 (comment). Add a new kubelet flag `runtime-request-timeout`, and set to 2 minutes by default. Now the flag only affects dockertools, rkt may also want to set request timeout according to the flag. @yifan-gu This PR also removed the timeout for all long running operations to avoid issues like #27588 and #26122. @yujuhong @rrati /cc @kubernetes/sig-node [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
Automatic merge from submit-queue Kubelet: Set PruneChildren when removing image. This is a bug introduced during switching to engine-api. #23563. When removing image, there is an option `noprune`: ``` If prune is true, ancestor images will each attempt to be deleted quietly. ``` In go-dockerclient, the default value of the option is ["noprune=false"](https://github.com/fsouza/go-dockerclient/blob/master/image.go#L171), which means that ancestor images should be also removed. This is the expected behaviour. However in engine-api, the option is changed to `PruneChildren`, and the default value is `PruneChildren=false`, which means that ancestor images won't be removed. This makes `ImageRemove` only remove the first layer of the image, which causes the image garbage collection not working as expected. This should be fixed in 1.3. And thanks to @ronnielai for finding the bug! :) /cc @kubernetes/sig-node
From the first release of Kubernetes, we have been using go-dockerclient. It is reliable and updated frequently. However, we still met some problems when using it:
Docker engine-api is the official Golang client for the Docker remote api. It has cancellable operations(All the functions use Context), official support(The Docker CLI itself is relying on it) and the same feature set with Docker CLI. So, we decided to switch to docker engine-api.
DockerInterface
is an abstract interface containing a subset functions of go-dockerclient. Kubelet uses it whenever talking to docker. To switch to engine-api, we must refactor all the functions inDockerInterface
which will touch a lot of code.To avoid a huge refactoring in one PR, we'll split the refactoring into several steps:
DockerInterface
with the engine-api. [Kubelet: Start using the official docker engine-api #23506]/cc @kubernetes/sig-node
The text was updated successfully, but these errors were encountered: