Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leverage docker labels to construct container statuses without consulting pod spec #15089

Closed
yujuhong opened this issue Oct 5, 2015 · 18 comments
Assignees
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@yujuhong
Copy link
Contributor

yujuhong commented Oct 5, 2015

Kubelet reconciles the difference between the pod status and the desired pod specs. When generating status for a pod (GetPodStatus), kubelet relies on several pieces of information from the apiserver:

  • PodSpec:
    • container name
    • termination message path
  • PodStatus:
    • container restart count

(edit: this list may be incomplete)

We can eliminate this dependency by writing the information as docker labels onto the container. The benefits of doing so include 1) kubelet can populate and maintain a container/pod status cache, representing what's running on the node, regardless of whether it has access to the pod spec yet or not; 2) this should simplify a lot of logic in pod spec.
(Note that rkt doesn't have this dependency as it writes relevant bits in the systemd unit file)

Alternatively, kubelet can store a checkpoint of pods assigned to itself, and load the checkpoint when kubelet restarts. We may need to do this eventually, but in the meantime, docker labels seem like a good idea.

The original docker label issue: #3764

@kubernetes/goog-node, WDYT?

@yujuhong yujuhong added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Oct 5, 2015
@timothysc
Copy link
Member

/cc @kubernetes/rh-cluster-infra

@dchen1107
Copy link
Member

cc/ @taotaotheripper

Lantao, I am assigning this to you.

@dchen1107 dchen1107 added this to the v1.2-candidate milestone Oct 9, 2015
@dchen1107
Copy link
Member

cc/ @yifan-gu on RKT support side.

@yifan-gu
Copy link
Contributor

sgtm. FWIW, appc has similar concept called annotations Once we have apis to read them from rkt, we don't have to store everything in service files.

@Random-Liu
Copy link
Member

I just put restart count into docker label last week in #15965.
Next step, I'll try to move pod name and namespace into docker label, and use label filter to simplify GetPodStatus().
Just update my progress here, :)

@vishh
Copy link
Contributor

vishh commented Oct 26, 2015

@Random-Liu: Thanks for working on this! Looking forward to those patches.

@Random-Liu
Copy link
Member

@vishh Thank you, I'll keep working! :)

@Random-Liu
Copy link
Member

I put pod full name into docker label and add label filter in GetPodStatus() in #16414.
Again, just update my progress, :)

@Random-Liu
Copy link
Member

Progress Update
Currently, we can only add labels to newly created containers, but for containers started already, we can do nothing. So we can't do filter list or naming cleanup etc. which must ensure that all containers' labels have been set properly.

At least till now, the best solution is to add labels on running containers, but docker doesn't support that yet. I submitted an issue in docker community moby/moby#17457. And there is also related discussion moby/moby#15496. Hope that docker will add this feature soon.

Anyhow, for the next step, I'll keep on moving all the other related things into docker label. :)

@Random-Liu
Copy link
Member

To figure out whether docker list with label filter could improve the efficiency, I did a simple experiment.

Benchmark
Call client.ListContainers() of go-dockerclient 100 Times.

  • Total Container Number: 6101
  • Container Number (with label "test=1"): 100
  • Container Number (with label "single=1"): 1
  • Code (Really Simple)

Result

  • ListContainers()
    • each call time: 260.82 (ms)
    • total cpu time: 6.17 (s)
    • total time: 26.08 (s)
    • cpu profile
  • ListContainers() with filter "test=1"
    • each call time: 179.72 (ms)
    • total cpu time: 0.22 (s)
    • total time: 17.97 (s)
    • cpu profile
  • ListContainers() with filter "single=1"
    • each call time: 174.09 (ms)
    • total cpu time: 0.02 (s)
    • total time: 17.41 (s)
    • cpu profile

Related Code in Docker
api/server/router/local/local.go=>api/server/router/local/container.go#getContainerJSON() =>daemon/list.go#Containers()
Just list all the containers, then do filter.

Conclusion

  • On the docker client side, the main overhead is JSON decoding now. Using filter can reduce some of the cost.
  • At least till now, docker haven't done indexing for filtering, just list and filter.

@vishh
Copy link
Contributor

vishh commented Oct 31, 2015

At least till now, docker haven't done indexing for filtering, just list and filter.

I guess this was what we were assuming is true with labels. Is there an issue filed against docker to improve label based queries?

@Random-Liu
Copy link
Member

@vishh In the original proposal of docker label, they proposed indexing and performance. I just check the code to figure out whether they have done that, :)

@yujuhong
Copy link
Contributor Author

yujuhong commented Nov 2, 2015

As originally proposed, I think we should write TerminationMessagePath as a docker label, so that we don't need to rely on the pod spec to examine a terminated container.

@Random-Liu
Copy link
Member

OK. I'll do it. :)

@Random-Liu
Copy link
Member

Progress Update
The issue moby/moby#15496 about mutable container label attracted some attention these days, and "status/needs-attention" label has been added to it. Hope we'll have mutable container label soon, :)

@Random-Liu
Copy link
Member

Progress Update
Docker has started working on mutable label see here
And the original issue moby/moby#15496 has been marked as status/claimed.

@yujuhong
Copy link
Contributor Author

I believe we can close this one too. @Random-Liu, thanks for working on it!

@Random-Liu
Copy link
Member

@yujuhong Yeah. Maybe after moby/moby#18958 is landed, we can do something more. For now, I think we have done most of the things. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

No branches or pull requests

6 participants