Access host cgroup info within a container #13845

Closed
dchen1107 opened this issue Sep 11, 2015 · 14 comments
Labels
area/api: Indicates an issue on api area.
area/downward-api
area/isolation
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
priority/backlog: Higher priority than priority/awaiting-more-evidence.
sig/node: Categorizes an issue or PR as relevant to SIG Node.

Comments

@dchen1107
Member

Docker 1.8 now provides a container with its own (read-only) cgroup filesystem view by default, but there is no way to view the host's cgroup information to learn the node's resource usage. Once we actually enforce QoS and overcommit the node, this could be important for containers in the Burstable class, so that the application itself can exercise finer control over its resource consumption and avoid eviction under a system out-of-resource condition.

There are many other ways to achieve this; I am just throwing this out to kick off the discussion.
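
For context, a minimal sketch (assuming cgroup v1 with the memory controller mounted at the conventional path) of what a container can read today from its own read-only view; nothing visible here reports node-wide usage:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// readCgroupValue returns the trimmed contents of a single cgroup file.
func readCgroupValue(path string) (string, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	// These files reflect the container's own cgroup only (read-only view);
	// there is no file here that exposes the host's totals, which is what
	// this issue asks for.
	limit, err := readCgroupValue("/sys/fs/cgroup/memory/memory.limit_in_bytes")
	if err != nil {
		fmt.Println("memory limit not readable:", err)
		return
	}
	usage, _ := readCgroupValue("/sys/fs/cgroup/memory/memory.usage_in_bytes")
	fmt.Printf("container memory limit=%s usage=%s\n", limit, usage)
}
```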

cc/ @lvlv @thockin @bgrant0607

@dchen1107 added the area/api, area/isolation and sig/node labels on Sep 11, 2015
@dalanlan
Contributor

/cc @dalanlan

@thockin
Member

thockin commented Sep 11, 2015

Do they mount it in a way that /proc/self/cgroup's path field is correct?

I have, in the past, argued that this was an acceptable API. I'm less confident now; back then I assumed people had LMCTFY to isolate themselves from the vagaries of the kernel...
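
For reference, a small sketch of reading that path field; whether the path is meaningful inside a container depends entirely on how the runtime mounts cgroups:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// Each line of /proc/self/cgroup has the form "hierarchy-id:controllers:path".
	f, err := os.Open("/proc/self/cgroup")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), ":", 3)
		if len(parts) == 3 {
			fmt.Printf("controllers=%q path=%q\n", parts[1], parts[2])
		}
	}
}
```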

@vishh
Contributor

vishh commented Sep 11, 2015

@thockin: No. Docker mounts the container's cgroups at `/` within the container. I would prefer not to expose cgroups at all. One of the ideas discussed in #9473 was to have the kubelet provide a read-only API which applications can use to get their resource limits.
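
To make the idea concrete, a hypothetical sketch of an application querying such a node-local, read-only endpoint; the address, port and `/limits/self` path are placeholders for this example, not an existing kubelet API:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	// Address, port and path are placeholders for a node-local, read-only endpoint.
	resp, err := client.Get("http://localhost:10255/limits/self")
	if err != nil {
		fmt.Println("limits endpoint not reachable:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```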

@thockin
Member

thockin commented Sep 11, 2015

LOL so I can't even use /proc/self/cgroup? Sigh...

Or did cgroup namespaces get implemented in Docker when I wasn't looking?

I prefer the idea of an API on a link-local IP.

@dchen1107
Member Author

Having the kubelet provide a read-only API on a link-local IP sounds good to me; when @lvlv and I talked about this at the Hangzhou conference two weeks ago, we proposed the same solution.

But I do have some concerns about providing such an API: getting a container's own resource limits is straightforward, but exposing various cgroup stats through a read-only API will introduce new complexity into the kubelet itself. Also, are we going to rate-limit the API per container / pod? Through tokens? What is the throttling policy? Do we cache API calls? I guess since we are going to standardize the kubelet API anyway, all of the above should be taken care of there.
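
To illustrate the kind of per-pod throttling in question, a rough sketch; the `podID` type and the one-query-per-interval policy are made up for the example:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type podID string

// statsThrottler allows at most one stats query per pod per minGap.
type statsThrottler struct {
	mu     sync.Mutex
	minGap time.Duration
	last   map[podID]time.Time
}

func newStatsThrottler(minGap time.Duration) *statsThrottler {
	return &statsThrottler{minGap: minGap, last: map[podID]time.Time{}}
}

// Allow reports whether the pod may issue another query right now.
func (t *statsThrottler) Allow(id podID) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	now := time.Now()
	if prev, ok := t.last[id]; ok && now.Sub(prev) < t.minGap {
		return false
	}
	t.last[id] = now
	return true
}

func main() {
	t := newStatsThrottler(time.Second)
	fmt.Println(t.Allow("pod-a")) // true
	fmt.Println(t.Allow("pod-a")) // false: too soon
	fmt.Println(t.Allow("pod-b")) // true: separate budget per pod
}
```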

@vishh
Contributor

vishh commented Sep 11, 2015

Cgroup namespaces haven't been completed yet.

@dchen1107 added the priority/backlog label on Sep 11, 2015
@bgrant0607
Member

Also discussed some in #386.

In addition to the comments above, we have to be careful about what we expose from the node.

@vishh
Contributor

vishh commented Sep 15, 2015

@dchen1107 shall we close this issue?

@lvlv
Contributor

lvlv commented Sep 15, 2015

I'm still hungry for such an API; it would be a great gateway for user containers to detect the environment they are running in. For example, it could help containers optimize I/O locality.

In our tests, given the identity of the host, Hive improved its performance greatly (3x, thanks to short-circuit reads/writes) when it could detect a DataNode on the same node.
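
As a concrete illustration of that locality check (the `NODE_NAME` and `DATANODE_HOST` environment variables are assumptions for the example, standing in for whatever mechanism injects the host identity and the block's serving host):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	node := os.Getenv("NODE_NAME")       // assumed: injected by the platform
	dnHost := os.Getenv("DATANODE_HOST") // assumed: host serving the HDFS block
	if node != "" && node == dnHost {
		fmt.Println("block is local; short-circuit read/write is possible")
	} else {
		fmt.Println("block is remote; reading over the network")
	}
}
```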

@vishh
Contributor

vishh commented Sep 15, 2015

@lvlv: #9473 has a detailed discussion.

@dchen1107
Member Author

@vishh #9473 only talks about a container's own resource limits. For that, providing a read-only API is sufficient, and given the usage pattern it should be safe unless many misbehaving users on a node query their resource limits at very short intervals. This issue is filed for a different usage pattern, one that deliberately lets the user query the kubelet periodically because usage changes quickly, so we need to think harder about API call overhead here.
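
To sketch one way the kubelet could bound that overhead, a hypothetical TTL cache in front of the stats collection; the `nodeStats` shape and the collect function are invented for this example:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nodeStats is a stand-in for whatever the kubelet would expose.
type nodeStats struct {
	MemoryUsageBytes uint64
	CollectedAt      time.Time
}

// cachedStatsProvider re-collects at most once per TTL, so frequent
// per-pod queries do not re-read the host cgroup hierarchy every time.
type cachedStatsProvider struct {
	mu      sync.Mutex
	ttl     time.Duration
	cached  nodeStats
	collect func() nodeStats
}

func (p *cachedStatsProvider) Get() nodeStats {
	p.mu.Lock()
	defer p.mu.Unlock()
	if time.Since(p.cached.CollectedAt) > p.ttl {
		p.cached = p.collect()
	}
	return p.cached
}

func main() {
	p := &cachedStatsProvider{
		ttl: 10 * time.Second,
		collect: func() nodeStats {
			// Placeholder for the real collection work.
			return nodeStats{MemoryUsageBytes: 1 << 30, CollectedAt: time.Now()}
		},
	}
	fmt.Printf("%+v\n", p.Get())
	fmt.Printf("%+v\n", p.Get()) // served from the cache within the TTL
}
```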

@fejta-bot

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Freeze the issue for 90d with /lifecycle frozen.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Dec 14, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 13, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/close
