Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pod-level security context proposal #12823

Merged
merged 1 commit into from
Sep 25, 2015
Merged

Conversation

pmorie
Copy link
Member

@pmorie pmorie commented Aug 17, 2015

Proposal for differentiating pod-level and container-level security contexts.

@bgrant0607 @erictune @smarterclayton @pweil- @eparis

@pmorie pmorie changed the title WIP: pod-level security context WIP: pod-level security context proposal Aug 17, 2015
@pmorie pmorie force-pushed the pod-sc branch 2 times, most recently from 67b8fbe to 24be8ad Compare August 17, 2015 21:03
@pmorie
Copy link
Member Author

pmorie commented Aug 17, 2015

How could I forget @thockin

@k8s-bot
Copy link

k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 769ea856800b247d06e7cc0623405e5ddaea2642.

@pmorie
Copy link
Member Author

pmorie commented Aug 17, 2015

That's what i thought @k8s-bot 👊

@k8s-bot
Copy link

k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 67b8fbe168aa677e9c830a7535900f8b59ecc97f.

@k8s-bot
Copy link

k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 24be8ad87e50bb4611c725679d01e6978c061241.

Privileged *bool
SELinuxOptions *SELinuxOptions
RunAsUser *int64
RunAsNonRoot bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are there two flags for non-root? can we get away with just one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question -- they are two different facets:

  1. RunAsUser is the specific UID that is requested
  2. RunAsNonRoot is to ensure that when the uid is delegated to what's declared in the image metadata, that the image doesn't want to run as root.

If I had included the comments on those fields it would probably be clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments and names might make this clearer.

@pmorie
Copy link
Member Author

pmorie commented Aug 18, 2015

@thockin @bgrant0607 What's the backward-compat story for this going to be?

@bgrant0607
Copy link
Member

We must support the current v1 API for the foreseeable future.


#### Container runtime changes

The docker and rkt `Runtime` implementations should be changed to set the group of each container
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgrant0607
Copy link
Member

I like it.

As when we introduced SecurityContext, we need to continue to support the current fields (except anything added since 1.0, such as hostIPC). Any duplicated fields need to be set from each other in conversion (it can't be done in defaulting because that causes problems for updates).

@bgrant0607 bgrant0607 added the area/api Indicates an issue on api area. label Aug 19, 2015
@bgrant0607
Copy link
Member

ContainerDefaults in the PodSecurityContext is problematic. If we copy it to ContainerSecurityContext of each container, that causes problems for updates -- we'd at least have to do something special. If we don't copy it, then we break v1 API clients expecting to be able to read the securityContext on each container.

@pmorie
Copy link
Member Author

pmorie commented Aug 19, 2015

@bgrant0607 Agree it is problematic. IMO backward compat is going to be the trickiest aspect of actually coding this. By 'something special', I imagine you mean that for updates, we will have to project the container security contexts into the container defaults specified at the pod level. Accurate?

@bgrant0607
Copy link
Member

@pmorie:

In addition to updates, we also need to think about reading versions of the resource from etcd that were written prior to the field being added.

It's general API policy to explicitly expand all default values so that client and other code dealing with the API don't all need to reason about implicit default behaviors. However, if a user originally set just one field that was copied to another field and then tried to update the original field, the two fields would then not match. What we'd really like is spreadsheet-like behavior, where updating any equivalent field updates the others automagically.

This doesn't exactly match what we do for defaulting, since that doesn't touch fields that are already set. It is possible to do during conversion, by treating one field in the internal representation as the source and others as mirrors. If/when we eliminate the internal representation we'd need to find another solution.

The other problem is how to tell which field was changed in the case that the two fields are set inconsistently. If PATCH, it should be straightforward. If PUT, then we'd need to diff the resource with the previous version.

With ContainerDefaults, it's even harder. If we copied ContainerDefaults into the ContainerSecurityContext of each container that didn't specify SecurityContext, we probably should not allow ContainerDefaults to be updated, since doing so would have no effect, unless the user explicitly nulled the ContainerSecurityContext fields again.

In this case, maybe we need to assume that anyone writing PodSecurityContext knows what they're doing (e.g., don't write PodSecurityContext and then just update the deprecated HostNetwork field without updating PodSecurityContext.HostNetwork), and we should just try to make reading the existing container-level securityContext work in a backward-compatible manner.

The other alternative is that we just make this better in the v2 API, and comment which fields are relevant to pod-level security. PodSecurityContext doesn't address host ports nor host paths, either.

Is this really worth the effort now?

two additions:

1. The `RunAsGroup` field specifies the GID the container process runs as
2. The `RunWithSupplementalGroups` field specifies additional groups the container process should
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the use cases for this? Is kubelet expected to verify that all the supplemental groups exist on the system?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vishh The use-case is to enable sharing volumes via supplemental groups. The supplemental groups don't need to exist in /etc/groups inside or outside the container -- since they're just IDs, not strings that have to be resolved by looking at the groups file. I'm preparing another PR for a volumes proposal that will go in-depth into their use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Looking forward to your proposal. Kubelet might end up using gids for disk management internally. To avoid conflicts, we might have to reserve ranges of gids.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vishh That's basically what I'm going to propose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #12944 for volume stuff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that supplemental groups need to be part of the API. If RunAsGroup is specified in the pod's security context then the volumes can be owned by that GID. If it is not specified then the kubelet should generate and use supplemental group IDs in the background to solve the volume permissions problem. This has the added benefit of being able to solve the volume permissions problem without a need for API changes :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm - if there is a use case for common supplemental groups across
different pods on shared volumes we'll still need a field.

On Aug 26, 2015, at 5:17 PM, Sami Wagiaalla notifications@github.com
wrote:

In docs/proposals/pod-security-context.md
#12823 (comment):

+type ContainerSecurityContext struct {

  • Capabilities *Capabilities
  • Privileged *bool
  • SELinuxOptions *SELinuxOptions
  • RunAsUser *int64
  • RunAsNonRoot bool
  • RunAsGroup *int64
  • RunWithSupplementalGroups []int64
    +}
    +```

+TheContainerSecurityContexttype is very similar to the existingSecurityContexttype, with
+two additions:
+
+1. TheRunAsGroupfield specifies the GID the container process runs as
+2. TheRunWithSupplementalGroups field specifies additional groups the container process should

I don't know that supplemental groups need to be part of the API. If
RunAsGroup is specified in the pod's security context then the volumes can
be owned by that GID. If it is not specified then the kubelet should
generate and use supplemental group IDs in the background to solve the
volume permissions problem. This has the added benefit of being able to
solve the volume permissions problem without a need for API changes :)


Reply to this email directly or view it on GitHub
https://github.com/kubernetes/kubernetes/pull/12823/files#r38029439.

@thockin
Copy link
Member

thockin commented Aug 19, 2015

I did not think of ContainerDefaults at all the same as API defaults fro non-specified values. I saw PodSecuritySpec and ContainerSecuritySpec as a late-binding base and overlay. In that sense, any pod that does not have a ContainerDefaults field (i.e. everything today) will continue to operate as it does today. Adding a ContainerDefaults would change the behavior ONLY of things that were not overridden by the container.

Updating the ContainerDefaults on a running pod would probably result in all of the containers being restarted (that's easy, we maybe could be smarter).

@pmorie
Copy link
Member Author

pmorie commented Aug 19, 2015

I did not think of ContainerDefaults at all the same as API defaults fro non-specified values. I saw PodSecuritySpec and ContainerSecuritySpec as a late-binding base and overlay.

Me too.

@bgrant0607
Copy link
Member

The problem with late binding / overlay is that anything looking at container-level SecurityContext would be misled. Let's say, for example, someone developed an admission-control plugin to enforce security policy over SecurityContext. An overlay would be a good way to circumvent enforcement.

@thockin
Copy link
Member

thockin commented Aug 19, 2015

Admission control would have to look at the final rendering of the
structure with the overlays. Or maybe I missed the point...

On Wed, Aug 19, 2015 at 3:46 PM, Brian Grant notifications@github.com
wrote:

The problem with late binding / overlay is that anything looking at
container-level SecurityContext would be misled. Let's say, for example,
someone developed an admission-control plugin to enforce security policy
over SecurityContext. An overlay would be a good way to circumvent
enforcement.


Reply to this email directly or view it on GitHub
#12823 (comment)
.

@bgrant0607
Copy link
Member

The point is backwards compatibility with the existing v1 API -- code that's unaware of PodSecurityContext.

@thockin
Copy link
Member

thockin commented Aug 20, 2015

ahh, right. Good catch. yeah, that's harder then

On Wed, Aug 19, 2015 at 4:37 PM, Brian Grant notifications@github.com
wrote:

The point is backwards compatibility with the existing v1 API -- code
that's unaware of PodSecurityContext.


Reply to this email directly or view it on GitHub
#12823 (comment)
.

@pmorie
Copy link
Member Author

pmorie commented Aug 20, 2015

@bgrant0607 @thockin At what point are we planning to introduce v2beta1?

@pmorie
Copy link
Member Author

pmorie commented Aug 20, 2015

@bgrant0607 @thockin @smarterclayton

In the context of a new api version, I feel like this gets easier to support. A v1 podspec would be equivalent to a pod without a pod level security context in the internal API.

I do think we could accomplish our goals (which for me personally are around sharing volumes between multiple containers in a pod which may be running with different uid/gids) by omitting container defaults (maybe) and just introducing the pod-level security context for the pod-level options (host network ns, host ipc ns, pod-wide supplemental group). I think I could live with that for now (assuming it doesn't appear to be a horrible option after more thought) if the backwards compatibility issues within the v1 api are seriously problematic.

Let me think through fully what we would have to do on backwards compat and see what shakes out of that.

@smarterclayton
Copy link
Contributor

I kind of like just saying we have this new thing that overrides the old thing as a start. Tim convinced me that most of the time the new thing is what people want. In the future, the old thing (per container) can mutate to take into account new thing

@pmorie
Copy link
Member Author

pmorie commented Sep 24, 2015

@bgrant0607 Think you will have a chance to review today?

@pmorie
Copy link
Member Author

pmorie commented Sep 24, 2015

@thockin Brian told me you were the one to bug about this :-D

1. Attributes that apply to the pod itself
2. Attributes that apply to the containers of the pod

In the internal API, fields of the `PodSpec` controlling the use of the host PID, IPC, and network
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might emphasize this is internal API only and why (compat)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this below, so nm

@thockin
Copy link
Member

thockin commented Sep 25, 2015

LGTM - most of my comments are for when the proposal turns into code. Just nits.

My only gripe with this doc is that it is very verbose in explaining what amounts to a relatively small change (after all the compat kerfuffle)

@thockin thockin added lgtm "Looks good to me", indicates that a PR is ready to be merged. e2e-not-required labels Sep 25, 2015
@bgrant0607 bgrant0607 removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 25, 2015
@bgrant0607
Copy link
Member

Please remove WIP from commit and PR titles.

@thockin thockin changed the title WIP: pod-level security context proposal Pod-level security context proposal Sep 25, 2015
@thockin thockin added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 25, 2015
@pmorie
Copy link
Member Author

pmorie commented Sep 25, 2015

My only gripe with this doc is that it is very verbose in explaining what amounts to a relatively small change (after all the compat kerfuffle)

+1

@pmorie
Copy link
Member Author

pmorie commented Sep 25, 2015

Commit message changed. Thanks a lot @thockin @bgrant0607 @smarterclayton

@thockin thockin added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 25, 2015
@k8s-bot
Copy link

k8s-bot commented Sep 25, 2015

GCE e2e build/test passed for commit 95cd1ff70f8e5dfac3658b4c435017de7d2e6ce3.

@pmorie
Copy link
Member Author

pmorie commented Sep 25, 2015

@k8s-oncall PTAL

@bgrant0607
Copy link
Member

Please rebase to get shippable.yml changes.

@pmorie
Copy link
Member Author

pmorie commented Sep 25, 2015

@bgrant0607 Rebased

@k8s-bot
Copy link

k8s-bot commented Sep 25, 2015

Unit, integration and GCE e2e test build/test passed for commit 0c70062.

@pmorie
Copy link
Member Author

pmorie commented Sep 25, 2015

Shippable was a flake (#14577)

bgrant0607 added a commit that referenced this pull request Sep 25, 2015
Pod-level security context proposal
@bgrant0607 bgrant0607 merged commit cf75b0d into kubernetes:master Sep 25, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates an issue on api area. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.