
Security contexts and volumes #7925

Closed
pmorie opened this issue May 7, 2015 · 22 comments
Labels
area/security kind/design Categorizes issue or PR as related to design. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@pmorie
Member

pmorie commented May 7, 2015

Now that #7343 is in, we can begin discussing the relationship between SecurityContexts and Volumes. There are a couple different facets:

  1. Volumes should be validated against their assigned security contexts. For example, it should be a validation error if a pod which is not allowed to access the host's filesystem contains a hostDir volume.
  2. Files in volumes should have the SELinux context set based on the security context of the pod. For example, if you create a pod with a secret volume, and the pod's security context contains an SELinux context, then the tmpfs mount for the secret volume's rootcontext should come from the pod's SELinux context.
  3. Files in volumes should belong to the uid in their security context.
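As a sketch of facet 1, a validation pass might look like the following. The pared-down `Pod`/`Volume` types and the `AllowHostDir` policy flag are illustrative assumptions, not real API fields:

```go
package main

import (
	"errors"
	"fmt"
)

// Volume is a pared-down stand-in for the API type; only the field
// needed for this check is modeled.
type Volume struct {
	Name    string
	HostDir string // non-empty means this is a hostDir volume
}

// Pod models just the pieces this validation touches. AllowHostDir is
// a hypothetical policy flag, not a real API field.
type Pod struct {
	Volumes      []Volume
	AllowHostDir bool
}

// validateVolumes rejects hostDir volumes when the pod is not allowed
// to access the host's filesystem (facet 1 above).
func validateVolumes(pod Pod) error {
	for _, v := range pod.Volumes {
		if v.HostDir != "" && !pod.AllowHostDir {
			return errors.New("volume " + v.Name + ": hostDir volumes are not permitted for this pod")
		}
	}
	return nil
}

func main() {
	pod := Pod{Volumes: []Volume{{Name: "logs", HostDir: "/var/log"}}}
	fmt.Println(validateVolumes(pod)) // prints the validation error
}
```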

@thockin @erictune @pweil- @smarterclayton @bgrant0607

@eparis
Contributor

eparis commented May 7, 2015

@eparis cc

@erictune
Member

erictune commented May 7, 2015

Regarding item 1: As I have been thinking about it, SecurityContext is about how the container should run, not how it could run. So, strictly speaking, the Volume is not validated against the SecurityContext. Rather, both the Volumes and the SecurityContext are each validated against something. We need to decide what that thing is. In the earlier SecurityContext proposal (#6287), that thing was a SecurityConstraint. Another alternative, which I'd like to explore, is to express allowed SecurityContext and Volume settings using Policy.

@erictune
Member

erictune commented May 7, 2015

I'm not sure I understand what is being suggested in items 2 and 3.
I had been assuming that kubernetes volumes work like unix fs mounts.

Pre-existing files, by default, have whatever permissions/owner/group/attributes they have on their underlying storage. There may be mechanisms to remap/override, but use of them should be the exception, not the rule.

New files inherit attributes from their creating process (SecurityContext influences this).

@pmorie
Member Author

pmorie commented May 8, 2015

@erictune

Pre-existing files, by default, have whatever permissions/owner/group/attributes they have on their underlying storage. There may be mechanisms to remap/override, but use of them should be the exception, not the rule.

New files inherit attributes from their creating process (SecurityContext influences this).

I think we're on the same page. I had actually typed out 'generated' volumes in a draft of the issue, but that got dropped in an edit. I meant my comments to apply to volumes with generated content, secrets for example. @sdminonne's downward API volume plugin would be another example.

Another alternative, which I'd like to explore, is to express allowed SecurityContext and Volume settings using Policy.

@pweil- and I spent some time talking about exactly this today. The constraint stuff in #7893 relates to this -- for example, one piece of policy is whether the hostDir volume plugin is allowed.

@soltysh
Contributor

soltysh commented May 8, 2015

I'll link #4789 as this is related, per comments from #7908.

@eparis
Contributor

eparis commented May 8, 2015

Random thoughts by eparis....

docker has z/Z options to the -v argument. This means that docker will actually SELinux-relabel all of the files on a volume to make them accessible. Think of this as docker doing the SELinux equivalent of chown -R $USER:$GROUP $MNT.

This is something that we MAY wish to support. But a security policy would need to be enforced somehow. Some sort of check between a label on the pod and the volume?

Long term, if user namespaces are ever usable, I'd think we might want a similar piece of functionality for UIDs. If a pod is being run with a randomly generated UID from the host PoV you may need a way to chown the contents of a mount before it is used...
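The z/Z behavior above surfaces as a suffix on docker's bind-mount argument. A minimal sketch of building such an argument (the `makeBind` helper name is illustrative, not a real docker or Kubernetes API):

```go
package main

import "fmt"

// makeBind builds a docker -v bind string. When relabel is true the
// ":Z" suffix is appended, asking docker to SELinux-relabel the volume
// contents for exclusive use by the container (the SELinux analogue of
// `chown -R $USER:$GROUP $MNT` described above).
func makeBind(hostPath, containerPath string, relabel bool) string {
	bind := hostPath + ":" + containerPath
	if relabel {
		bind += ":Z"
	}
	return bind
}

func main() {
	fmt.Println(makeBind("/data/secret", "/etc/secret", true)) // /data/secret:/etc/secret:Z
}
```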

@eparis
Contributor

eparis commented May 8, 2015

I have to agree with @erictune. We NEED a clean separation of policy and enforcement. I come from SELinux, where we had such a thing and it was still hard enough for people to figure out how to set their security policies correctly. Scattering policy-like statements across many areas is even harder for users.

@smarterclayton
Contributor

(replying to @erictune's "Regarding item 1" comment above)

Whether we call it SecurityConstraint, SecurityContextConstraint, SecurityContextPolicy, or a subset of policy, it should be clearly defined and expressible in terms of maximizing understanding of the admin to what it means. Part of my concern with generic policy is that it does not preserve that guarantee.

I'm fine with SecurityContextPolicy being treated as an opaque interface that is applied, and then one possible implementation is a SecurityContextConstraints object loaded from config or associated with a service account.

Our short-term requirement is that there be some way in the kubelet and API server to validate that a security context + volumes combination is safe. If that's just an interface and we talk more later about what the stored object looks like in the model, that's fine (OpenShift can impose that via admission control in a custom way).

@pweil-
Contributor

pweil- commented May 8, 2015

What I've been batting around is 51f4ea3 which defines a set of constraints (or policy, if you'd like) that are used for the validation/enforcement.

My expectation is, like @erictune says, that a SecurityContext is the end result of the enforcement and becomes what the container runs with. The definition of the constraints is separate and lives at either the cluster level or the service account level as an override.

// SecurityContextConstraintsProvider is responsible for ensuring that every service account has
// security constraints in place and that a pod's context adheres to the active constraints.
type SecurityContextConstraintsProvider interface {
    // CreateContextForContainer creates a security context for the container based on what was
    // requested and what the policy allows
    CreateContextForContainer(pod *api.Pod, container *api.Container) *api.SecurityContext
    // ValidateAgainstConstraints validates the pod against SecurityContextConstraints
    ValidateAgainstConstraints(pod *api.Pod) fielderrors.ValidationErrorList
}

The actual implementation of this provider can be given the cluster defaults; when either of those two methods is executed, it first retrieves the SecurityContextConstraints from the ServiceAccount if they exist (using pod.spec.serviceaccount) or falls back to the cluster defaults.

An admission controller could examine an incoming pod to see if a SecurityContext is defined. If it is, it can use the ValidateAgainstConstraints method to accept or reject the pod. If there is no SecurityContext on the pod, it can use CreateContextForContainer to create a constraint-compliant context (say that five times fast).

Since the types for a SecurityContext are defined I don't think it's a stretch to base a constraints object on that definition, but allowing extension points for things like 'must run as uid 2' or 'must run as a user in range 1000-2000' rather than an explicit RunAsUser=5. The security context is rigid while the constraints are fluid.
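A minimal sketch of such a "fluid" constraint, using a hypothetical UIDRange type for the "must run as a user in range 1000-2000" case (the type and method names are assumptions, not the real constraints object):

```go
package main

import "fmt"

// UIDRange is a hypothetical constraint: the effective RunAsUser must
// fall inside [Min, Max]. A rigid SecurityContext would instead pin an
// explicit uid such as RunAsUser=5.
type UIDRange struct{ Min, Max int64 }

// Allows reports whether the requested uid satisfies the constraint.
func (r UIDRange) Allows(uid int64) bool {
	return uid >= r.Min && uid <= r.Max
}

func main() {
	r := UIDRange{Min: 1000, Max: 2000}
	fmt.Println(r.Allows(1500), r.Allows(5)) // true false
}
```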

Feedback is more than welcome!

@nikhiljindal nikhiljindal added area/security sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. priority/design labels May 11, 2015
@thockin thockin added kind/design Categorizes issue or PR as related to design. and removed kind/design Categorizes issue or PR as related to design. priority/design labels May 19, 2015
@saad-ali saad-ali added the priority/backlog Higher priority than priority/awaiting-more-evidence. label May 25, 2015
@pmorie
Member Author

pmorie commented Jun 3, 2015

Related: #2630

@pmorie
Member Author

pmorie commented Jul 28, 2015

Some thoughts about generalizing SELinux context determination:

  1. Currently there is a VolumeOptions struct with a field, RootContext, which was originally added to pass the SELinux context of the kubelet root dir to the volume plugins so that they could use this information to add simple SELinux support. I propose that this field be renamed to SELinuxContext and that it carry the effective SELinux context that a container will run as down to the volume plugins
  2. Volume plugins will use this field to perform mount operations with the right arguments
  3. If necessary, plugins can also use this field to chcon mountpoints or directories of a pod's volumes (example: emptyDir will need to do this for default storage medium)
  4. We should alter the kubelet to pass the :Z option to docker for the bind-mounts, which will make the docker daemon relabel bind-mounts and volumes with the container's effective SELinux context

So, ultimately, I see:

  1. The kubelet being responsible for determining the effective SELinux context of a container (and therefore its volumes)
  2. The volume plugins being responsible for carrying out the right actions to make their volumes usable with the supplied SELinux contexts
  3. Docker is ultimately responsible for relabeling all volumes into the container's effective SELinux context

One thing we need to figure out about all this is how we should validate that all containers in a pod have interoperable SELinux contexts from the volume-sharing standpoint.
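A rough sketch of steps 1 and 2 above, with a hypothetical mountArgs helper standing in for a volume plugin; the renamed field and the tmpfs `-o context=` option form are assumptions for illustration:

```go
package main

import "fmt"

// VolumeOptions sketches the proposed rename: SELinuxContext carries
// the effective SELinux context a container will run as (step 1).
type VolumeOptions struct {
	SELinuxContext string // e.g. "system_u:object_r:svirt_sandbox_file_t:s0:c1,c2"
}

// mountArgs sketches step 2: a plugin turning the context into mount
// arguments, here for a tmpfs-style mount that accepts -o context=.
func mountArgs(opts VolumeOptions, device, dir string) []string {
	args := []string{"-t", "tmpfs", device, dir}
	if opts.SELinuxContext != "" {
		args = append(args, "-o", "context=\""+opts.SELinuxContext+"\"")
	}
	return args
}

func main() {
	opts := VolumeOptions{SELinuxContext: "system_u:object_r:svirt_sandbox_file_t:s0:c1,c2"}
	fmt.Println(mountArgs(opts, "tmpfs", "/var/lib/kubelet/pods/x/volumes/secret"))
}
```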

@thockin @eparis @timothysc

@smarterclayton
Contributor

SELinux isn't the only story though for volumes - we probably need to make
sure that it's just one possible volume option. Agree with your 1-3 as
actions.

Ultimately, our original goal was to ensure that the process running in the
container can write to the volume that is mounted into it. A use case
where the volume can't be written to is probably not our core objective.

We have two options - force the user to also specify a label for the
volume, or infer it. If we inferred it, we'd have to validate when it was
inconsistent or have a deterministic behavior. If we inferred it, users
would have to do nothing else to leverage selinux labels.

I know Tim was concerned with having multiple security settings - but if it
is valid to set a container to a label, having to ALSO set a volume option
sucks.


Clayton Coleman | Lead Engineer, OpenShift

@timothysc
Member

/cc @rootfs

@pmorie
Member Author

pmorie commented Jul 29, 2015

@smarterclayton Mostly agree with what you said. I think before I code anything I want to lay out how I see SELinux context, uid, gid being handled across all volume types; I mostly see the matrix for that in my head -- will type it out once I'm sure it makes sense.

@thockin thockin removed the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jul 29, 2015
@thockin
Member

thockin commented Jul 29, 2015

I continue to be out of my depth on security issues, but this seems to be pointing in the right direction. What I really can't tolerate is sprinkling a little selinux over here and some contexts over there, with the result that the codebase is littered with bits of detritus that nobody can comprehend in toto.

The biggest issue for me is that we attach SecurityContext to Containers, but Volumes span containers. The traditional UNIX model of group permissions is very easy to comprehend; I'd really hate to see something that simple get complicated. Honestly, I'd probably rather see us make Volumes work properly at the expense of per-container UIDs, if push came to shove. I hope it doesn't, of course.

@thockin
Member

thockin commented Jul 29, 2015

  1. The volume plugins being responsible for carrying out the right actions to make their volumes usable with the supplied SELinux contexts

Make this easy or nobody will get it right.

How does this whole mess degrade when NOT using selinux?

@pmorie
Member Author

pmorie commented Jul 29, 2015

@thockin When not using SELinux, the whole determination of the SELinux context should be skipped and no actions should be taken in the form of:

  1. chcon on the volume (or pass of :Z to docker)
  2. context or rootcontext arguments to mount
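A sketch of how that skip could live in a single place: determine the effective context once, and let an empty result mean every downstream action (chcon, :Z, mount options) is a no-op. The selinuxEnabled stub stands in for a real host check (e.g. reading /sys/fs/selinux/enforce) and is hardcoded so the sketch runs anywhere:

```go
package main

import "fmt"

// selinuxEnabled stands in for a host-level check; hardcoded false
// here so the example is runnable on any machine.
func selinuxEnabled() bool { return false }

// effectiveContext returns "" when SELinux is disabled, so every
// downstream step is skipped via a single check rather than
// per-action conditionals.
func effectiveContext(podContext string) string {
	if !selinuxEnabled() {
		return ""
	}
	return podContext
}

func main() {
	fmt.Println(effectiveContext("system_u:object_r:svirt_sandbox_file_t:s0") == "") // true
}
```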

@thockin
Member

thockin commented Jul 29, 2015

Please don't push the "is selinux enabled" check into Volume plugins - it will never converge :)


@pmorie
Member Author

pmorie commented Jul 29, 2015

@thockin Agree -- there will be some conditionals around 'do i need to add this argument to mount', but I don't think that's avoidable.

@smarterclayton
Contributor

If we can get groups to work, I think a pod should have a group, done, end of story. Then every volume would have a group. I feel a little weird about saying that all containers in a pod have the same security context, but I understand why you have the desire to simplify to that. I can think of lots of scenarios with user namespaces where I'd want container A to have uid 1 and container B to have uid 2 - but they are mostly "adaptation" use cases (I want to take something that already works and make it work on Kube).

Hypothetically:

Pod
  Spec
    SecurityContext
      Everything including UID
    Containers
      1: SecurityContext
        UID

Everything except UID goes to pod level security context. One user
namespace, one SELinux label, one group, per pod. All volumes share group
and labels and the uid of the pod.
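A sketch of that hypothetical split as Go types (names are illustrative, not the real API): everything but the UID sits at the pod level, and a container carries only a UID override.

```go
package main

import "fmt"

// PodSecurityContext holds the settings shared by the whole pod:
// one user namespace, one SELinux label, one group, per pod.
type PodSecurityContext struct {
	SELinuxLabel string
	Group        int64
	UserNS       string
}

// ContainerSecurityContext keeps only the per-container UID override.
type ContainerSecurityContext struct {
	UID int64
}

// PodSpec wires the two levels together.
type PodSpec struct {
	SecurityContext PodSecurityContext
	Containers      []ContainerSecurityContext
}

func main() {
	spec := PodSpec{
		SecurityContext: PodSecurityContext{SELinuxLabel: "s0:c1,c2", Group: 1001},
		Containers:      []ContainerSecurityContext{{UID: 1}, {UID: 2}},
	}
	// Volumes would share the pod's group and label; UIDs differ per container.
	fmt.Println(spec.SecurityContext.Group, spec.Containers[0].UID, spec.Containers[1].UID)
}
```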


@pmorie
Member Author

pmorie commented Sep 28, 2015

Related proposals:

#12823
#12944

@pmorie
Member Author

pmorie commented Apr 25, 2016

This work is completed; see the proposals above.

@pmorie pmorie closed this as completed Apr 25, 2016