
Security contexts and volumes #7925

Closed
pmorie opened this issue May 7, 2015 · 22 comments
Labels
area/security kind/design Categorizes issue or PR as related to design. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@pmorie
Member

pmorie commented May 7, 2015

Now that #7343 is in, we can begin discussing the relationship between SecurityContexts and Volumes. There are a couple different facets:

  1. Volumes should be validated against their assigned security contexts. For example, it should be a validation error if a pod which is not allowed to access the host's filesystem contains a hostDir volume.
  2. Files in volumes should have the SELinux context set based on the security context of the pod. For example, if you create a pod with a secret volume, and the pod's security context contains an SELinux context, then the tmpfs mount for the secret volume's rootcontext should come from the pod's SELinux context.
  3. Files in volumes should belong to the uid in their security context.
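As a sketch of facet 1, a validation pass might look like the following. The pared-down `Pod`/`Volume` types and the `AllowHostDir` policy flag are illustrative assumptions, not real API fields:

```go
package main

import (
	"errors"
	"fmt"
)

// Volume is a pared-down stand-in for the API type; only the field
// needed for this check is modeled.
type Volume struct {
	Name    string
	HostDir string // non-empty means this is a hostDir volume
}

// Pod models just the pieces this validation touches. AllowHostDir is
// a hypothetical policy flag, not a real API field.
type Pod struct {
	Volumes      []Volume
	AllowHostDir bool
}

// validateVolumes rejects hostDir volumes when the pod is not allowed
// to access the host's filesystem (facet 1 above).
func validateVolumes(pod Pod) error {
	for _, v := range pod.Volumes {
		if v.HostDir != "" && !pod.AllowHostDir {
			return errors.New("volume " + v.Name + ": hostDir volumes are not permitted for this pod")
		}
	}
	return nil
}

func main() {
	pod := Pod{Volumes: []Volume{{Name: "logs", HostDir: "/var/log"}}}
	fmt.Println(validateVolumes(pod)) // prints the validation error
}
```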

@thockin @erictune @pweil- @smarterclayton @bgrant0607

@eparis
Contributor

eparis commented May 7, 2015

@eparis cc

@erictune
Member

erictune commented May 7, 2015

Regarding item 1: As I have been thinking about it, SecurityContext is about how the container should run, not how it could run. So, strictly speaking, the Volume is not validated against the SecurityContext. Rather, both the Volumes and the SecurityContext are each validated against something. We need to decide what that thing is. In the earlier SecurityContext proposal (#6287), that thing was a SecurityConstraint. Another alternative, which I'd like to explore, is to express allowed SecurityContext and Volume settings using Policy.

@erictune
Member

erictune commented May 7, 2015

I'm not sure I understand what is being suggested in items 2 and 3.
I had been assuming that kubernetes volumes work like unix fs mounts.

Pre-existing files, by default, have whatever permissions/owner/group/attributes they have on their underlying storage. There may be mechanisms to remap/override, but use of them should be the exception, not the rule.

New files inherit attributes from their creating process (SecurityContext influences this).

@pmorie
Member Author

pmorie commented May 8, 2015

@erictune

Pre-existing files, by default, have whatever permissions/owner/group/attributes they have on their underlying storage. There may be mechanisms to remap/override, but use of them should be the exception, not the rule.

New files inherit attributes from their creating process (SecurityContext influences this).

I think we're on the same page. I had actually typed out 'generated' volumes in a draft of the issue, but that got dropped in an edit. I meant my comments to apply to volumes with generated content, secrets for example. @sdminonne's downward API volume plugin would be another example.

Another alternative, which I'd like to explore, is to express allowed SecurityContext and Volume settings using Policy.

@pweil- and I spent some time talking about exactly this today. The constraint stuff in #7893 relates to this -- for example, one piece of policy is whether the hostDir volume plugin is allowed.

@soltysh
Contributor

soltysh commented May 8, 2015

I'll link #4789 as this is related, per comments from #7908.

@eparis
Contributor

eparis commented May 8, 2015

Random thoughts by eparis....

docker has z/Z options to the -v argument. This means that docker will actually SELinux-relabel all of the files on a volume to make them accessible. Think of this as docker doing the SELinux equivalent of chown -R $USER:$GROUP $MNT.

This is something that we MAY wish to support. But a security policy would need to be enforced somehow. Some sort of check between a label on the pod and the volume?

Long term, if user namespaces are ever usable, I'd think we might want a similar piece of functionality for UIDs. If a pod is being run with a randomly generated UID from the host PoV you may need a way to chown the contents of a mount before it is used...
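The z/Z behavior above surfaces as a suffix on docker's bind-mount argument. A minimal sketch of building such an argument (the `makeBind` helper name is illustrative, not a real docker or Kubernetes API):

```go
package main

import "fmt"

// makeBind builds a docker -v bind string. When relabel is true the
// ":Z" suffix is appended, asking docker to SELinux-relabel the volume
// contents for exclusive use by the container (the SELinux analogue of
// `chown -R $USER:$GROUP $MNT` described above).
func makeBind(hostPath, containerPath string, relabel bool) string {
	bind := hostPath + ":" + containerPath
	if relabel {
		bind += ":Z"
	}
	return bind
}

func main() {
	fmt.Println(makeBind("/data/secret", "/etc/secret", true)) // /data/secret:/etc/secret:Z
}
```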

@eparis
Contributor

eparis commented May 8, 2015

I have to agree with @erictune. We NEED a clean separation of policy and enforcement. I come from SELinux, where we had such a thing and it was still hard enough for people to figure out how to set their security policies correctly. Scattering policy-like statements across many areas is even harder for users.

@smarterclayton
Contributor

(replying to @erictune's "Regarding item 1" comment above)

Whether we call it SecurityConstraint, SecurityContextConstraint, SecurityContextPolicy, or a subset of policy, it should be clearly defined and expressible in terms of maximizing understanding of the admin to what it means. Part of my concern with generic policy is that it does not preserve that guarantee.

I'm fine with SecurityContextPolicy being treated as an opaque interface that is applied, and then one possible implementation is a SecurityContextConstraints object loaded from config or associated with a service account.

Our short-term requirement is that there be some way in the kubelet and API server to validate that a security context + volumes combination is safe. If that's just an interface and we talk more later about what the stored object looks like in the model, that's fine (OpenShift can impose that via admission control in a custom way).

@pweil-
Contributor

pweil- commented May 8, 2015

What I've been batting around is 51f4ea3 which defines a set of constraints (or policy, if you'd like) that are used for the validation/enforcement.

My expectation is, like @erictune says, that a SecurityContext is the end result of the enforcement and becomes what the container runs with. The definition of the constraints is separate and lives at either the cluster level or the service account level as an override.

// SecurityContextConstraintsProvider is responsible for ensuring that every service account has
// security constraints in place and that a pod's context adheres to the active constraints.
type SecurityContextConstraintsProvider interface {
    // CreateContextForContainer creates a security context for the container based on what was
    // requested and what the policy allows
    CreateContextForContainer(pod *api.Pod, container *api.Container) *api.SecurityContext
    // ValidateAgainstConstraints validates the pod against SecurityContextConstraints
    ValidateAgainstConstraints(pod *api.Pod) fielderrors.ValidationErrorList
}

The actual implementation of this provider can be given the cluster defaults; when either of those two methods is executed, it first retrieves the SecurityContextConstraints from the ServiceAccount if they exist (using pod.spec.serviceaccount) or falls back to the cluster defaults.

An admission controller could examine an incoming pod to see if a SecurityContext is defined. If it is, it can use the ValidateAgainstConstraints method to accept or reject the pod. If there is no SecurityContext on the pod, it can use CreateContextForContainer to create a constraint-compliant context (say that five times fast).

Since the types for a SecurityContext are defined I don't think it's a stretch to base a constraints object on that definition, but allowing extension points for things like 'must run as uid 2' or 'must run as a user in range 1000-2000' rather than an explicit RunAsUser=5. The security context is rigid while the constraints are fluid.
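A minimal sketch of such a "fluid" constraint, using a hypothetical UIDRange type for the "must run as a user in range 1000-2000" case (the type and method names are assumptions, not the real constraints object):

```go
package main

import "fmt"

// UIDRange is a hypothetical constraint: the effective RunAsUser must
// fall inside [Min, Max]. A rigid SecurityContext would instead pin an
// explicit uid such as RunAsUser=5.
type UIDRange struct{ Min, Max int64 }

// Allows reports whether the requested uid satisfies the constraint.
func (r UIDRange) Allows(uid int64) bool {
	return uid >= r.Min && uid <= r.Max
}

func main() {
	r := UIDRange{Min: 1000, Max: 2000}
	fmt.Println(r.Allows(1500), r.Allows(5)) // true false
}
```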

Feedback is more than welcome!

@nikhiljindal nikhiljindal added area/security sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. priority/design labels May 11, 2015
@thockin thockin added kind/design Categorizes issue or PR as related to design. and removed kind/design Categorizes issue or PR as related to design. priority/design labels May 19, 2015
@saad-ali saad-ali added the priority/backlog Higher priority than priority/awaiting-more-evidence. label May 25, 2015
@pmorie
Member Author

pmorie commented Jun 3, 2015

Related: #2630

@pmorie
Member Author

pmorie commented Jul 28, 2015

Some thoughts about generalizing SELinux context determination:

  1. Currently there is a VolumeOptions struct with a field, RootContext, which was originally added to pass the SELinux context of the kubelet root dir to the volume plugins so that they could use this information to add simple SELinux support. I propose that this field be renamed to SELinuxContext and that it carry the effective SELinux context that a container will run as down to the volume plugins
  2. Volume plugins will use this field to perform mount operations with the right arguments
  3. If necessary, plugins can also use this field to chcon mountpoints or directories of a pod's volumes (example: emptyDir will need to do this for default storage medium)
  4. We should alter the kubelet to pass the :Z option to docker for the bind-mounts, which will make the docker daemon relabel bind-mounts and volumes with the container's effective SELinux context

So, ultimately, I see:

  1. The kubelet being responsible for determining the effective SELinux context of a container (and therefore its volumes)
  2. The volume plugins being responsible for carrying out the right actions to make their volumes usable with the supplied SELinux contexts
  3. Docker is ultimately responsible for relabeling all volumes into the container's effective SELinux context

One thing we need to figure out about all this is how we should validate that all containers in a pod have interoperable SELinux contexts from the volume-sharing standpoint.
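A rough sketch of steps 1 and 2 above, with a hypothetical mountArgs helper standing in for a volume plugin; the renamed field and the tmpfs `-o context=` option form are assumptions for illustration:

```go
package main

import "fmt"

// VolumeOptions sketches the proposed rename: SELinuxContext carries
// the effective SELinux context a container will run as (step 1).
type VolumeOptions struct {
	SELinuxContext string // e.g. "system_u:object_r:svirt_sandbox_file_t:s0:c1,c2"
}

// mountArgs sketches step 2: a plugin turning the context into mount
// arguments, here for a tmpfs-style mount that accepts -o context=.
func mountArgs(opts VolumeOptions, device, dir string) []string {
	args := []string{"-t", "tmpfs", device, dir}
	if opts.SELinuxContext != "" {
		args = append(args, "-o", "context=\""+opts.SELinuxContext+"\"")
	}
	return args
}

func main() {
	opts := VolumeOptions{SELinuxContext: "system_u:object_r:svirt_sandbox_file_t:s0:c1,c2"}
	fmt.Println(mountArgs(opts, "tmpfs", "/var/lib/kubelet/pods/x/volumes/secret"))
}
```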

@thockin @eparis @timothysc

@smarterclayton
Contributor

SELinux isn't the only story though for volumes - we probably need to make
sure that it's just one possible volume option. Agree with your 1-3 as
actions.

Ultimately, our original goal was to ensure that the process running in the
container can write to the volume that is mounted into it. A use case
where the volume can't be written to is probably not our core objective.

We have two options - force the user to also specify a label for the
volume, or infer it. If we inferred it, we'd have to validate when it was
inconsistent or have a deterministic behavior. If we inferred it, users
would have to do nothing else to leverage selinux labels.

I know Tim was concerned with having multiple security settings - but if it
is valid to set a container to a label, having to ALSO set a volume option
sucks.


Clayton Coleman | Lead Engineer, OpenShift

@timothysc
Member

/cc @rootfs

@pmorie
Member Author

pmorie commented Jul 29, 2015

@smarterclayton Mostly agree with what you said. I think before I code anything I want to lay out how I see SELinux context, uid, gid being handled across all volume types; I mostly see the matrix for that in my head -- will type it out once I'm sure it makes sense.

@thockin thockin removed the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jul 29, 2015
@thockin
Member

thockin commented Jul 29, 2015

I continue to be out of my depth on security issues, but this seems to be pointing in the right direction. What I really can't tolerate is sprinkling a little selinux over here and some contexts over there, with the result that the codebase is littered with bits of detritus that nobody can comprehend in toto.

The biggest issue for me is that we attach SecurityContext to Containers, but Volumes span containers. The traditional UNIX model of group permissions is very easy to comprehend; I'd really hate to see something that simple get complicated. Honestly, I'd probably rather see us make Volumes work properly at the expense of per-container UIDs, if push came to shove. I hope it doesn't, of course.

@thockin
Member

thockin commented Jul 29, 2015

  1. The volume plugins being responsible for carrying out the right actions to make their volumes usable with the supplied SELinux contexts

Make this easy or nobody will get it right.

How does this whole mess degrade when NOT using selinux?

@pmorie
Member Author

pmorie commented Jul 29, 2015

@thockin When not using SELinux, the whole determination of the SELinux context should be skipped and no actions should be taken in the form of:

  1. chcon on the volume (or pass of :Z to docker)
  2. context or rootcontext arguments to mount
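A sketch of how that skip could live in a single place: determine the effective context once, and let an empty result mean every downstream action (chcon, :Z, mount options) is a no-op. The selinuxEnabled stub stands in for a real host check (e.g. reading /sys/fs/selinux/enforce) and is hardcoded so the sketch runs anywhere:

```go
package main

import "fmt"

// selinuxEnabled stands in for a host-level check; hardcoded false
// here so the example is runnable on any machine.
func selinuxEnabled() bool { return false }

// effectiveContext returns "" when SELinux is disabled, so every
// downstream step is skipped via a single check rather than
// per-action conditionals.
func effectiveContext(podContext string) string {
	if !selinuxEnabled() {
		return ""
	}
	return podContext
}

func main() {
	fmt.Println(effectiveContext("system_u:object_r:svirt_sandbox_file_t:s0") == "") // true
}
```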

@thockin
Member

thockin commented Jul 29, 2015

Please don't push the "is selinux enabled" check into Volume plugins - it will never converge :)


@pmorie
Member Author

pmorie commented Jul 29, 2015

@thockin Agree -- there will be some conditionals around 'do i need to add this argument to mount', but I don't think that's avoidable.

@smarterclayton
Contributor

If we can get groups to work, I think a pod should have a group, done, end of story. Then every volume would have a group. I feel a little weird about saying that all containers in a pod have the same security context, but I understand why you have the desire to simplify to that. I can think of lots of scenarios with user namespaces where I'd want container A to have uid 1 and container B to have uid 2 - but they are mostly "adaptation" use cases (I want to take something that already works and make it work on Kube).

Hypothetically:

Pod
  Spec
    SecurityContext
      Everything including UID
    Containers
      1: SecurityContext
        UID

Everything except UID goes to pod level security context. One user
namespace, one SELinux label, one group, per pod. All volumes share group
and labels and the uid of the pod.
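A sketch of that hypothetical split as Go types (names are illustrative, not the real API): everything but the UID sits at the pod level, and a container carries only a UID override.

```go
package main

import "fmt"

// PodSecurityContext holds the settings shared by the whole pod:
// one user namespace, one SELinux label, one group, per pod.
type PodSecurityContext struct {
	SELinuxLabel string
	Group        int64
	UserNS       string
}

// ContainerSecurityContext keeps only the per-container UID override.
type ContainerSecurityContext struct {
	UID int64
}

// PodSpec wires the two levels together.
type PodSpec struct {
	SecurityContext PodSecurityContext
	Containers      []ContainerSecurityContext
}

func main() {
	spec := PodSpec{
		SecurityContext: PodSecurityContext{SELinuxLabel: "s0:c1,c2", Group: 1001},
		Containers:      []ContainerSecurityContext{{UID: 1}, {UID: 2}},
	}
	// Volumes would share the pod's group and label; UIDs differ per container.
	fmt.Println(spec.SecurityContext.Group, spec.Containers[0].UID, spec.Containers[1].UID)
}
```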


@pmorie
Member Author

pmorie commented Sep 28, 2015

Related proposals:

#12823
#12944

@pmorie
Member Author

pmorie commented Apr 25, 2016

This work is completed; see the proposals above.

@pmorie pmorie closed this as completed Apr 25, 2016