Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Policy unions (vault#3892) #514

Open
achetronic opened this issue Sep 7, 2024 · 26 comments
Open

Policy unions (vault#3892) #514

achetronic opened this issue Sep 7, 2024 · 26 comments
Labels
core/auth Related to core improvements in authentication feature New feature or request help wanted Extra attention is needed needs rfc Needs an RFC/design document roadmap:safer Roadmap item; safer category

Comments

@achetronic
Copy link

Describe the bug
This is a descendant bug happening on H.Vault. When using OIDC and a user belongs to several groups, one with root/admin grants everywhere and another with less privileges, it ends up with read-only permissions in some places.

This is really well described by another person in this issue

Fixing this has been claimed by the community and ignored by Hashicorp and we are considering moving to Openbao when it's ready

What's your opinion on this regard?

To Reproduce

  1. Configure OIDC
  2. Configure a high privileges policy and link it to a group
  3. Idem previous but with less privileges
  4. Link a user to both groups

Expected behavior
Keep the highest privileges on policy mixing for a path

@achetronic achetronic added the bug Something isn't working label Sep 7, 2024
@cipherboy
Copy link
Member

I think it is a very useful change to have, but I think we'll need some careful RFC design work to ensure we're all happy with the results. Happy to work with you or anyone else to try and get this added though! :-)


At this point it would be a breaking change if we were to enforce it globally.... I'm kinda curious, though, how we'd end up flagging that? We do have a few sys/config/:subsystem endpoints, so we could perhaps add a sys/config/policies/acl endpoint to configure these... But I'm not sure if putting this in the config file would be more ideal as it seems like a rather privileged, static feature?

Alternatively, and this would be an absolute mess, we could have the two coexist, and mark policies with a join_rule or similar stanza that could default to priority and optionally take the string union to opt-in to the new behavior. But I don't really know how you'd handle cross-policy joining (priority-based to union-based &c)... I'd prefer to err in this case (special casing the default policy), but I'm not sure how we'd do that... This would be more ideal though, in that a deployment could gradually migrate from one policy enforcement system to the new one.

Some thoughts! Let me know if you're interested in tackling the RFC. I'm not sure if you've seen our roadmap proposal, but while not explicitly listed, I think this does fit in the spirit of some of the ACL related enhancements!

@cipherboy cipherboy added enhancement help wanted Extra attention is needed core/auth Related to core improvements in authentication labels Sep 7, 2024
@JanMa
Copy link
Member

JanMa commented Sep 7, 2024

Hey @achetronic thanks for opening this issue. I've faced this issue as well when using Vault/OpenBao but it usually only affected a small number of users which were given broad access to the cluster. As a workaround we told them to create a sub-token with just the relevant admin policy attached to it in case they needed to change/access something which they couldn't because a more specific policy denied it.

You can see an example on how to do this in our CLI docs: https://openbao.org/docs/commands/token/create/

Is this something that would work for you use-case as well?

@cipherboy
Copy link
Member

cipherboy commented Sep 7, 2024

@JanMa I think a subtoken would be possible, but a lot of existing integrations don't really support that flow, if they support authing directly. If the groups don't do what you need them to, you might end up with weird limitations and needing multiple subtokens (if there's no single broad "admin" groups but multiple smaller ones, perhaps with limited permissions on other paths). With LDAP auth (as mentioned in the original issue) or maybe also Kubernetes(?), groups/policies are externally indicated and might overlap as a result.

You'd have to specially modify the application to know exactly which polic(y/ies) that need to be accessed from the subtoken, or modify it to retry failed operations with a subtoken for each assigned policy. Seems a bit complex, if OpenBao can be modified to make union behavior the default. It also makes it substantially easier to reasons about than the current behavior.

My 2c.


Concretely, for a stupid example, if you had a policy for role administrators and issuer administrators:

Role:

path "pki/roles/+" {
    capabilities = ["create", "read", "update", "delete"]
}

path "pki/issuers/+" {
    capabilities = ["read"]
}

Issuer admin:

path "pki/roles/+" {
    capabilities = ["read"]
}

path "pki/issuers/*" {
    capabilities = ["create", "read", "update", "delete"]
}

I think you'd get conflicts and need specific operations to use specific tokens (you could only update roles under the first policy and perform certain operations on issuers under the second).

@achetronic
Copy link
Author

Hello again there @cipherboy @JanMa :)

IMHO this depends mostly on company policies, as depending on the organization, teams are totally separated or there are people owned by several teams

Due to this, I would say this is a feature that should be defined globally

Moreover, defining this globally affects only to the moment the policies are mixed to get a result, so this is a much more predictable behavior.

Something like defining the mixing_policy globally to affect the whole instance.

I'm thinking that having this defined globally can cause less breaking changes against the upstream, avoiding bad effects to the Vault ecosystem, right? affecting only the moment when the policies are calculated for a token, it's really hard to break things

What do you think about defining a small RFC? I mean, I have not done any before, but I can put my efforts on trying it

@cipherboy
Copy link
Member

Hmm, I don't like either mixing_policy nor join_rule as a name for the config option, but let's keep thinking about it.

Yes, a RFC would be the next step. Do you mind following our template? You can attach the format here I think, no need to open a new issue.

Thanks!

@achetronic
Copy link
Author

achetronic commented Sep 10, 2024

Hmm, I don't like either mixing_policy nor join_rule as a name for the config option, but let's keep thinking about it.

Yes, a RFC would be the next step. Do you mind following our template? You can attach the format here I think, no need to open a new issue.

Thanks!

Oh no, the horrible param name was not a proposal, but a way to focus on the idea for explanation. Sorry for the miss understood :)

I will read the template tomorrow and will try to fill it ASAP

CC: @sebastocorp

@cipherboy
Copy link
Member

@achetronic one thought: how will explicit DENY be handled? I think this should still be applied to the union and even if another policy granted it, we should still have DENY take precedence? Thoughts?

@JanMa
Copy link
Member

JanMa commented Sep 20, 2024

Regarding the design, I think adding priorities to policies might be a valid way to approach this too.
So If I have two conflicting policies, the one with the higher priority "wins". I imagine that implementation wise this is also much more backwards compatible, since we can just assign a default priority to policies and keep the old behavior if two policies have the same pritority 🤔

@achetronic
Copy link
Author

@cipherboy @JanMa Well, I have been thinking about this issue

Approach 1:
IMO, the cleanest approach is having priorities in the policies. That way it's delegated to the admins how to mix the policies. This is the approach I like the most

Approach 2:
Being additive is not against being least privileged.

As an example, if you have:

  • A super admin policy with privileges everywhere
  • Another smaller one, adding you some privileges somewhere
  • Some of the previous policies denying things

You can:

  1. Mix adding the sum of privileges for the 1st and 2nd in a first mixing stage
  2. Then, apply denials in the end as a last step.

So you are always additive unless some other policy deny some path


I prefer the first over the second approach as this seems to cover more use-cases in the future, as you delegate this behavior to the final user

WDYT about this?

@cipherboy
Copy link
Member

cipherboy commented Sep 23, 2024

@achetronic Approach 1 still requires strict superset permissioning for policies, right? In my example above, my policies did not have a superset relation, and some common paths had completely different/non-overlapping permissions.

This meant just choosing one to "win" (here, defining that to mean, get the capabilities of the highest priority policy), doesn't give the user their full potential of possible permissions. I think only approach 2 / unions would do so.

@achetronic
Copy link
Author

@achetronic Approach 1 still requires strict superset permissioning for policies, right? In my example above, my policies did not have a superset relation, and some common paths had completely different/non-overlapping permissions.

This meant just choosing one to "win" (here, defining that to mean, get the capabilities of the highest priority policy), doesn't give the user their full potential of possible permissions. I think only approach 2 / unions would do so.

Based on the following sample covering both :

# Highest priority wins

path "pki/roles/+" {
    capabilities = ["deny"]
    priority = 100
}

path "pki/roles/+" {
    capabilities = ["read"]
    priority = 200
}


path "pki/roles/+" {
    capabilities = ["create", "update", "delete"]
    priority = 300
}

-----------------------------------------------------

# Approach 1
union_mode: priority
Result: pki/roles/{something}: ["create", "update", "delete"]

# Approach 2
union_mode: additive_strict_deny
Result: pki/roles/{something}: ["create", "read", "update", "delete", "deny"] --------> ["deny"]

-----------------------------------------------------

Approach 1
Based on previous example, IMHO it can work too.
Even when there are some conflicting policies, it's on the admins' hands to decide what to apply to each group, right?
so if they ends up in a situation where less permissions are surviving, it's fine as it's an aware decision.

It's a matter of being well-organized when defining the priorities and the policies to know the behavior under conflicts

Approach 2
It works based on the being-additive concept, so on conflicting paths they are always adding all the present permissions, unless some path is explicitly denied. In that case it follows the least-permissions principle.
In case there is no deny, it's always additive, right?


IMO is possible that it's not a matter of deciding the approach, but of implementing both them to cover all use-cases? just thinking loud about your idea of having a parameter to configure the mixing_mode with potential modes like: legacy, priority, additive_strict_deny, etc

CC: @sebastocorp

@cipherboy
Copy link
Member

\o hey @achetronic

That approach works if there is such a role with all the relevant capabilities.

But supposed you had these policies:

# Policy A -- reader
path "pki/roles/+" {
    capabilities = ["read"]
}
# Policy B -- modifier
path "pki/roles/+" {
    capabilities = ["update", "delete"]
}

Then no assignment of priorities (per path or per policy) that would let you have all three capabilities (read, update, and delete). You'd have to have a third policy, which would often mean assigning a third group. That is my point: union policy is still fundamentally different than priorities, for some design of policies.

Do these ACL policies make sense? I don't know, maybe they don't! But they are a scenario where any non-union combining policy isn't sufficient, IMO.

@achetronic
Copy link
Author

mmmm if i'm not wrong, not having a priority defined or having the same priority, should behave the same. I mean, not having a priority could mean having the default priority, and this is the same as several conflicting policies having the same priority.

if several policies have the same priority, they could be mixed together, with all the capabilities, as they are important in the same way?

I mean, maybe it's possible to apply Approach2 on Approach1 priority-conflicts? what is your suggestion on this? do you have any idea? 🙂

@JanMa
Copy link
Member

JanMa commented Sep 23, 2024

@cipherboy the example you give aleady works with the current behavior? According to our docs:

If the same pattern appears in multiple policies, we take the union of the capabilities.

Look at the following example:

# reader policy
➜ cat reader.hcl
path "secret/data/foo/+" {
  capabilities = ["read"]
}

# writer policy
➜ cat writer.hcl 
path "secret/data/foo/+" {
  capabilities = ["update", "delete"]
}

# create policies
➜ bao policy write reader ./reader.hcl
Success! Uploaded policy: reader
➜ bao policy write writer ./writer.hcl
Success! Uploaded policy: writer

# create tokens with attached policiesexport READER_TOKEN=$(bao token create -policy=reader -format json | jq -r '.auth.client_token')export WRITER_TOKEN=$(bao token create -policy=writer -format json | jq -r '.auth.client_token')export RW_TOKEN=$(bao token create -policy=writer -policy=reader -format json | jq -r '.auth.client_token')                             

# read secret with reader token
➜ BAO_TOKEN=${READER_TOKEN} bao kv get secret/foo/bar      
=== Secret Path ===                                                            
secret/data/foo/bar                                                            
                                                                               
======= Metadata =======                                                       
Key                Value                                                       
---                -----                                                       
created_time       2024-09-23T19:48:43.716141611Z                              
custom_metadata    <nil>                                                       
deletion_time      n/a                                                         
destroyed          false                                                       
version            2                                                           
                                                                               
=== Data ===                                                                   
Key    Value                                                                   
---    -----                                                                   
foo    bar

# try to read secret with writer token
➜ BAO_TOKEN=${WRITER_TOKEN} bao kv get secret/foo/bar      
Error reading secret/data/foo/bar: Error making API request.                   
                                                                                                                                                              
URL: GET http://127.0.0.1:8200/v1/secret/data/foo/bar                          
Code: 403. Errors:                                                             

* 1 error occurred:                                                            
        * permission denied                                                    


# update secret with writer token
➜ BAO_TOKEN=${WRITER_TOKEN} bao kv put secret/foo/bar bar=baz                                                                             
=== Secret Path ===                                                            
secret/data/foo/bar                                                            

======= Metadata =======                                                       
Key                Value                                                       
---                -----                                                       
created_time       2024-09-23T19:54:56.260314701Z                              
custom_metadata    <nil>                                                       
deletion_time      n/a                                                         
destroyed          false                                                       
version            3

# read secret with read/write token
➜ BAO_TOKEN=${RW_TOKEN} bao kv get secret/foo/bar                                                                                         
=== Secret Path ===                                                            
secret/data/foo/bar                                                            

======= Metadata =======                                                       
Key                Value                                                       
---                -----                                                       
created_time       2024-09-23T19:54:56.260314701Z                              
custom_metadata    <nil>                                                       
deletion_time      n/a                                                         
destroyed          false                                                       
version            3                                                           

=== Data ===                                                                   
Key    Value                                                                   
---    -----                                                                   
bar    baz                                                                

# update secret with read/write token
➜ BAO_TOKEN=${RW_TOKEN} bao kv put secret/foo/bar foo=bar                                                                                 
=== Secret Path ===                                                            
secret/data/foo/bar                                                            

======= Metadata =======                                                       
Key                Value                                                       
---                -----                                                       
created_time       2024-09-23T19:55:28.169272755Z                              
custom_metadata    <nil>                                                       
deletion_time      n/a                                                         
destroyed          false                                                       
version            4                  

And if I'd want to replicate that with policy priorities, I can give both policies the same priority.

@cipherboy
Copy link
Member

cipherboy commented Sep 23, 2024

@JanMa said:

And if I'd want to replicate that with policy priorities, I can give both policies the same priority.

Ah, you're exactly right. We'd need a more complex path example like:

# Policy A -- reader
path "pki/roles/*" {
    capabilities = ["read"]
}
# Policy B -- modifier
path "pki/roles/+" {
    capabilities = ["update", "delete"]
}

Here, they're functionally equivalent (but contrived -- roles has no additional-depth elements), but the + takes priority as * loses:

  • If P1 ends in * and P2 doesn't, P1 is lower priority

More concretely, I think a less contrived example would be:

# Policy A -- reader
path "pki/roles/+" {
    capabilities = ["read"]
}
# Policy B -- admin on k8s roles
path "pki/roles/k8s-+" {
    capabilities = ["update", "delete"]
}

Here, the paths would be different, the + appears earlier in Policy A, so Policy A's paths wouldn't apply and we won't get read access to pki/roles/k8s-+ despite being granted it by having the reader policy.

Having no priority on these distinct-path policies would mean you could never get all three capabilities... And changing priority would only decide whether you get 1 or 3 paths.

(But really, this is a design flaw of the k8s admin policy and it should just overwrite and additionally grant the read capability. Perhaps there is a less contrived example in the K/V store, where some users are granted write access to secrets, others are given write access on metadata, and all have read access to both?)

Thoughts @JanMa, @achetronic?


Edit:

I think a less contrived example is the inverse:

# Policy A -- Admin on PKI
path "pki/*" {
    capabilities = ["read", "update", "delete"]
}
# Policy B -- user
path "pki/roles/+" {
    capabilities = ["read"]
}

path "pki/issue/+" {
    capabilities = ["update"]
}

Here, policy B would take precedence as the wildcard occurs later, so admin would get reduced permissions on pki/roles.

@JanMa
Copy link
Member

JanMa commented Sep 24, 2024

Hey @cipherboy,

so far you haven't managed to convince me yet 😄 Let's go through your recent examples:

Functionally equivalent

# Policy A -- reader
path "pki/roles/*" {
    capabilities = ["read"]
}

# Policy B -- modifier
path "pki/roles/+" {
    capabilities = ["update", "delete"]
}

I think this is very contrived and just plain bad policy design. Of course the intended behavior could be achieved with policy unions, but I think it's reasonable to ask someone to use either a wildcard (+) or a glob (*). It would be even better if the modifier role would be self-contained and bring all the required capabilities, including read.

K8s admin roles

# Policy A -- reader
path "pki/roles/+" {
    capabilities = ["read"]
}
# Policy B -- admin on k8s roles
path "pki/roles/k8s-+" {
    capabilities = ["update", "delete"]
}

Even though this is a more realistic example, I think this is also bad policy design. The k8s admin role should include all the necessary capabilities, including read.

Inverse

# Policy A -- Admin on PKI
path "pki/*" {
    capabilities = ["read", "update", "delete"]
}

# Policy B -- user
path "pki/roles/+" {
    capabilities = ["read"]
}

path "pki/issue/+" {
    capabilities = ["update"]
}

This is a good example which would work with both proposed solutions. When using policy priorities you can give the PKI Admin role the highest priority, so it would always take precedence over any other PKI policies.

Using policy unions would work as well because the capabilities are not interfering with each other. But I think what would also cause issues when using policy unions would be the following:

# Policy B -- user
path "pki/roles/+" {
    capabilities = ["read"]
}

path "pki/issue/+" {
    capabilities = ["update"]
}

path "pki/acme/+" {
    capabilities = ["deny"]
}

Assuming a deny will always take precedence in policy unions, then the admin role doesn't have any access to the pki/acme path, even though the role is intended to grant full access. One way to work around this would be to check if the deny comes from a more specific path than the allow, and if that's the case have the allow take precedent. But that's much more difficult to reason about than just assigning priorities to policies IMO.

@cipherboy
Copy link
Member

Hey @cipherboy,

so far you haven't managed to convince me yet 😄 Let's go through your recent examples:

\o No worries, thanks @JanMa for keeping me honest :-D Definitely appreciate the thorough feedback.

I think this is very contrived and just plain bad policy design. Of course the intended behavior could be achieved with policy unions, but I think it's reasonable to ask someone to use either a wildcard (+) or a glob (*). It would be even better if the modifier role would be self-contained and bring all the required capabilities, including read.

I definitely agree here for PKI, but I think there's a place where you could have a write-only policy and a separate set of readers.

Suppose you wanted to use KVv2 and have a set of secret authors (who could just update secrets to new values -- perhaps DevOps personnel) and a separate, usually-distinct set of readers (consumers). Maybe you don't grant writers the ability to write to everything --- maybe you've got different applications and DBAs that do (static 😿) database creds for all of them, &c, so you could maybe end up with this policy ordering and it making sense.

An Outage Investigator (thinking like, break glass devops in case of failures) could be the combination of the two policies (writers, readers), without either role wanting perms of the either. They could read the values (to ensure they're logical looking) or and update them, without both. Maybe you get a JWT with both roles on it, idk.

Perhaps admin in this world is a separate, more globally permissive policy and that's why the outage investigator doesn't get it. Best we not bother the admins, after all ;-)


Using policy unions would work as well because the capabilities are not interfering with each other. But I think what would also cause issues when using policy unions would be the following:

I agree in this example that unions aren't sufficient. Perhaps we'd have a union and a union_sans_deny or something policy to allow users to switch between the two, but, yuck!!


Back to the beginning though, sub-token creation must be explicitly granted by a policy; it is not always available to tokens.... So I'm still feeling like we need some solution here (unless every policy grants sub-token ability, in which case, default should just be modified).

Maybe we need something different, like Cedar (can that even be embedded in our binary?)... Alas, I think that is much bigger effort...

@cipherboy cipherboy added needs rfc Needs an RFC/design document feature New feature or request and removed bug Something isn't working enhancement labels Sep 28, 2024
@achetronic
Copy link
Author

Hi again :) I have been navigating another Hashicorp project and saw this fantastic idea:

hashicorp/boundary#5028

I think this could be awesome as this way everyone can define a model as needed, right? I think the security risks are on the people roof. what are your thoughts on this, guys?

CC: @cipherboy @JanMa

@cipherboy
Copy link
Member

\o hey @achetronic

Both AWS's Cedar and Google's CEL language were also suggested as alternatives. The latter is more flexible and so would be more easy to fit onto the semantics but the former is verifiable which is also rather attractive. OpenBao already has hooks for alternative or additive policy engines, we just only have ACLs tho (as Sentinel/EGP/RGP was Vault Enterprise only).

However, I think I these are both longer term goals. To me, the other ideas are much more achievable in a shorter timeframe.

I think Jan's mostly convinced me that individual path priority is sufficient for most cases... And that the exceptions are maybe not worth dealing with as they might be rare.

@achetronic
Copy link
Author

Well, if the battle was between Cedar and CEL, honestly I think CEL is more friendly and flexible too. BTW, I did not know that OpenBao has already hooks. If I understood correctly, they are available for other engines and not for kv, right? how are they used? Could you drive me to the docs or this is an internal feature?

I think this would be an easy path to follow, as openBao only would have to perform a request to an external service each time the user request some action over some path, right? it's easy to implement as default action is to deny, unless the external service says the opposite

Do you thing the priorities-based mechanism is easier than this? :)

@cipherboy
Copy link
Member

@achetronic It is not engine based but something that needs to be imemented in the core. See https://openbao.org/api-docs/system/policies/ as an overview of the API.

This mostly exists in paths like vault/policy*.go and vault/logical_system.go, this does not impact the engine at all (actually, outside of Auth engines and passing through string names of policies, engines are almost complete unaware of policies).

What do you mean by external system? My understanding of both cel and Cedar are that both could be evaluated in-process without a roundtrip to an external system. OpenBao (like with ACLs today) could be the store of such policy.

My 2c.

@achetronic
Copy link
Author

Oh, when talking about the external system i was not talking about Cedar or CEL, but about an http/grpc server where vault can query about a path with a user, etc to know if the user have permissions on that path or not.

I assumed CEL/Cedar was an in-process approach, just asking about the alternative :)

@cipherboy
Copy link
Member

@achetronic ah, no, the ACL system isn't pluggable in that way, though I suppose it could be... 🤔 @cognifloyd has been looking at what is required to get MFA to be pluggable, so I suppose we could do the same here. It seems a little risky though? Not sure!

We'd have some of the same problems here as with ACL policies though: would they be strictly additive policies (both ACL and an external policy method would both have to allow it) or would they be exclusive (use either ACL or external policy)?

All-in-all, ACL system is here to stay is my gut feel so I'd still like to see some improvements to it in the mean time.

@achetronic
Copy link
Author

achetronic commented Oct 4, 2024

Well, IMO they both ACL and External should be exclusive. I have thought about some parameter defined at engine level to decide whether to use one or another. And in the case of External selected, some params like the URL where to query, etc.

I dunno if this is possible, because as you mentioned, ACL is defined at system level and not per-vault, but if so, i think this would be the better approach to fit everyone's needs, right?

Envoy proxy implements something similar to authenticate and allow a request, and honestly, this was very very handful some weeks ago for our company :)

@cipherboy cipherboy added the roadmap:safer Roadmap item; safer category label Oct 9, 2024
@cipherboy
Copy link
Member

Well, IMO they both ACL and External should be exclusive. I have thought about some parameter defined at engine level to decide whether to use one or another. And in the case of External selected, some params like the URL where to query, etc.

URL to query &c would be implemented by go-plugin or by the backing custom policy plugin. But I think additive has advantages as well, insofar as certain types of rules might be easier to express in one language or another. Sentinel behaves this way, after all.

I dunno if this is possible, because as you mentioned, ACL is defined at system level and not per-vault, but if so, i think this would be the better approach to fit everyone's needs, right?

Once you grow beyond a few K/V mounts, I think you'll frequently see workloads with cross-mount requests. Per-mount policies don't really make sense, then, given the auth mount is still separate and again, not per-secret engine mount.

System wide really is the only design that makes sense, IMHO.


That said, I think improvements to the ACL system on their own make sense to do anyways, regardless of if we do external policy plugins. I'll fork that part of this issue off into a separate feature request.

@cipherboy
Copy link
Member

https://qdl-lang.org/ was also mentioned as an example of another policy language.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core/auth Related to core improvements in authentication feature New feature or request help wanted Extra attention is needed needs rfc Needs an RFC/design document roadmap:safer Roadmap item; safer category
Projects
None yet
Development

No branches or pull requests

3 participants