Add RBAC authorization API group and authorizer #25634
Conversation
failure seems legit
Force-pushed from 9b16640 to be32353.
@batikanu thanks! Updated. The tests should (hopefully) be happier now.
Force-pushed from d77df34 to 3fdcb6c.
Force-pushed from 3fdcb6c to 8385482.
Working on the integration test failure but otherwise this is ready to review.
So, can you or @liggitt say more about how ConfirmNoEscalation looks at multiple objects but doesn't need a multi-object transaction?
GetRoleReferenceRules(ctx api.Context, roleRef api.ObjectReference, namespace string) ([]rbac.PolicyRule, error)

// GetEffectivePolicyRules returns the list of rules that apply to a given user in a given namespace and error. If an error is returned, the slice of
// PolicyRules may not be complete, but it contains all retrievable rules. This is done because policy rules are purely additive and policy determinations
@liggitt may not be complete because of eventual consistency in the storage system? Or something else?
Because of a dead/missing/unobserved roleref
In openshift, we have an authorizer cache that watches clusterroles/roles and bindings. Even authorization decisions can lag. The no-escalation check just ensures the user had equivalent powers within the watch lag. If a user had a role removed and instantly attempted to grant a role with those removed permissions, it could succeed if the authz cache hadn't seen the role removal yet (just like any other API call trying to use those removed permissions could prior to the cache observing the update).
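A minimal sketch (not the PR's actual code) of the additive resolution described above, assuming hypothetical listRoleBindings and appliesToUser helpers and eliding imports: rules from every binding that can be resolved are accumulated, and errors from dead or unobserved rolerefs are collected rather than aborting the walk.

```go
// Sketch only: accumulate rules from every binding that applies to the user and
// keep resolving past dead/missing rolerefs, returning partial rules plus an
// aggregated error. listRoleBindings and appliesToUser are assumed helpers.
func (r *DefaultRuleResolver) GetEffectivePolicyRules(ctx api.Context, namespace string) ([]rbac.PolicyRule, error) {
    user, ok := api.UserFrom(ctx)
    if !ok {
        return nil, fmt.Errorf("no user in context")
    }

    var rules []rbac.PolicyRule
    var errs []error

    bindings, err := r.listRoleBindings(namespace)
    if err != nil {
        return nil, err
    }
    for _, binding := range bindings {
        if !appliesToUser(user, binding.Subjects) {
            continue
        }
        // A dead or not-yet-observed roleRef is recorded but does not stop
        // resolution: rules are purely additive, so partial results are still useful.
        refRules, err := r.GetRoleReferenceRules(ctx, binding.RoleRef, namespace)
        if err != nil {
            errs = append(errs, err)
            continue
        }
        rules = append(rules, refRules...)
    }
    return rules, utilerrors.NewAggregate(errs)
}
```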
To do complex validation for job, I put that in the Strategy's Validate function.
Why is this implemented as a Storage rather than a Strategy?
I don't see any other types that have their own storage.go, but I see many that define a custom Strategy. I think you should use that approach if possible (see pkg/registry/secret/strategy.go for example), unless there is some reason it needs to be at the Storage layer.
The AuthorizationRuleResolver needs a storage, so it's hard to initialize a storage without an existing one. That's the main reason for keeping this logic separated.
This is also what openshift does (not to say that that makes it appropriate for kubernetes) https://github.com/openshift/origin/blob/388478c40e751c4295dcb9a44dd69e5ac65d0e3b/pkg/authorization/registry/role/policybased/virtual_storage.go
> This is also what openshift does (not to say that that makes it appropriate for kubernetes) https://github.com/openshift/origin/blob/388478c40e751c4295dcb9a44dd69e5ac65d0e3b/pkg/authorization/registry/role/policybased/virtual_storage.go
We created our storage long before strategies existed. Without virtual resources, I don't see a strong case for keeping them. I think that Validate and ValidateUpdate provide sufficient power to express the ConfirmNoEscalation case. The cyclical dependency problem due to a lack of separation between a read-only storage and a read-write storage is problematic, though.
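For comparison, a rough sketch of what the strategy-based alternative under discussion could look like, assuming the rule resolver can be injected into the strategy; roleBindingStrategy, ValidateRoleBinding, and the embedded fields (modelled loosely on pkg/registry/secret/strategy.go) are illustrative, not the PR's code.

```go
// Hypothetical strategy wiring: the escalation check runs in Validate (and a
// similar ValidateUpdate), so the storage layer can stay a plain generic store.
type roleBindingStrategy struct {
    runtime.ObjectTyper
    api.NameGenerator
    ruleResolver validation.AuthorizationRuleResolver // assumed injectable dependency
}

func (s roleBindingStrategy) Validate(ctx api.Context, obj runtime.Object) field.ErrorList {
    binding := obj.(*rbac.RoleBinding)
    errs := validation.ValidateRoleBinding(binding) // assumed field-level validation helper

    rules, err := s.ruleResolver.GetRoleReferenceRules(ctx, binding.RoleRef, binding.Namespace)
    if err == nil {
        err = validation.ConfirmNoEscalation(ctx, s.ruleResolver, rules)
    }
    if err != nil {
        errs = append(errs, field.Forbidden(field.NewPath("roleRef"), err.Error()))
    }
    return errs
}
```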
yeah, my mistake, s/validation/storage/
I haven't been following this PR terribly closely, but I think one significant disadvantage is that you can't easily be certain a role change takes effect across the entire cluster (can't manage centrally). That's certainly worrying for an administrator.

Multilevel security policy is not uncommon - I would hesitate to say that a tested multilevel system is no more at risk than a tested flattened system.

Are there other advantages to propagation that incline you to prefer it?
If we wanted to make it so that NamespacedRoleBindings could not reference a ClusterRole, then we could have a controller that watches for new namespaces (like the token controller does) and automatically creates Roles in each new namespace when the namespace is created. Those roles would explicitly include the namespace they are restricted to, rather than it being implied, which might somewhat reduce the chance of bugs. This would have about the same level of convenience. It would also have the property that the per-namespace default policy would not automatically update when the cluster is upgraded, which could be good, but could also be bad if it means you have to update each namespace's policy when a new Kind that you want to grant to certain roles is added to the system.
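A rough sketch of that controller-based alternative, assuming a shared informer on namespaces and a set of role templates; runDefaultRoleController, createRole, and the templates are hypothetical names, not anything in the PR.

```go
// Hypothetical controller: when a namespace appears, copy default role templates
// into it with the namespace set explicitly rather than implied by a binding.
func runDefaultRoleController(nsInformer cache.SharedInformer, createRole func(rbac.Role) error, templates []rbac.Role) {
    nsInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            ns, ok := obj.(*api.Namespace)
            if !ok {
                return
            }
            for _, tmpl := range templates {
                role := tmpl
                role.Namespace = ns.Name // scope is explicit in the object itself
                if err := createRole(role); err != nil && !errors.IsAlreadyExists(err) {
                    glog.Errorf("failed to create default role %q in namespace %q: %v", role.Name, ns.Name, err)
                }
            }
        },
    })
}
```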
#25787 seems to have broken the ability to make the type assertions these checks rely on[0], since the runtime.Object is now a rest.UpdatedObjectInfo which hides the underlying object being updated to.
Would it be reasonable to pull the privilege escalation checks from this PR and revisit a more robust multilevel security system for beta? Based on the conversation here (and the above issue) it seems like there might be a better home for these checks anyway.
cc @erictune
[0] https://github.com/kubernetes/kubernetes/pull/25634/files#diff-0ddbb384845118d5b907e3498b76455cR66
The object is still available, it just has to be obtained by passing in the existing object (which should be done inside the innermost storage during the GuaranteedUpdate call).
Wanting to add checks/transformations/behavior on the new object seems like a common enough pattern that I opened #26240 to add a helper for it. If you want to use that, you could do something like this, which just moves your checks into a function that gets chained onto the end of the call that retrieves the new object:
func (s *Storage) Update(ctx api.Context, name string, objInfo rest.UpdatedObjectInfo) (runtime.Object, bool, error) {
    if user, ok := api.UserFrom(ctx); ok {
        if s.superUser != "" && user.GetName() == s.superUser {
            return s.StandardStorage.Update(ctx, name, objInfo)
        }
    }

    nonescalatingInfo := rest.WrapUpdatedObjectInfo(objInfo, func(ctx api.Context, obj, oldObj runtime.Object) (runtime.Object, error) {
        clusterRoleBinding, ok := obj.(*rbac.ClusterRoleBinding)
        if !ok {
            ... return internal cast error ...
        }

        rules, err := s.ruleResolver.GetRoleReferenceRules(ctx, clusterRoleBinding.RoleRef, clusterRoleBinding.Namespace)
        if err != nil {
            return nil, err
        }
        if err := validation.ConfirmNoEscalation(ctx, s.ruleResolver, rules); err != nil {
            return nil, err
        }
        return obj, nil
    })

    return s.StandardStorage.Update(ctx, name, nonescalatingInfo)
}
That also has the benefit of giving you the old object, so in the future, escalation checks could take into account whether you were actually adding any bindings, or just removing/annotating, etc.
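For instance, a hedged sketch of that future refinement, dropping in place of the wrapper in the Update method above; isSubset is an assumed helper and the "no new grants" rule here is only illustrative, not something the PR implements.

```go
// Hypothetical refinement: only run the escalation check when the update changes
// the roleRef or adds subjects; pure removals/annotation edits pass straight through.
nonescalatingInfo := rest.WrapUpdatedObjectInfo(objInfo, func(ctx api.Context, obj, oldObj runtime.Object) (runtime.Object, error) {
    newBinding, ok := obj.(*rbac.ClusterRoleBinding)
    if !ok {
        return nil, errors.NewInternalError(fmt.Errorf("expected ClusterRoleBinding, got %T", obj))
    }
    if oldBinding, ok := oldObj.(*rbac.ClusterRoleBinding); ok &&
        oldBinding.RoleRef == newBinding.RoleRef &&
        isSubset(newBinding.Subjects, oldBinding.Subjects) { // isSubset is an assumed helper
        return obj, nil // nothing new is being granted, so no escalation check needed
    }
    rules, err := s.ruleResolver.GetRoleReferenceRules(ctx, newBinding.RoleRef, newBinding.Namespace)
    if err != nil {
        return nil, err
    }
    if err := validation.ConfirmNoEscalation(ctx, s.ruleResolver, rules); err != nil {
        return nil, err
    }
    return obj, nil
})
```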
> Would it be reasonable to pull the privilege escalation checks from this PR and revisit a more robust multilevel security system for beta? Based on the conversation here (and the above issue) it seems like there might be a better home for these checks anyway.
I still think storage (or something connected to it, like the strategy) is the best place for it. I'd prefer we start with it here or in the strategy, not punted to admission or omitted.
Ah that would work. Thanks, I'll update this in the morning.
This looks good so far. Things that are missing and should go in this PR or the next are:
I would like @kubernetes/sig-api-machinery to have a day to look at it, if they want to. I have reviewed everything to the limits of my knowledge of API machinery.
seeing a test failure:
yeah, updated those and am running the verification suite locally
Am going to do a squash while I'm at it.
Force-pushed from 4124441 to 36ecec5.
Because the submit queue is blocked, I'm going to kick off re-tests of the top 5 PRs in the queue, then merge them if they pass.
GCE e2e build/test passed for commit 36ecec5.
Is there any documentation to go along with this? I'm very interested in figuring out how this works, but there is a lot of code to try and read through.
@kfox1111 not yet. I plan to follow up in the next few days with lots.
@kfox1111 There is similar documentation from openshift that might help: https://docs.openshift.com/enterprise/3.0/architecture/additional_concepts/authorization.html
How to run in RBAC mode?
RBAC does not use a policy file; it uses objects stored in the API to define clusterroles and roles (defining permissions) and clusterrolebindings and rolebindings granting those roles within namespaces or cluster-wide.
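To make that concrete, a small example using the current k8s.io/api/rbac/v1 Go types (the v1alpha1 types added by this PR differ in detail, e.g. roleRef was an ObjectReference): a cluster-scoped role granting read access by listing resources explicitly, bound to a single user within one namespace.

```go
package main

import (
    rbacv1 "k8s.io/api/rbac/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// A ClusterRole defines permissions; resources and verbs are listed explicitly.
var podReader = rbacv1.ClusterRole{
    ObjectMeta: metav1.ObjectMeta{Name: "pod-reader"},
    Rules: []rbacv1.PolicyRule{{
        APIGroups: []string{""}, // "" is the core API group
        Resources: []string{"pods", "pods/log"},
        Verbs:     []string{"get", "list", "watch"},
    }},
}

// A RoleBinding grants the ClusterRole to a user, but only inside "dev";
// a ClusterRoleBinding would grant it cluster-wide instead.
var devPodReader = rbacv1.RoleBinding{
    ObjectMeta: metav1.ObjectMeta{Name: "read-pods", Namespace: "dev"},
    Subjects: []rbacv1.Subject{{
        Kind:     rbacv1.UserKind,
        APIGroup: rbacv1.GroupName,
        Name:     "jane",
    }},
    RoleRef: rbacv1.RoleRef{
        APIGroup: rbacv1.GroupName,
        Kind:     "ClusterRole",
        Name:     "pod-reader",
    },
}

func main() {} // the objects would normally be created through the API, e.g. with client-go
```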
Is there an issue I can subscribe to for follow-up design on RBAC bootstrapping?
@harryge00 works for me. You cannot use the "resource groups". You need to list all resources. We didn't upstream the resource groups support from OpenShift. @ericchiang can you document how to create basic viewer and writer roles for 1.3 somewhere?
@jimmycuadra how will you authenticate users? Can you use a credential for the bootstrap account that expires and is not renewed?
We authenticate using client certificates. We have a custom agent that runs on each server that populates and refreshes the latest CA/certs/keys from a central source of truth (etcd, we're on CoreOS). The systemd unit that starts the apiserver is baked into the master servers' cloud-config data, however, and is never changed without cycling the whole server. It wouldn't necessarily help to have a super user account that expires, because there'd be nothing to prevent a new cert/key pair for the same username from being generated using our CA.

Perhaps another way bootstrapping could work is by the super user making an API call to set a flag (persisted in etcd) that disables their own access, so the apiserver would ignore the configured super user from that point on.

I don't want to steer this discussion off course since this PR has already been merged, so I was really just asking if there was an existing issue to continue talking about it. It'd been mentioned previously in the thread that
Another question: I bind a user with a clusterrole, but when accessing the api with this user, it says:
http://stackoverflow.com/questions/38299105/how-to-use-rbac-in-kubernetes. I think the user is the same as in the token file. And maybe in a later version we can dynamically add users through the API instead of writing new users to the static token file?
@harryge00 merged PRs are not a good place for general questions or debugging. Please read the new authorization documentation[0], open a new issue if you feel you have a bug, or use the #kubernetes-users or #sig-auth channels on the kubernetes slack[1]; we're generally very responsive there.

[0] http://kubernetes.io/docs/admin/authorization/
This PR updates #23396 by adding the necessary registry and client code for storage of the openshift authorization types added in #24900.
Still need to add restrictions to role bindings to prevent privilege escalation. See discussion [0] and the openshift implementation [1]. EDIT: Added an implementation of rule escalation prevention similar to [2].
Adding an API group is extremely poorly documented, so there's a good chance I've missed something. For now I've been going off discussion in #25562, but please let me know if there's anything wrong.
TODO: lots of testing
cc @erictune @sym3tri @philips @bobbyrullo @liggitt @deads2k
[0] #24900 (comment)
[1] https://github.com/openshift/origin/blob/388478c40e751c4295dcb9a44dd69e5ac65d0e3b/pkg/authorization/registry/rolebinding/registry.go#L31-L34
[2] https://github.com/openshift/origin/blob/master/pkg/authorization/rulevalidation/user_covers.go