Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CSI External Components ClusterRole to bootstrapped roles #61866

Merged
merged 1 commit into from
Apr 11, 2018

Conversation

davidz627
Copy link
Contributor

Added CSI External Components ClusterRole to bootstrapped roles and removed creation from failing e2e test

Fixes: #61781

/sig storage
/kind bug
/assign @liggitt @saad-ali

NONE

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. kind/bug Categorizes issue or PR as related to a bug. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 29, 2018
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 29, 2018
@davidz627
Copy link
Contributor Author

davidz627 commented Mar 29, 2018

E2E test that was failing is tested and passing on GKE with this change

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 29, 2018
rbac.NewRule("get", "list", "watch", "update").Groups(legacyGroup).Resources("persistentvolumeclaims").RuleOrDie(),
rbac.NewRule("get", "list", "watch", "create", "update", "patch").Groups(legacyGroup).Resources("events").RuleOrDie(),
rbac.NewRule("get", "list").Groups(legacyGroup).Resources("secrets").RuleOrDie(),
rbac.NewRule("get", "list", "watch", "update").Groups(legacyGroup).Resources("nodes").RuleOrDie(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is a volume provisioner updating Node objects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not, I am splitting up roles now :)

rbac.NewRule("create", "delete", "get", "list", "watch", "update").Groups(legacyGroup).Resources("persistentvolumes").RuleOrDie(),
rbac.NewRule("get", "list", "watch", "update").Groups(legacyGroup).Resources("persistentvolumeclaims").RuleOrDie(),
rbac.NewRule("get", "list", "watch", "create", "update", "patch").Groups(legacyGroup).Resources("events").RuleOrDie(),
rbac.NewRule("get", "list").Groups(legacyGroup).Resources("secrets").RuleOrDie(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect listing of secrets, and I'd expect secret permissions to be granted specific to an installation in combination with the storage class setup, not globally to all secrets across all namespaces

rbac.NewRule("get", "list", "watch", "update").Groups(storageGroup).Resources("volumeattachments").RuleOrDie(),
rbac.NewRule("get", "list", "watch").Groups(storageGroup).Resources("storageclasses").RuleOrDie(),
},
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should grant patch in addition to update and write external components to use patch to minimize chances of dropping fields on version skewed clusters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now none of the components use "Patch". It is an open issue and we are aware of it. Should we give "Patch" API access now, or wait until we update the components to support "Patch"ing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go ahead and add patch permission, since we know that is the desired operation long-term

@@ -458,6 +458,19 @@ func ClusterRoles() []rbac.ClusterRole {
eventsRule(),
},
},
{
// a role for csi external components {csi-provisioner,csi-attacher,driver-registrar}
ObjectMeta: metav1.ObjectMeta{Name: "system:csi-external-components"},
Copy link
Member

@liggitt liggitt Mar 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Role names are typically singular

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the driver-registrar run?

If the provisioner, attacher, and driver are run in separate locations with different levels of access to the API, three roles might make more sense. I wouldn't want every kubelet to have the ability to delete any PV object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the test we are attempting to fix they are all run in the same location, however we wish to leave the flexibility of CSI Driver implementers to split them up if they wish. I will create 3 roles with minimal permissions required for each component

@davidz627
Copy link
Contributor Author

@liggitt looking into splitting into 3 separate roles for the 3 external components.

Could you elaborate on "I'd expect secret permissions to be granted specific to an installation in combination with the storage class setup, not globally to all secrets across all namespaces" I'm not exactly sure what that means. You mean a role for secrets permissions would be created on installation of a specific CSI driver that is limited to the namespace it's sitting in?

@saad-ali
Copy link
Member

I would suggest three different roles: csi-provisioner, csi-attacher and csi-mounter each with the minimal permissions that are absolutely required for the csi provisioner, the csi attacher, and the csi mounter.

/assign @msau42

@davidz627
Copy link
Contributor Author

davidz627 commented Mar 30, 2018

Split into 3 roles. Minimal required permissions given for each role. @liggitt @saad-ali PTAL

Rules: []rbac.PolicyRule{
rbac.NewRule("create", "delete", "list", "watch").Groups(legacyGroup).Resources("persistentvolumes").RuleOrDie(),
rbac.NewRule("get", "list", "watch", "update").Groups(legacyGroup).Resources("persistentvolumeclaims").RuleOrDie(),
rbac.NewRule("get").Groups(legacyGroup).Resources("secrets").RuleOrDie(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

provisioners don't have a general need to get every secret in the cluster. a provisioner might:

  • need no secret-related permissions
  • need permissions to view a particular secret for provisioning
  • need permissions to view secrets in a particular namespace for provisioning
  • need permissions to create attach/mount-related secrets in a particular namespace

rather than grant broad permissions here on the off chance the provisioner needs them, I'd expect specific permissions to be granted as a particular provisioner or storage class requires.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will remove the secrets for now. We will have to implement some sort of workaround to allow for creating Roles in the test framework for GKE once we start adding tests for external CSI components that require secrets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once we start adding tests for external CSI components that require secrets.

I'd actually expect those to come with recommended setups that put them in a namespace, set up a storage class for them, and granted them appropriate secret permissions within that namespace. Basically, the standard role (which you are defining here) takes care of the "every provisioner will have to do X" things, and they get to grant any additional permissions they want in a manifest specific to their component.

Copy link
Contributor Author

@davidz627 davidz627 Mar 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, when I remove the secrets it the roles I'm adding here will be consistent with "every provisioner will have to do X". The recommended setups given to users of CSI will have the additional roles to grant whatever additional permissions they need. The only issue is that the test framework on GKE does not allow creation of roles by default so there would just need to be a workaround in the test or a larger framework/GKE change would need to be done.

Live deployment of these plugins was never really the issue. The change was spurred mainly by test failure on GKE.

// a role for csi external components {csi-provisioner,csi-attacher,driver-registrar}
ObjectMeta: metav1.ObjectMeta{Name: "system:csi-driver-registrar"},
Rules: []rbac.PolicyRule{
rbac.NewRule("get", "update").Groups(legacyGroup).Resources("nodes").RuleOrDie(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with the registrar component... where does this process run, and what on the Node object is it updating?
/cc @saad-ali

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The driver registrar adds an annotation on the node that contains a JSON map of driver names to node names. These node names could be different from what Kubernetes views as the node name. The process will generally be deployed alongside the CSI Plugin in a Pod.

Copy link
Member

@liggitt liggitt Mar 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is giving every kubelet running the CSI plugin pod access to a credential (the service account credential for the CSI pod) granted a role that lets it modify any node in the cluster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so, @saad-ali to confirm

@k8s-ci-robot k8s-ci-robot requested a review from saad-ali March 30, 2018 00:55
clusterRole = csiClusterRole(cs, config, false)
serviceAccount = csiServiceAccount(cs, config, false)
csiClusterRoleBinding(cs, config, false, serviceAccount, clusterRole)
serviceAccount = csiCombinedServiceAccount(cs, config, false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is up to @kubernetes/sig-storage-pr-reviews, but I'd recommend exercising each pair of phases (provision/delete, attach/detach, mount/unmount) using a separate service account with just the specified role. That way, e2e tests make it really clear when a new operation is performed in a phase that isn't covered by the relevant role.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be a good test to have, however this test is for testing the HostPath CSI Plugin specifically, not the CSI external components. We should definitely add in tests for the external components soon @saad-ali.

@liggitt let me know if I misunderstood your comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is for testing the HostPath CSI Plugin specifically, not the CSI external components

so does that mean it doesn't need the provisioner and attacher roles?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The csi-provisioner, driver-registrar, and csi-attacher are side-car containers necessary for the HostPath CSI Plugin container. The pod containing all 4 containers will need provisioner, attacher, and driver-registrar roles.

@liggitt
Copy link
Member

liggitt commented Mar 30, 2018

thanks, looks a lot better. just a few comments

@davidz627
Copy link
Contributor Author

For future reference: for tests for plugins that require secrets, the test may have to implement a workaround described here: #43409 (comment)

Or we should get it so that the test framework in GKE is allowed to create ClusterRoles.

@davidz627
Copy link
Contributor Author

Removed "get" secrets from system:csi-external-provisioner cluster role.

ObjectMeta: metav1.ObjectMeta{Name: "system:csi-external-provisioner"},
Rules: []rbac.PolicyRule{
rbac.NewRule("create", "delete", "list", "watch").Groups(legacyGroup).Resources("persistentvolumes").RuleOrDie(),
rbac.NewRule("get", "list", "watch", "update").Groups(legacyGroup).Resources("persistentvolumeclaims").RuleOrDie(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the external provisioner need to update PVCs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's required for leader election in order "to acquire lease to provision for pvc"

// a role for the csi external attacher
ObjectMeta: metav1.ObjectMeta{Name: "system:csi-external-attacher"},
Rules: []rbac.PolicyRule{
rbac.NewRule("get", "list", "watch", "update").Groups(legacyGroup).Resources("persistentvolumes").RuleOrDie(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the attacher update in PVs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It adds a finalizer to the PV

@davidz627
Copy link
Contributor Author

/retest

@saad-ali
Copy link
Member

saad-ali commented Apr 2, 2018

Is self-reporting of CSI driver<->node name mappings expected in real deployments?

CSI allows for the node naming for storage system to be independent of node naming of the Kubernetes cluster. That said, I'm not aware of any cluster where this is actually the case.

That said, moving forward there are plans to extend this functionality to allow a storage plugin to report the "topology labels" that apply to a specific node (e.g. zone, rack, etc.), in addition to the node name. These labels would be applied to the node as node labels. So driver-registrar will require the ability to both add annotations and labels in the future.

@liggitt
Copy link
Member

liggitt commented Apr 2, 2018

CSI allows for the node naming for storage system to be independent of node naming of the Kubernetes cluster.

That's fine, but it seems like it should be driven by a source of truth, not self-reporting.

moving forward there are plans to extend this functionality to allow a storage plugin to report the "topology labels" that apply to a specific node (e.g. zone, rack, etc.), in addition to the node name. These labels would be applied to the node as node labels.

letting nodes arbitrarily self-label themselves is problematic (see https://github.com/kubernetes/community/pull/911/files for multiple examples why) and work is in progress to reduce that in the future.

@saad-ali
Copy link
Member

saad-ali commented Apr 2, 2018

The csi-provisioner, driver-registrar, and csi-attacher are side-car containers necessary for the HostPath CSI Plugin container. The pod containing all 4 containers will need provisioner, attacher, and driver-registrar roles.

And a copy of this pod, with credentials granted all three of these roles, is anticipated to be running on every node?

This is not correct. Technically only 1 instance of provisioner and attacher need to running per cluster. So a CSI deployment would consist of two deployments:

  1. StatefulSet or ReplicaSet with size=1
    • 3 containers: provisioner, attacher, and plugin
  2. DeamonSet
    • 2 containers: driver-registrar and plugin.

The pod for 1, only needs provisioner and attacher permissions (not driver-registrar) and runs on a single node somewhere on the cluster.
The pods for 2, only need permission for driver-registrar (not for provisioner or attacher) and run on every single node.

That's fine, but it seems like it should be driven by a source of truth, not self-reporting.

Source of truth in this case is the storage system, and k8s interacts with the storage system via CSI. It queries the storage system and to discover the name of a node according to the storage system (via CSI), and it needs to store the mapping between the storage node name and k8s node name somewhere (the k8s node object was the most obvious place).

letting nodes arbitrarily self-label themselves is problematic (see https://github.com/kubernetes/community/pull/911/files for multiple examples why) and work is in progress to reduce that in the future.

Good to know. I'm not sure it can be avoided, but I'll set up some time to sync up on this and see if we can come up with a better solution.

@verult
Copy link
Contributor

verult commented Apr 7, 2018

Instead of whitelisting specific label names or prefixes in NodeRestriction, what if we whitelist regexes?

Not sure exactly what the CSI topology labels would look like yet, but just wanted to see what are the options.

@davidz627
Copy link
Contributor Author

@saad-ali @msau42 @liggitt I have updated the test with the workaround that will get this test running on GKE, as well as linked a TODO to an issue to remove the workaround when the work surrounding removing the node update requirements from the CSI external components is resolved.

PTAL

clusterRoleBindingClient := client.RbacV1().ClusterRoleBindings()
stdout, stderr, err := framework.RunCmd("gcloud", "config", "get-value", "core/account")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than writing a GCE-specific command, you could do this to get a clientset you could use to define roles (impersonation also used by https://github.com/kubernetes/kubernetes/blob/master/test/e2e/auth/node_authz.go#L65, and the e2e user is presumed to be sufficiently authorized to be allowed to impersonate):

By("Creating an impersonating superuser kubernetes clientset to define RBAC roles")
config, err := framework.LoadConfig()
Expect(err).NotTo(HaveOccurred())
config.Impersonate = restclient.ImpersonationConfig{
	UserName: "superuser",
	Groups:   []string{"system:masters"},
}
superuserClientset, err := clientset.NewForConfig(config)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats perfect, incorporating now.

@saad-ali
Copy link
Member

saad-ali commented Apr 9, 2018

@liggitt @davidz627 and I met and discussed the issues with the CSI driver-registrar permissions. The driver-registrar required permission to the Kubernetes node API object in order to persist the mapping of CSI storage node name and, in the future, to allow CSI volume plugins to specify node topology labels.

However this would require giving every instance of the driver-registrar on every node permission to edit all Kubernetes node API objects, meaning if one node was compromised, it could modify any other node object.

We decided that a better approach would be to remove the logic for applying node labels/annotation from the driver-registrar sidecar container, and move it to the kubelet. This will take advantage of the fact that kubelet's node access is already limited (via node authorizer) to its own node object.

The Kubernetes CSI team will track this as an item for the coming quarter. In the meantime, in order to fix the test on GKE, dyzz@ will give the E2E user admin privileges to allow it to create RBAC rules.

clusterRole = csiClusterRole(cs, config, false)
serviceAccount = csiServiceAccount(cs, config, false)
csiClusterRoleBinding(cs, config, false, serviceAccount, clusterRole)
csiDriverRegistrarClusterRole(cs, config, false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably a non-issue for this PR because there's a single test in this section, but if there were multiple tests, teardown of this shared global role at the beginning of every test would be disruptive... as a follow-up, might want to change this to just create the clusterrole if needed and not worry about tearing it down afterward

similar comments about the service account binding... you'd probably want to make it unique per test (include namespace in the clusterrolebinding name)

@liggitt
Copy link
Member

liggitt commented Apr 9, 2018

rbac changes lgtm
/approve

will let @kubernetes/sig-storage-pr-reviews LGTM the test

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 9, 2018
@davidz627
Copy link
Contributor Author

/retest

@saad-ali
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 10, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidz627, liggitt, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 62192, 61866, 62206, 62360). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 56d6f05 into kubernetes:master Apr 11, 2018
@davidz627 davidz627 deleted the fix/CSIe2e branch April 11, 2018 16:22
k8s-github-robot pushed a commit that referenced this pull request Jun 15, 2018
Automatic merge from submit-queue.

Add RBAC policy rules for csi-external-provisioner and csi-external-attacher

Adds RBAC Policy rules for `csi-external-provisioner` and `csi-external-attacher` so that CSI drivers can bind to these cluster roles on every version of k8s where CSI is Beta or above.

These roles were added in 1.11 but never cherrypicked back to 1.10. The roles originally added as a part of a larger change here: #61866

 I could not do a direct cherry-pick because some of the RBAC primitives changed and there was also a fix applied on top with this PR:
#65070
The fix has been included in this commit. 

/kind enhancement
/sig storage
/cc @msau42 
/assign @liggitt @MaciekPytel

```release-note
NONE
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants