distributed resizing #142

pohly · 2021-03-22T15:36:05Z

For local storage (e.g. LVM disks) it makes sense to support a deployment model where the driver only runs locally on nodes together with sidecars. This is already support for provisioning (kubernetes-csi/external-provisioner#524, https://github.com/kubernetes-csi/external-provisioner#deployment-on-each-node).

pohly · 2021-03-22T15:36:29Z

In the resizer, the sidecar probably needs to check whether the volume is local before invoking the CSI driver. This can be achieved via the existing topology field. kubernetes-csi/external-provisioner#592 might help to reduce traffic by configuring the informer so that it filters by label and thus only receives PVs that are of interest. See kubernetes-csi/external-provisioner#583 for code which does such server-side filtering for CSIStorageCapacity objects.

fejta-bot · 2021-06-20T16:59:18Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

fejta-bot · 2021-07-20T17:34:44Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

pohly · 2021-08-02T08:15:34Z

/remove-lifecycle rotten
/lifecycle frozen

travisghansen · 2022-02-22T02:35:03Z

How much effort is required for this?

pohly · 2022-02-22T08:59:38Z

I suspect it'll be easier than distributed provisioning because the sidecar can already determine from the PV whether it is responsible for the operation. See kubernetes-csi/external-snapshotter#585 for the corresponding feature in the snapshotter.

gnufied · 2022-03-01T16:54:46Z

I am not convinced that we need distributed resizing in external-resizer. I see a clear use case of why distributed provisioning is needed for local volumes but I don't see a similar use case for resizing. Compared to provisioning - resizing is a two step process and NodeExpandVolume always runs on the node (called by kubelet), so if overall capacity supports volume expansion then expansion can happen on the node without supporting distributed resizing.

Is there a use case, where both ControllerExpandVolume and NodeExpandVolume must happen on the node where volume is available - because there is no visibility of volume capacity at cluster level? If so - then does distributed resizing solves it? It seems like - if I expand a local volume to 100TB and capacity is not available on the node, I will be stuck anyways distributed resizing or not.

travisghansen · 2022-03-01T17:35:37Z

A fair point.

My sense is that controllers generally have no knowledge or understanding of how the device is being used and is just expanding raw capacity while node operations (when necessary) are meant to deal with the specifics of ensuring the volume (generally a block device if node resizing is involved) capacity is readily accessible to the workload (generally resizing the fs as necessary). I prefer to keep these logical separations.

With that in mind, I have several common code paths that are used by both distributed and centralized deployments (my project currently has 17 distinct 'drivers') so I'd love to be able to keep those logically in the same buckets.

From a broader (than k8s) perspective, I think the 'deploy the controller alongside the node' workflow is clearly a goal of csi so, regardless of any perceived value/arguments/points I make above, making assumptions about what makes sense or not probably doesn't serve the interests of the spec in the long-term. The spec is interested in facilitating node+controller deployments jointly and part of controller deployments is resizing...by extension csi compliant COs should be capable of invoking resize operations on controllers deployed in such a fashion.

In short, IMO regardless of specific merits for my use-case, it seems a supported csi deployment style that should 'just work' with k8s and/or any other COs.

gnufied · 2022-03-01T18:17:48Z

The spec is interested in facilitating node+controller deployments jointly and part of controller deployments is resizing...by extension csi compliant COs should be capable of invoking resize operations on controllers deployed in such a fashion.

The spec does not say anything about node+controller deployment on same nodes. I am still unsure, what problems distributed resizing will solve that is not already solved by calling NodeExpandVolume on the node.

travisghansen · 2022-03-01T19:14:31Z

The spec does not say anything about node+controller deployment on same nodes. I am still unsure, what problems distributed resizing will solve that is not already solved by calling NodeExpandVolume on the node.

Isn't that figure 2 and 3 here?: https://github.com/container-storage-interface/spec/blob/master/spec.md

Again, regardless of that, IMO the provisioner and snapshotter both support node+controller deployments. The fact that k8s has various clients/code-bases to invoke the grpc methods is an implementation details. If k8s supports node+controller it should support it fully and not selectively decide which grpc methods it will or will not invoke based on supposition about the merits/requirements of the feature by csi driver implementations from specific SPs.

pohly · 2022-03-02T08:52:30Z

@gnufied: if the CSI driver only runs on the nodes, where would you run external-resizer?

If you only run one instance of it per cluster, then it has no CSI driver instance that it can call. Or is the expectation that a CSI driver then has an implementation that runs on the node and another implementation that runs alongside the external-resizer, with the second implementation not doing any actual work?

gnufied · 2022-03-02T14:55:47Z

Currently - if a CSI driver supports node-only volume expansion, the external-resizer does provide a no-op trivial resizer which is used for updating PV (kubelet can't update PVs) and rest of the volume expansion happens on kubelet via NodeExpandVolume.

pohly · 2022-03-03T10:58:45Z

I agree that this seems to be sufficient for node-local storage. Is this documented sufficiently so that CSI driver developers can find out about it?

travisghansen · 2022-03-03T14:50:26Z

While I agree it can be done (at least for my use-cases), I disagree with what I consider divergence from the spec. I don’t think it’s appropriate to selectively decide which controller methods will/will not be invoked in node+controller setups based on supposition.

Moving on, I’m happy to test the deployment scenario but I’m unsure what it would look like. Is this available with current releases or is some change still required to support the method? Do I deploy resizer with nodes or just 1 instance? Which args should I be using? etc

gnufied · 2022-03-03T16:30:14Z

I get the point about deploying controller side of the plugin on the node and not having any explicit control-plane associated with CSI driver. I need to think some more about it. I have scheduled this item for discussion on Monday's CSI implementation meeting. If you can manage, please join so as we can talk about it.

travisghansen · 2022-03-03T16:39:17Z

Of course! Send over meeting details here or on slack.

pohly · 2022-03-03T19:32:04Z

I don’t think it’s appropriate to selectively decide which controller methods will/will not be invoked in node+controller setups based on supposition.

My understanding is that the controller side will only be skipped if a CSI driver developer explicitly asks for it by deploying the CSI driver in a suitable way. That seems okay to me. How that works specifically I don't know (hence my question about documentation).

gnufied · 2022-03-03T20:01:07Z

https://kubernetes-csi.github.io/docs/volume-expansion.html documents the deployment workflow.

Esp. following bits:

The Kubernetes CSI development team maintains external-resizer Kubernetes CSI Sidecar Containers. This sidecar container implements the logic for watching the Kubernetes API for Persistent Volume claim edits and issuing ControllerExpandVolume RPC call against a CSI endpoint and updating PersistentVolume object to reflect new size.

This sidecar is needed even if CSI driver does not have EXPAND_VOLUME controller capability, in this case it performs a NO-OP expansion and updates PersistentVolume object. NodeExpandVolume is always called by Kubelet on the node.

travisghansen · 2022-03-03T21:11:52Z

OK thanks for pointing out the doc!

IMO this furthers my points about the spec. Stating the perhaps obvious, the sidecar (to my knowledge) must be deployed as..a sidecar. It cannot be deployed standalone without the/a controller deployed along with it. That deployment story does not match any of the architectures from the spec. It's a sort of mashing together of 1+2 and/or 1+3 where 'node' hosts are running controller+node plugins and then a 'master' host is running a controller plugin. Or said differently, 'controller' plugin is running both on 'nodes' and 'master' hosts.

zimnx · 2023-06-01T20:26:03Z

This sidecar is needed even if CSI driver does not have EXPAND_VOLUME controller capability, in this case it performs a NO-OP expansion and updates PersistentVolume object.

In case when there's an error during expanding, shouldn't this cause update to fail?
Controller deployed as deployment, can't check whether there's enough space to expand on different node.
Leaving PV in expanded state even though expansion didn't happen is wrong in my view.

CSI Spec also requires specific errors to be returned when Controller isn't able to expand the volume, so no-op expansion is not in sync with the spec.

…ERS-and-OWNERS_ALIASES OCPBUGS-14812: Chore: Update OWNERS and OWNERS_ALIASES

pohly mentioned this issue Apr 6, 2021

CSI and Local Volume kubernetes-csi/external-provisioner#525

Closed

pohly mentioned this issue Apr 30, 2021

Use hostpath.csi.k8s.io across multiple node pools kubernetes-csi/csi-driver-host-path#211

Open

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 20, 2021

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 20, 2021

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 2, 2021

travisghansen mentioned this issue Feb 22, 2022

zfs-local (non ephemereal) a possibility (possibly a feature request) democratic-csi/democratic-csi#148

Open

travisghansen linked a pull request Feb 26, 2022 that will close this issue

implement distributed resizing #195

Open

dobsonj pushed a commit to dobsonj/external-resizer that referenced this issue Oct 16, 2023

Merge pull request kubernetes-csi#142 from mpatlasov/Chore-Update-OWN…

d76deca

…ERS-and-OWNERS_ALIASES OCPBUGS-14812: Chore: Update OWNERS and OWNERS_ALIASES

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

distributed resizing #142

distributed resizing #142

pohly commented Mar 22, 2021

pohly commented Mar 22, 2021

fejta-bot commented Jun 20, 2021

fejta-bot commented Jul 20, 2021

pohly commented Aug 2, 2021

travisghansen commented Feb 22, 2022

pohly commented Feb 22, 2022

gnufied commented Mar 1, 2022 •

edited

Loading

travisghansen commented Mar 1, 2022

gnufied commented Mar 1, 2022

travisghansen commented Mar 1, 2022

pohly commented Mar 2, 2022

gnufied commented Mar 2, 2022

pohly commented Mar 3, 2022

travisghansen commented Mar 3, 2022

gnufied commented Mar 3, 2022

travisghansen commented Mar 3, 2022

pohly commented Mar 3, 2022

gnufied commented Mar 3, 2022

travisghansen commented Mar 3, 2022

zimnx commented Jun 1, 2023

distributed resizing #142

distributed resizing #142

Comments

pohly commented Mar 22, 2021

pohly commented Mar 22, 2021

fejta-bot commented Jun 20, 2021

fejta-bot commented Jul 20, 2021

pohly commented Aug 2, 2021

travisghansen commented Feb 22, 2022

pohly commented Feb 22, 2022

gnufied commented Mar 1, 2022 • edited Loading

travisghansen commented Mar 1, 2022

gnufied commented Mar 1, 2022

travisghansen commented Mar 1, 2022

pohly commented Mar 2, 2022

gnufied commented Mar 2, 2022

pohly commented Mar 3, 2022

travisghansen commented Mar 3, 2022

gnufied commented Mar 3, 2022

travisghansen commented Mar 3, 2022

pohly commented Mar 3, 2022

gnufied commented Mar 3, 2022

travisghansen commented Mar 3, 2022

zimnx commented Jun 1, 2023

gnufied commented Mar 1, 2022 •

edited

Loading