
PersistentVolume dynamic provisioning #6773

Closed
wants to merge 1 commit into from

Conversation

markturansky
Contributor

Following PersistentVolumes, this proposal seeks to add the ability for PersistentVolumeControllers to maintain "replicas" of volumes, much like a ReplicationController maintains levels of pods.

PersistentVolumePlugins gain Create and Recycle methods to aid provisioning.

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.


The PersistentVolumeSource interface requires `Create` and `Recycle` functions to support the dynamic creation and reclamation of persistent volumes. Each storage provider implements its own PersistentVolumeSource interface.
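A minimal Go sketch of what such an interface could look like; the method names come from the proposal, but the signatures and the stand-in `PersistentVolume` type are illustrative assumptions, not the PR's actual code:

```go
package volume

// PersistentVolume is a stand-in for the real API object, trimmed to the
// fields this sketch needs (an assumption, not the PR's actual type).
type PersistentVolume struct {
	Name     string
	Capacity string // e.g. "10Gi"
}

// PersistentVolumeSource sketches the proposed interface: Create provisions
// a new volume in the underlying storage provider, and Recycle scrubs a
// released volume so it can become Available again.
type PersistentVolumeSource interface {
	Create(template *PersistentVolume) (*PersistentVolume, error)
	Recycle(pv *PersistentVolume) error
}
```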

Caveat: Not all persistent volumes will support dynamic provisioning.
Member

I assume this means things like NFS, for which provisioning is not just an API call and is not consistent across installations. But how can you make it possible? Something like a PVSourceOverride that calls a custom script? Maybe that's version 3.0.


I am not sure if this is what @markturansky intends, but I think implementations of PersistentVolumeSource should be independent of the VolumePlugin implementations where it makes sense. For GCE PD, for example, they are one and the same, but for NFS there can be multiple PersistentVolumeSource implementers that result in NFS exports. There could also be a PersistentVolumeSource that produces both NFS and iSCSI exports, for example. In the case of plain NFS with no cloud provider, a custom PVSource would make sense.

Contributor Author

So long as a plugin can call exec and the admin creating the PVControllers can specify what to call, I suppose the caveat should be "some assembly required" for some types of volumes.

@k8s-bot

k8s-bot commented May 22, 2015

Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist")

If this message is too spammy, please complain @ixdy.

@erictune
Member

erictune commented Jun 1, 2015

ok to test

@thockin
Member

thockin commented Jun 2, 2015

As simple as this "PR" is, I don't want to commit it because I simply have not put enough time into thinking about this model yet.

@thockin thockin added this to the v1.0-post milestone Jun 2, 2015
@markturansky
Contributor Author

@thockin This proposal can wait until post 1.0. Just the PRs for recycling are current. We can revisit the rest of this later.

@googlebot

CLAs look good, thanks!

@googlebot googlebot added cla: yes and removed cla: no labels Jun 2, 2015
@k8s-bot

k8s-bot commented Jun 17, 2015

GCE e2e build/test failed for commit af87ff7.

One new API kind:

A `PersistentVolumeController` (PVC) is a storage resource provisioned by an administrator. PVCs do not have a namespace. Just as a ReplicationController maintains a minimum number of "replicas" of a pod in the system, a PersistentVolumeController maintains a minimum number of "replicas" of a certain type/size of PersistentVolume available in the system.
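To make the shape of the proposed object concrete, here is a hypothetical Go sketch; the field names and trimmed types are assumptions for illustration, not the PR's actual API:

```go
package api

// PersistentVolumeSpec is a trimmed stand-in for the real volume spec.
type PersistentVolumeSpec struct {
	Capacity string // e.g. "100Gi"
	Source   string // e.g. "gcePersistentDisk"
}

// PersistentVolumeControllerSpec sketches the desired state described above:
// keep at least MinReplicas matching PVs Available, never create more than
// MaxReplicas, and stamp new PVs out of Template.
type PersistentVolumeControllerSpec struct {
	MinReplicas int
	MaxReplicas int
	Template    PersistentVolumeSpec
}
```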

Contributor Author

@smarterclayton Having to make many ReplicationControllers with Replicas=1 and 1 claim is a bad user experience.

If a user were to lay claim to a PersistentVolumeController -- which is a creator and manager of its own pool of storage -- could we use this in conjunction with a ReplicationController to scale something like MongoDB? Different dev effort than this PR, but this PR's work could be steps towards solving that scaling issue.

Contributor

Possible. The design of how we assign unique volumes to pods under a replication controller (and reuse them when pods die) is really important, so we should try and get the discussion moving forward on that and reach closure. Nominal services are tied to this.



@markturansky
Contributor Author

I pushed edits to the doc reflecting the new implementation of the Recycler interface.

I can start the implementations of Deleter and Creator since they will follow Recycler. No API changes required for those additions.

@k8s-bot

k8s-bot commented Jul 14, 2015

GCE e2e build/test failed for commit 885654cdfb8ce7b6831306ca087531cf314a8409.

@k8s-bot

k8s-bot commented Jul 14, 2015

GCE e2e build/test failed for commit 7dfa86e9fbe156fb4f80803ba9e02a006c07383d.

@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@markturansky
Contributor Author

@smarterclayton this is a big focus for those of us at RH doing storage.

### Goals

* Allow administrators to describe minimum and maximum storage levels comprised of many kinds of PersistentVolumes
* Allow the dynamic creation and reclamation of persistent volumes (to the fullest extent for each type of storage provider)
Contributor

How does the system allow for this? My concern about this design proposal is that it seems to imply that the easiest way to provision volume types is to write Go code that calls GCE APIs. That works well for GCE/AWS/Cinder (in the normal case), but it does not work well for any organization that has to script its storage creation (by using Ansible or some other tool to provision new machines or carve up NFS mounts). It also doesn't work well for users with custom needs on GCE, or for dynamically reacting to a claim and creating a volume on demand. This seems like a hard problem to solve generically (provisioning storage of many different types), so I'd ask why it has to be implemented as a formal pattern, versus an integration that someone writes to watch for new claims or a lack of volumes and go create some more.

Member

I'd like to be able to run a pod on the server hosting the volumes (assuming it is itself a node in the cluster) in order to create a volume.

Contributor

Why does it have to be on the server hosting the volumes?

On Thu, Aug 20, 2015 at 3:30 PM, Paul Morie notifications@github.com
wrote:

In docs/design/persistent-volume-provisioning.md
#6773 (comment):

+PersistentVolumeControllerManager is a singleton control loop running in master that manages all PVControllers in the system. The PVCM reconciles the current supply of available PersistentVolumes in the system with the desired levels according to the PVControllers. This process is similar to the ReplicationManager that manages ReplicationControllers.
+
+Three new volume plugin interfaces:
+
+* Recycler -- knows how to scrub a volume clean so it can become available again as a resource
+* Creator -- create new instances of a PV from a template.
+* Deleter -- deletes instances of a PV and allows the plugin to determine how to remove it from the underlying infrastructure
+
+Volume plugins can implement any applicable interfaces. Each plugin will document its own support for dynamic provisioning.
+
+
+### Goals
+
+* Allow administrators to describe minimum and maximum storage levels comprised of many kinds of PersistentVolumes
+* Allow the dynamic creation and reclamation of persistent volumes (to the fullest extent for each type of storage provider)


Clayton Coleman | Lead Engineer, OpenShift
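For reference, a hedged Go sketch of the three volume plugin interfaces quoted in the hunk above; the interface names and descriptions come from the doc, while the signatures and helper type are assumptions:

```go
package volume

// PersistentVolume is again a trimmed stand-in for the real API object.
type PersistentVolume struct {
	Name string
	Path string // provider-specific location of the underlying storage
}

// Recycler knows how to scrub a volume clean so it can become available
// again as a resource.
type Recycler interface {
	Recycle(pv *PersistentVolume) error
}

// Creator creates new instances of a PV from a template.
type Creator interface {
	Create(template *PersistentVolume) (*PersistentVolume, error)
}

// Deleter deletes instances of a PV and decides how to remove the volume
// from the underlying infrastructure.
type Deleter interface {
	Delete(pv *PersistentVolume) error
}
```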

Contributor Author

@smarterclayton yes, this works much more easily for volumes w/ APIs.

I was thinking about making the strategy for creation be plugin-based. Perhaps the strategies are compiled in the /plugins package and you choose one by name via config (with a sensible default, of course).

If we made creation pluggable, @pmorie can run his pod to create a volume or Go code can call a provider's API.
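A sketch of how that name-based selection might look, under the assumption of a simple in-process registry; the Creator interface is simplified from the sketch above, and every name here is illustrative:

```go
package volume

// Creator is simplified from the earlier sketch so this block stands alone.
type Creator interface {
	Create() error
}

// creators maps a configured strategy name to its implementation; plugins
// would register themselves here at startup.
var creators = map[string]Creator{}

// RegisterCreator adds a named creation strategy to the registry.
func RegisterCreator(name string, c Creator) {
	creators[name] = c
}

// CreatorFor returns the strategy selected by name in config, falling back
// to a default when no name is given.
func CreatorFor(name string) (Creator, bool) {
	if name == "" {
		name = "default"
	}
	c, ok := creators[name]
	return c, ok
}
```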

Member

@smarterclayton Are you arguing that this doesn't need API objects, but could just be a super-privileged pod that knows policy and watches across all namespaces for unfulfilled claims and does whatever it needs to do to make new PVs?

This whole design-doc sort of parallels the work @bprashanth is doing on load-balancers, so I'd like to keep them similar in form.

@saad-ali also

Contributor

I think the key distinction is that the in-tree controller would only support a few obviously core types (like a cloud provider). A custom controller would not be in-tree, or compiled with the manager, and we would disable it.

On Thu, Aug 20, 2015 at 8:46 PM, Mark Turansky notifications@github.com
wrote:

In docs/design/persistent-volume-provisioning.md
#6773 (comment):

+PersistentVolumeControllerManager is a singleton control loop running in master that manages all PVControllers in the system. The PVCM reconciles the current supply of available PersistentVolumes in the system with the desired levels according to the PVControllers. This process is similar to the ReplicationManager that manages ReplicationControllers.
+
+Three new volume plugin interfaces:
+
+* Recycler -- knows how to scrub a volume clean so it can become available again as a resource
+* Creator -- create new instances of a PV from a template.
+* Deleter -- deletes instances of a PV and allows the plugin to determine how to remove it from the underlying infrastructure
+
+Volume plugins can implement any applicable interfaces. Each plugin will document its own support for dynamic provisioning.
+
+
+### Goals
+
+* Allow administrators to describe minimum and maximum storage levels comprised of many kinds of PersistentVolumes
+* Allow the dynamic creation and reclamation of persistent volumes (to the fullest extent for each type of storage provider)

@smarterclayton https://github.com/smarterclayton yes, this works much
more easily for volumes w/ APIs.

I was thinking about making the strategy for creation be plugin-based.
Perhaps the strategies are compiled in the /plugins package and you choose
one by name via config (with a sensible default, of course).

If we made creation pluggable, @pmorie https://github.com/pmorie can
run his pod to create a volume or Go code can call a provider's API.


Reply to this email directly or view it on GitHub
https://github.com/kubernetes/kubernetes/pull/6773/files#r37597422.

Clayton Coleman | Lead Engineer, OpenShift

Member

@thockin I'm arguing exactly what you said here: https://github.com/kubernetes/kubernetes/pull/6773/files#r37607582

PVs are analogous to Nodes. We don't have NodeSets. We don't have CloudLoadBalancerSets. We shouldn't need PersistentVolumeSets. Create PVs on demand in response to PVCs. One could also potentially horizontally auto-scale to keep a small amount of burst capacity.

Member

Devil's advocate - will we ever have some configuration for a k8s cluster that defines the minimum number of nodes, the maximum number of nodes, the distribution of node sizes for auto-creation, the threshold for auto-creation/deletion, and hysteresis? We actually DO have this config, but it isn't stored in Kubernetes - it's in the cloud auto-scaler.

A PVSet (bad name) as proposed here is the analog to that. We could implement this as a pod with no state in the API, but the state has to go SOMEWHERE. We could argue that this should be a config object once that exists, and in the meantime just expect command-line flags on a pod (or secrets :)

And then there is network ingress. I think ingress follows the same pattern. @bprashanth

Member

If we do ever add such types, we should involve the auto-scaling team.

@markturansky
Contributor Author

@davidopp thanks for the feedback. I addressed your comments in the doc and pushed a new revision.

This proposal was pre-1.0, and there was only time for the Recycler, hence some of the confusing references in the doc. I added a bit more on the Recycler so that this design doc is current and reflects the totality of the feature. The remaining implementation work is the Deleters, the Creators, and the PVControllerManager loop. I've got a PR with the API/Client for PVControllers.

@k8s-bot

k8s-bot commented Aug 21, 2015

GCE e2e build/test passed for commit 5eb72f3e0013ec0671d3887fb1a0580b4b8696a1.


One new API kind:

A `PersistentVolumeController` (PVCtrl) is a storage resource provisioned by an administrator. PVCtrls do not have a namespace. Just as a `ReplicationController` maintains a number of replicas of a pod, a `PersistentVolumeController` maintains a minimum number of replicas of a `PersistentVolume`. A PVCtrl creates new volumes from a template up to a maximum replica count in increments of the minimum replica count. A well-provisioned cluster will have many PVCs.
Member

There's a desire to move from "active" names to "passive" names in the API. PersistentVolumePolicy? PVPool? PVSet?

I think @bgrant0607 wants to rename replicationController -> replicaSet eventually.

Member

I'm not sure I buy the last sentence.

Contributor Author

I can change to PersistentVolumeSet if that's where RC is headed. This thing makes a set of PVs.

I think I meant "PVs" in that last sentence, but that sentence also provides zero value to the design doc because it may not always be true. I'll remove it.

@k8s-bot

k8s-bot commented Aug 21, 2015

GCE e2e build/test passed for commit 6d4a90d.

@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 27, 2015
@k8s-github-robot

Labelling this PR as size/L

@markturansky
Contributor Author

@kubernetes/rh-storage @smarterclayton @thockin @saad-ali Is there value in PersistentVolumeSets aside from dynamic provisioning?

Recycling exists for all storage that has no API. I added support for Delete (#13649) and Create (#13650).

Do storage providers with APIs really need pre-provisioned resources or do they benefit more from the on-demand model?

I can add Conditions to PVClaims and the existing binder can optionally assign a "Provisionable" condition to unmatched PVCs. This can all be configured via the new VolumeConfig.

A new controller watches for provisionable PVCs and uses the Creator interface.

That's just two TODOs (Conditions and a new control loop) instead of a new top-level object and all the support it requires.
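A hedged sketch of that control loop under the approach just described: watch claims, treat Pending/unbound ones as provisionable, and hand them to a Creator-style interface. All types and names here are illustrative stand-ins:

```go
package provisioner

// PersistentVolumeClaim is a trimmed stand-in for the real API object.
type PersistentVolumeClaim struct {
	Name        string
	Phase       string // "Pending" while unmatched
	BoundVolume string // empty while unbound
}

// Creator stands in for the plugin interface sketched earlier, simplified
// so this block is self-contained.
type Creator interface {
	Create(claim *PersistentVolumeClaim) (pvName string, err error)
}

// Run consumes claim events and provisions a volume for every claim that is
// still Pending and unbound; binding is left to the existing binder.
func Run(claims <-chan *PersistentVolumeClaim, c Creator) {
	for claim := range claims {
		if claim.Phase != "Pending" || claim.BoundVolume != "" {
			continue // only unmatched claims are provisionable
		}
		if _, err := c.Create(claim); err != nil {
			continue // a real controller would requeue with backoff
		}
	}
}
```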

@smarterclayton
Contributor

> Do storage providers with APIs really need pre-provisioned resources or do they benefit more from the on-demand model?

My opinion is the latter.

> I can add Conditions to PVClaims and the existing binder can optionally assign a "Provisionable" condition to unmatched PVCs.

Why do you need this step? An unmatched PVC is implicitly provisionable.

> A new controller watches for provisionable PVCs and uses the Creator interface.

Why wouldn't it just use the existing PV API to create a PV?

@smarterclayton
Contributor

I think that we don't want volume code for creating new volumes to be invoked by a single controller. Instead, each "provisioner" is its own controller that watches PVCs and provisions them. Creating a controller loop is not supposed to be hard, and this is the classic "I don't like your provisioning code, I'm going to write my own" case for an admin.

The controller pattern is our plugin pattern for these things.


@markturansky
Contributor Author

You're right, I don't need a condition. The PVC is "Pending" and unbound. That's all a watcher needs to know to trigger a new resource.

I prefer nixing PersistentVolumeSet. We can achieve the same functionality without an additional top-level object. The new volume config stuff allows better configuration of plugins.

The only candidates for implementation were AWS/GCE/OpenStack, which are the very ones we would want real dynamic provisioning for. The existing recycler is good enough for statically provisioned pools of storage.

@smarterclayton
Contributor

Let's talk tomorrow and I'll go over the CLI controller work so you can get an idea of what we could offer if someone wants to write their own dynamic provisioner or recycler.


@markturansky
Contributor Author

Sounds good. I will follow up in the morning.

@saad-ali saad-ali added this to the v1.1-candidate milestone Sep 9, 2015
@bgrant0607
Member

I haven't had time to look at this, sadly, but is there a reason why this should be distinct from nominal services #260? What if a pod has more than one persistent volume? I doubt the user would want arbitrary mix-and-match of volumes every time a pod were replaced.

@saad-ali
Member

I'm a little late here I realize, but I'm trying to wrap my head around what the UX will look like for an admin to create a new dynamically provisioned volume set.

The PersistentVolumeTemplateSpec only contains a PersistentVolumeSpec; will that be enough information to create a new volume for all volume types? For example, how would one define the region for a GCE PD?

I imagine this is the type of detail that would be handled by the volume plugin, but how would the admin configure the plugin?


One new API kind:

A `PersistentVolumeSet` (PVS) is a storage resource provisioned by an administrator. PVSets do not have a namespace. Just as a `ReplicationController` maintains a number of replicas of a pod, a `PersistentVolumeSet` maintains a minimum number of replicas of a `PersistentVolume`. A PVSet creates new volumes from a template up to a maximum replica count.
Member

Observation, before having read the whole proposal:

PersistentVolumes are like Nodes -- they represent infrastructure, either provisioned by an admin on bare metal or potentially horizontally auto-scaled or provisioned on demand on a public/private cloud. It doesn't seem like there will be a uniform implementation across cloud providers. We don't have NodeSets partly for this reason.

PersistentVolumeClaims are analogous to Pods. We need replication of PVCs, but not independently of the Pods that consume them. That's discussed in #260.

Member

Can you explain the objection to PersistentVolumeSet? If they're all provisioned from the same template, it seems like the name correctly parallels ReplicaSet.

Member

Also, I didn't understand the comment about PVCs. IIUC they're proposing replication/scaling of PersistentVolumes here, not replication of PVCs.

@thockin
Member

thockin commented Oct 1, 2015

given how inaccurate this doc is - should we close the PR?

@markturansky
Contributor Author

Yes, we can close this PR. The interfaces described are still relevant, but can be described in the existing PV docs.

