PersistentVolume dynamic provisioning #6773
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here.
The `PersistentVolumeSource` interface requires `Create` and `Recycle` functions to support the dynamic creation and reclamation of persistent volumes. Each storage provider implements its own `PersistentVolumeSource` interface.
Caveat: Not all persistent volumes will support dynamic provisioning.
I assume this means things like NFS for which provisioning is not just an API call and is not consistent across installations. But how can you make it possible? Something like a PVSourceOverride that calls a custom script? Maybe that's version 3.0
I am not sure if this is what @markturansky intends but I think implementations of PersistentVolumeSource should be independent of the VolumePlugin implementations where it makes sense. For GCE PD for example they are one and the same, but for NFS there can be multiple PersistentVolumeSource implementors which result in NFS exports. There can also be a PersistentVolumeSource which can produce both NFS and iSCSI exports for example.
In the case of just plain NFS with no cloud provider a custom PVSource would make sense.
So long as a plugin can call exec and the admin creating the PVControllers can specify what to call, I suppose the caveat should be "some assembly required" for some types of volumes.
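For concreteness, a minimal Go sketch of the interface described in the excerpt above. The method names `Create` and `Recycle` come from the quoted text; the package, signatures, stand-in types, and the exec-backed NFS example (echoing the comments above) are assumptions, not the PR's actual code.

```go
package volumesketch

import "fmt"

// PersistentVolume is a stand-in for the real Kubernetes API object.
type PersistentVolume struct {
	Name string
	// spec fields elided
}

// PersistentVolumeSource is a hypothetical shape for the interface the
// doc describes: each storage provider implements Create and Recycle
// for its own volume type.
type PersistentVolumeSource interface {
	// Create provisions a new volume in the underlying infrastructure
	// and returns its API representation.
	Create() (*PersistentVolume, error)
	// Recycle scrubs an existing volume so it can be offered again.
	Recycle(pv *PersistentVolume) error
}

// nfsSource illustrates the "some assembly required" case from the
// thread above: with no cloud API to call, an admin-configured script
// does the actual work.
type nfsSource struct {
	createScript string // path supplied by the cluster admin
}

var _ PersistentVolumeSource = (*nfsSource)(nil)

func (n *nfsSource) Create() (*PersistentVolume, error) {
	// A real implementation might exec n.createScript to carve out an export.
	return nil, fmt.Errorf("not implemented: would exec %s", n.createScript)
}

func (n *nfsSource) Recycle(pv *PersistentVolume) error {
	// A real implementation might run a scrub script or pod against
	// the export backing pv.
	return nil
}
```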
Force-pushed from 0d0a3ed to af87ff7.
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") If this message is too spammy, please complain @ixdy.
ok to test
As simple as this "PR" is, I don't want to commit it because I simply have not put enough time into thinking about this model yet.
@thockin This proposal can wait until post 1.0. Just the PRs for recycling are current. We can revisit the rest of this later.
CLAs look good, thanks!
GCE e2e build/test failed for commit af87ff7.
One new API kind:
A `PersistentVolumeController` (PVC) is a storage resource provisioned by an administrator. PVCs do not have a namespace. Just as a ReplicationController maintains a minimum number of "replicas" of a pod in the system, a PersistentVolumeController maintains a minimum number of "replicas" of a certain type/size of PersistentVolume available in the system.
@smarterclayton Having to make many ReplicationControllers with Replicas=1 and 1 claim is a bad user experience. If a user were to lay claim to a PersistentVolumeController -- which is a creator and manager of its own pool of storage -- could we use this in conjunction with a ReplicationController to scale something like MongoDB? Different dev effort than this PR, but this PR's work could be steps towards solving that scaling issue.
On Jul 14, 2015, at 2:45 PM, replying by email to Mark Turansky's comment above on docs/design/persistent-volume-provisioning.md:

Possible. The design of how we assign unique volumes to pods under a replication controller (and reuse them when pods die) is really important, so we should try and get the discussion moving forward on that and reach closure. Nominal services are tied to this.
Force-pushed from 7dfa86e to 885654c.
I pushed edits to the doc reflecting the new implementation of the Recycler interface. I can start the implementations of Deleter and Creator since they will follow Recycler. No API changes required for those additions.
GCE e2e build/test failed for commit 885654cdfb8ce7b6831306ca087531cf314a8409.
GCE e2e build/test failed for commit 7dfa86e9fbe156fb4f80803ba9e02a006c07383d.
@smarterclayton this is a big focus for us at RH doing storage.
Force-pushed from 885654c to 007e0f3.
### Goals

* Allow administrators to describe minimum and maximum storage levels comprised of many kinds of PersistentVolumes
* Allow the dynamic creation and reclamation of persistent volumes (to the fullest extent for each type of storage provider)
How does the system allow for this? My question about this design proposal is that it seems to imply that the easiest way to provision volume types is to write Go code that calls GCE APIs. That works well specifically for GCE/AWS/Cinder (in the normal case) but does not work very well for any organization that has to script their storage creation (by using Ansible or some other tool to provision new machines or carve up NFS mounts). It also doesn't work very well for users with custom needs on GCE, or for the ability to dynamically react to a claim and create a volume on demand. This seems like a hard problem to solve generically (provisioning storage of many different types), so I'd ask why it has to be implemented as a formal pattern, vs. by an integration that someone writes to watch for new claims or a lack of volumes and go create some more.
I'd like to be able to run a pod on the server hosting the volumes (assuming it is itself a node in the cluster) in order to create a volume.
Why does it have to be on the server hosting the volumes?
On Thu, Aug 20, 2015 at 3:30 PM, Paul Morie wrote, commenting on this excerpt of docs/design/persistent-volume-provisioning.md:

A `PersistentVolumeControllerManager` is a singleton control loop running in master that manages all PVControllers in the system. The PVCM reconciles the current supply of available PersistentVolumes in the system with the desired levels according to the PVControllers. This process is similar to the `ReplicationManager` that manages ReplicationControllers.

Three new volume plugin interfaces:

* Recycler -- knows how to scrub a volume clean so it can become available again as a resource
* Creator -- creates new instances of a PV from a template
* Deleter -- deletes instances of a PV and allows the plugin to determine how to remove it from the underlying infrastructure

Volume plugins can implement any applicable interfaces. Each plugin will document its own support for dynamic provisioning.

Clayton Coleman | Lead Engineer, OpenShift
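To make those three interfaces concrete, here is a minimal Go sketch using the names from the quoted doc and continuing the hypothetical `volumesketch` package from earlier. The signatures are assumptions; the actual interfaces in the PR may differ.

```go
package volumesketch

// Recycler knows how to scrub a volume clean so it can become
// available again as a resource.
type Recycler interface {
	Recycle(pv *PersistentVolume) error // PersistentVolume from the earlier sketch
}

// Creator creates new instances of a PV from a template.
type Creator interface {
	Create(template *PersistentVolume) (*PersistentVolume, error)
}

// Deleter deletes instances of a PV and decides how to remove the
// backing storage from the underlying infrastructure.
type Deleter interface {
	Delete(pv *PersistentVolume) error
}
```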
@smarterclayton yes, this works much more easily for volumes w/ APIs.
I was thinking about making the creation strategy plugin-based. Perhaps the strategies are compiled into the /plugins package and you choose one by name via config (with a sensible default, of course).
If we made creation pluggable, @pmorie can run his pod to create a volume or Go code can call a provider's API.
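A sketch of that idea, continuing the same hypothetical package: strategies register themselves by name at init time, and config selects one. The registry and function names here are assumptions, not the PR's code.

```go
package volumesketch

import "fmt"

// creators maps a configured strategy name to a compiled-in
// implementation; each strategy under /plugins would register
// itself from its init() function.
var creators = map[string]Creator{}

// RegisterCreator adds a named creation strategy to the registry.
func RegisterCreator(name string, c Creator) {
	creators[name] = c
}

// CreatorFor resolves the strategy named in config, falling back
// to a sensible default as the comment above suggests.
func CreatorFor(name string) (Creator, error) {
	if name == "" {
		name = "default"
	}
	c, ok := creators[name]
	if !ok {
		return nil, fmt.Errorf("no creation strategy named %q", name)
	}
	return c, nil
}
```

Under this scheme a pod-based creator (for @pmorie's case) and a cloud-API creator could coexist behind the same registry.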
@smarterclayton Are you arguing that this doesn't need API objects, but could just be a super-privileged pod that knows policy and watches across all namespaces for unfulfilled claims and does whatever it needs to do to make new PVs?
This whole design-doc sort of parallels the work @bprashanth is doing on load-balancers, so I'd like to keep them similar in form.
@saad-ali also
I think the key distinction is that the in-tree controller would only support a few obviously core types (like a cloud provider). A custom controller would not be in tree, or compiled with the manager, and we would disable it.

Clayton Coleman | Lead Engineer, OpenShift
@thockin I'm arguing exactly what you said here: https://github.com/kubernetes/kubernetes/pull/6773/files#r37607582
PVs are analogous to Nodes. We don't have NodeSets. We don't have CloudLoadBalancerSets. We shouldn't need PersistentVolumeSets. Create PVs on demand in response to PVCs. One could also potentially horizontally auto-scale to keep a small amount of burst capacity.
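As a sketch of that on-demand model, continuing the hypothetical package from above: a control loop treats any pending, unbound claim as implicitly provisionable and creates a matching PV for the existing binder to pick up. The claim type, the channel standing in for a watch, and the `bind` callback are all assumptions, not the real Kubernetes client or API.

```go
package volumesketch

// PersistentVolumeClaim is a stand-in for the real API object; only
// the fields the loop inspects are sketched.
type PersistentVolumeClaim struct {
	Name       string
	Phase      string // "Pending" while unfulfilled
	VolumeName string // empty until bound to a PV
}

// ProvisionOnDemand watches claim events and provisions volumes for
// unfulfilled claims, rather than pre-creating a pool of PVs.
func ProvisionOnDemand(claims <-chan *PersistentVolumeClaim, c Creator, bind func(*PersistentVolumeClaim, *PersistentVolume) error) {
	for pvc := range claims {
		if pvc.Phase != "Pending" || pvc.VolumeName != "" {
			continue // claim already satisfied; nothing to provision
		}
		pv, err := c.Create(&PersistentVolume{Name: pvc.Name + "-pv"})
		if err != nil {
			continue // a real loop would record an event and retry with backoff
		}
		if err := bind(pvc, pv); err != nil {
			continue // likewise: surface the error and retry later
		}
	}
}
```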
Devil's advocate - will we ever have some configuration for a k8s cluster that defines the minimum number of nodes, the maximum number of nodes, the distribution of node sizes for auto-creation, the threshold for auto-creation/deletion, and hysteresis? We actually DO have this config but it isn't stored in kubernetes - it's in the cloud auto-scaler.
A PVSet (bad name) as proposed here is the analog to that. We could implement this as a pod with no state in the API, but the state has to go SOMEWHERE. We could argue that this should be a config object once that exists, and in the meantime just expect cmdline flags on a pod (or secrets :)
And then there is network ingress. I think ingress follows the same pattern. @bprashanth
If we do ever add such types, we should involve the auto-scaling team.
@davidopp thanks for the feedback. I addressed your comments in the doc and pushed a new revision. This proposal was pre-1.0 and there was only time for the Recycler, hence some of the confusing references in the doc. I added a little bit more on the Recycler so that this design doc is current and reflects the totality of the feature. Left to implement are the Deleters, Creators, and the PVControllerManager loop. I've got a PR with the API/Client for PVControllers.
Force-pushed from 007e0f3 to 5eb72f3.
GCE e2e build/test passed for commit 5eb72f3e0013ec0671d3887fb1a0580b4b8696a1.
One new API kind:
A `PersistentVolumeController` (PVCtrl) is a storage resource provisioned by an administrator. PVCtrls do not have a namespace. Just as a `ReplicationController` maintains a number of replicas of a pod, a `PersistentVolumeController` maintains a minimum number of replicas of a `PersistentVolume`. A PVCtrl creates new volumes from a template up to a maximum replica count in increments of the minimum replica count. A well-provisioned cluster will have many PVCs.
There's a desire to move from "active" names to "passive" names in the API. PersistentVolumePolicy? PVPool? PVSet?
I think @bgrant0607 wants to rename replicationController -> replicaSet eventually..
I'm not sure I buy the last sentence.
I can change to PersistentVolumeSet if that's where RC is headed. This thing makes a set of PVs.
I think I meant "PVs" in that last sentence, but that sentence also provides 0 value to the design doc because it may not always be true. I'll remove it.
Force-pushed from 5eb72f3 to 6d4a90d.
GCE e2e build/test passed for commit 6d4a90d.
Labelling this PR as size/L
@kubernetes/rh-storage @smarterclayton @thockin @saad-ali Is there value in PersistentVolumeSets aside from dynamic provisioning? Recycling exists for all storage that has no API. I added support for Delete (#13649) and Create (#13650). Do storage providers with APIs really need pre-provisioned resources or do they benefit more from the on-demand model? I can add Conditions to PVClaims and the existing binder can optionally assign a "Provisionable" condition to unmatched PVCs. This can all be configured via the new VolumeConfig. A new controller watches for provisionable PVCs and uses the Creater interface. That's just two TODOs (Conditions and a new control loop) instead of a new top-level object and all the support it requires.
On Sep 7, 2015, at 4:32 PM, replying by email to the comment above:

> Do storage providers with APIs really need pre-provisioned resources or do they benefit more from the on-demand model?

My opinion is more the latter.

> I can add Conditions to PVClaims and the existing binder can optionally assign a "Provisionable" condition to unmatched PVCs.

Why do you need this step? An unmatched PVC is implicitly provisionable.

> A new controller watches for provisionable PVCs and uses the Creater interface.

Why wouldn't it just use the existing PV API to create a PV?
I think that we don't want volume code for creating new volumes to be […]. The controller pattern is our plugin pattern for these things.
You're right, I don't need a condition. The pvc is "Pending" and unbound. That's all a watcher needs to know to trigger a new resource. I prefer nixing PersistentVolumeSet. We can achieve the same functionality without an additional top level thing. The new volume config stuff allows better configuration of plugins. The only candidates for implementation were AWS/GCE/OpenStack, which are the very ones we would want real dynamic provisioning for. The existing recycler is good enough for statically provisioned pools of storage.
Let's talk tomorrow and I'll go over the CLI controller work so you can get […]
Sounds good. I will follow up in the morning.
I haven't had time to look at this, sadly, but is there a reason why this should be distinct from nominal services #260? What if a pod has more than one persistent volume? I doubt the user would want arbitrary mix-and-match of volumes every time a pod is replaced.
I'm a little late here I realize, but I'm trying to wrap my head around what the UX will look like for an admin to create a new dynamically provisioned volume set. The PersistentVolumeTemplateSpec only contains PersistentVolumeSpec; will that be enough information to create a new volume for all volume types? For example, how would one define the region for a GCE PD? I imagine this is the type of detail that would be handled by the volume plugin, but how would the admin configure the plugin?
One new API kind:
A `PersistentVolumeSet` (PVS) is a storage resource provisioned by an administrator. PVSets do not have a namespace. Just as a `ReplicationController` maintains a number of replicas of a pod, a `PersistentVolumeSet` maintains a minimum number of replicas of a `PersistentVolume`. A PVSet creates new volumes from a template up to a maximum replica count.
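Sketched as a Go API object in the same hypothetical package: the minimum and maximum replica counts and the template come straight from the excerpt above, while the field names and the template's shape are assumptions.

```go
package volumesketch

// PersistentVolumeSetSpec sketches the proposed kind: maintain at
// least MinReplicas available PVs stamped out of Template, and never
// create more than MaxReplicas in total.
type PersistentVolumeSetSpec struct {
	MinReplicas int
	MaxReplicas int
	Template    PersistentVolumeTemplateSpec
}

// PersistentVolumeTemplateSpec mirrors a PV spec. Per the earlier
// question in the thread, provider-specific detail such as a GCE PD's
// zone would have to live either here or in per-plugin configuration.
type PersistentVolumeTemplateSpec struct {
	CapacityGiB int
	AccessModes []string
	// plugin-specific volume source fields elided
}
```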
Observation, before having read the whole proposal:
PersistentVolumes are like Nodes -- they represent infrastructure, either provisioned by an admin on bare metal or potentially horizontally auto-scaled or provisioned on demand on a public/private cloud. It doesn't seem like there will be a uniform implementation across cloud providers. We don't have NodeSets partly for this reason.
PersistentVolumeClaims are analogous to Pods. We need replication of PVCs, but not independently of the Pods that consume them. That's discussed in #260.
Can you explain the objection to PersistentVolumeSet? If they're all provisioned from the same template, it seems like the name correctly parallels ReplicaSet.
Also, I didn't understand the comment about PVCs. IIUC they're proposing replication/scaling of PersistentVolumes here, not replication of PVCs.
Given how inaccurate this doc is - should we close the PR?
Yes, we can close this PR. The interfaces described are still relevant, […]
Following PersistentVolumes, this proposal seeks to add the ability for PersistentVolumeControllers to maintain "replicas" of volumes, much like a ReplicationController maintains levels of pods.
PersistentVolumePlugins gain Create and Recycle methods to aid provisioning.