Proposal for external dynamic provisioners
jsafrane committed Oct 19, 2016
1 parent a47ac1f commit 9519753
Showing 1 changed file with 301 additions and 31 deletions.
332 changes: 301 additions & 31 deletions docs/proposals/volume-provisioning.md
@@ -64,48 +64,68 @@ types of volumes within a single cloud.

One of our goals is to enable administrators to create out-of-tree
provisioners, that is, provisioners whose code does not live in the Kubernetes
project. Our experience since the 1.2 release with dynamic provisioning has
shown that it is impossible to anticipate every aspect and manner of
provisioning that administrators will want to perform. The proposed design
should not prevent future work to allow out-of-tree provisioners.
project.

## Design

This design represents the minimally viable changes required to provision based on storage class configuration. Additional incremental features may be added as a separate effort.

We propose that:

1. For the base implementation, storage class and volume selectors are mutually exclusive.

2. An API object will be incubated in storage.k8s.io/v1beta1 to hold a `StorageClass`
1. For both in-tree and out-of-tree storage provisioners, the PV created by the
provisioner must match the PVC that led to its creation. If a provisioner
is unable to provision such a matching PV, it reports an error to the
user.

2. The above point also applies to the PVC label selector. If a user submits a PVC
with a label selector, the provisioner must provision a PV with matching
labels. This directly implies that the provisioner understands the meaning
behind these labels: if a user submits a claim with a selector that wants
a PV with label "region" not in "[east,west]", the provisioner must
understand what the label "region" means and which regions are available, and
choose e.g. "north".

In other words, provisioners should either refuse to provision a volume for
a PVC that has a selector, or select a few labels that are allowed in
selectors (such as the "region" example above), implement the necessary logic
for parsing them, document them, and refuse any selector that references
unknown labels.

3. An API object will be incubated in storage.k8s.io/v1beta1 to hold a `StorageClass`
API resource. Each StorageClass object contains parameters required by the provisioner to provision volumes of that class. These parameters are opaque to the user.

3. `PersistentVolume.Spec.Class` attribute is added to volumes. This attribute
4. `PersistentVolume.Spec.Class` attribute is added to volumes. This attribute
is optional and specifies which `StorageClass` instance represents
storage characteristics of a particular PV.

During incubation, `Class` is an annotation and not
an actual attribute.

4. `PersistentVolume` instances do not require labels by the provisioner.
5. `PersistentVolume` instances do not require labels by the provisioner.

5. `PersistentVolumeClaim.Spec.Class` attribute is added to claims. This
6. `PersistentVolumeClaim.Spec.Class` attribute is added to claims. This
attribute specifies that only a volume with equal
`PersistentVolume.Spec.Class` value can satisfy a claim.

During incubation, `Class` is just an annotation and not
an actual attribute.

6. The existing provisioner plugin implementations be modified to accept
7. The existing provisioner plugin implementations be modified to accept
parameters as specified via `StorageClass`.

7. The persistent volume controller be modified to invoke provisioners using `StorageClass` configuration and bind claims with `PersistentVolumeClaim.Spec.Class` to volumes with equivalent `PersistentVolume.Spec.Class`.
8. The persistent volume controller be modified to invoke provisioners using `StorageClass` configuration and bind claims with `PersistentVolumeClaim.Spec.Class` to volumes with equivalent `PersistentVolume.Spec.Class`.

8. The existing alpha dynamic provisioning feature be phased out in the
9. The existing alpha dynamic provisioning feature be phased out in the
next release.
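
The selector rule in the list above (a provisioner either refuses selectors outright or accepts only a documented set of labels) could be sketched as follows. This is a minimal illustration, not code from the proposal: `supportedLabels` and `checkSelectorKeys` are hypothetical names, and the selector is reduced to a plain list of label keys instead of the real selector type.

```go
package main

import (
	"fmt"
	"sort"
)

// supportedLabels is a hypothetical allow-list of the label keys
// that this provisioner documents and knows how to interpret.
var supportedLabels = map[string]bool{"region": true}

// checkSelectorKeys returns an error naming every selector key
// that the provisioner does not understand, or nil if all are known.
func checkSelectorKeys(selectorKeys []string) error {
	var unknown []string
	for _, key := range selectorKeys {
		if !supportedLabels[key] {
			unknown = append(unknown, key)
		}
	}
	if len(unknown) > 0 {
		sort.Strings(unknown)
		return fmt.Errorf("unsupported selector labels: %v", unknown)
	}
	return nil
}

func main() {
	fmt.Println(checkSelectorKeys([]string{"region"}))
	fmt.Println(checkSelectorKeys([]string{"region", "tier"}))
}
```

A provisioner that refuses all selectors would simply use an empty allow-list.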

### Controller workflow for provisioning volumes

0. A Kubernetes administrator can configure the name of a default StorageClass. This
StorageClass instance is then used when a user requests a dynamically
provisioned volume but does not specify a StorageClass, in other words when
`claim.Spec.Class == ""`
(or annotation `volume.beta.kubernetes.io/storage-class == ""`).

1. When a new claim is submitted, the controller attempts to find an existing
volume that will fulfill the claim.

@@ -125,30 +145,280 @@ We propose that:
periodically retries finding a matching volume or storage class again until
a match is found. The claim is `Pending` during this period.

4. With StorageClass instance, the controller finds volume plugin specified by
StorageClass.Provisioner.
4. With StorageClass instance, the controller updates the claim:
* `claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] = storageClass.Provisioner`

* **In-tree provisioning**

The controller tries to find an internal volume plugin referenced by
`storageClass.Provisioner`. If it is found:

5. The internal provisioner implements the interface `ProvisionableVolumePlugin`,
which has a method called `NewProvisioner` that returns a new provisioner.

6. The controller calls volume plugin `Provision` with Parameters
from the `StorageClass` configuration object.

7. If `Provision` returns an error, the controller generates an event on the
claim and goes back to step 1., i.e. it will retry provisioning
periodically.

8. If `Provision` returns no error, the controller creates the returned
`api.PersistentVolume`, fills its `Class` attribute with `claim.Spec.Class`
and makes it already bound to the claim

1. If the create operation for the `api.PersistentVolume` fails, it is
retried

2. If the create operation does not succeed in reasonable time, the
controller attempts to delete the provisioned volume and creates an event
on the claim

Existing behavior is un-changed for claims that do not specify
`claim.Spec.Class`.
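
Steps 0 and 4 of the workflow above (fall back to the default class, then record the class's provisioner on the claim) might look roughly like this. It is a sketch under simplifying assumptions: `Claim`, `StorageClass` and `annotateProvisioner` are hypothetical stand-ins, not the real API types or controller code, and classes are looked up in an in-memory map.

```go
package main

import "fmt"

const (
	classAnnotation       = "volume.beta.kubernetes.io/storage-class"
	provisionerAnnotation = "volume.beta.kubernetes.io/storage-provisioner"
)

// Claim and StorageClass are simplified stand-ins for the real API objects.
type Claim struct {
	Name        string
	Annotations map[string]string
}

type StorageClass struct {
	Name        string
	Provisioner string
}

// annotateProvisioner resolves the class requested by the claim, falling back
// to defaultClass when none is requested, and stores the class's provisioner
// name in the claim's storage-provisioner annotation.
func annotateProvisioner(c *Claim, classes map[string]StorageClass, defaultClass string) error {
	name := c.Annotations[classAnnotation] // reading a nil map yields ""
	if name == "" {
		name = defaultClass
	}
	sc, ok := classes[name]
	if !ok {
		return fmt.Errorf("storage class %q not found", name)
	}
	if c.Annotations == nil {
		c.Annotations = map[string]string{}
	}
	c.Annotations[provisionerAnnotation] = sc.Provisioner
	return nil
}

func main() {
	classes := map[string]StorageClass{
		"gold": {Name: "gold", Provisioner: "foo.org/foo-volume"},
	}
	claim := &Claim{Name: "fooclaim"}
	if err := annotateProvisioner(claim, classes, "gold"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(claim.Annotations[provisionerAnnotation])
}
```

Whether the annotation names an internal plugin or an external provisioner is decided only afterwards, which is what lets both kinds share this step.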

* **Out of tree provisioning**

Following step 4. above, the controller tries to find an internal plugin for the
`StorageClass`. If none is found, the controller does nothing itself; it just
periodically goes back to step 1., i.e. tries to find an available matching PV.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC 2119.

An external provisioner must have these features:

* It MUST have a distinct name, following the Kubernetes plugin naming scheme
`<vendor name>/<provisioner name>`, e.g. `gluster.org/gluster-volume`.

* The provisioner SHOULD send events on a claim to report any errors
related to provisioning a volume for the claim. This way, users get the same
experience as with internal provisioners.

5. All provisioners are in-tree; they implement an interface called
`ProvisionableVolumePlugin`, which has a method called `NewProvisioner`
that returns a new provisioner.
* The provisioner MUST also implement a deleter. It must be able to delete
the storage assets it created. It MUST NOT assume that any other internal or
external plugin is present.
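
As a small illustration of the naming requirement above, a provisioner could check its configured name at startup. The function below is a hypothetical sketch: it only verifies the `<vendor name>/<provisioner name>` shape (exactly one `/` with non-empty parts) and assumes nothing about stricter rules Kubernetes may apply.

```go
package main

import (
	"fmt"
	"strings"
)

// validProvisionerName reports whether name has the form
// "<vendor name>/<provisioner name>". Only the shape is checked.
func validProvisionerName(name string) bool {
	parts := strings.Split(name, "/")
	return len(parts) == 2 && parts[0] != "" && parts[1] != ""
}

func main() {
	for _, name := range []string{"gluster.org/gluster-volume", "gluster-volume", "a/b/c"} {
		fmt.Printf("%s: %v\n", name, validProvisionerName(name))
	}
}
```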

6. The controller calls volume plugin `Provision` with Parameters from the `StorageClass` configuration object.
The external provisioner runs in a separate process which watches claims, be
it an external storage appliance, a daemon or a Kubernetes pod. For every
claim creation or update, it implements these steps:

7. If `Provision` returns an error, the controller generates an event on the
claim and goes back to step 1., i.e. it will retry provisioning periodically
1. The provisioner inspects if
`claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] == <provisioner name>`.
All other claims MUST be ignored.

8. If `Provision` returns no error, the controller creates the returned
`api.PersistentVolume`, fills its `Class` attribute with `claim.Spec.Class`
and makes it already bound to the claim
2. The provisioner MUST check that the claim is unbound, i.e. its
`claim.Spec.VolumeName` is empty. Bound claims MUST be ignored.

*The race condition in which the provisioner provisions a new PV for a claim
while Kubernetes binds the same claim to another PV that was
just created by an admin is discussed below.*

2. If the create operation does not succeed in reasonable time, the
controller attempts to delete the provisioned volume and creates an event
on the claim
3. It tries to find the StorageClass instance referenced by the annotation
`claim.Annotations["volume.beta.kubernetes.io/storage-class"]`. If it is not
found, the provisioner SHOULD report an error (by sending an event to the claim) and it
SHOULD retry periodically, starting again at step 1.

Existing behavior is un-changed for claims that do not specify `claim.Spec.Class`.
4. The provisioner MUST parse the arguments in the `StorageClass` and
`claim.Spec.Selector` and provision an appropriate storage asset that matches
both the parameters and the selector.
When it encounters unknown parameters in `storageClass.Parameters` or
`claim.Spec.Selector`, or the combination of these parameters is impossible
to achieve, it SHOULD report an error and it MUST NOT provision a volume.
All errors found during parsing or provisioning SHOULD be sent as events
on the claim and the provisioner SHOULD retry periodically, starting again at step 1.

As parsing (and understanding) claim selectors is hard, the sentence
"MUST parse ... `claim.Spec.Selector`" will in the typical case lead to a simple
refusal of claims that have any selector:

```go
// Refuse every claim that carries a selector instead of trying to interpret it.
if pvc.Spec.Selector != nil {
	return errors.New("can't parse PVC selector")
}
```

5. When the volume is provisioned, the provisioner MUST create a new PV
representing the storage asset and save it in Kubernetes. When this fails,
it SHOULD retry creating the PV a few times. If all attempts fail, it
MUST delete the storage asset. All errors SHOULD be sent as events to the
claim.

The created PV MUST have these properties:

* `pv.Spec.ClaimRef` MUST point to the claim that led to its creation
(including the claim UID).

*This way, the PV will be bound to the claim.*

* `pv.Annotations["pv.kubernetes.io/provisioned-by"]` MUST be set to the name
of the external provisioner. This provisioner will be used to delete the
volume.

*The provisioner/deleter should not assume there is any other
provisioner/deleter available that would delete the volume.*

* `pv.Annotations["volume.beta.kubernetes.io/storage-class"]` MUST be set
to the name of the storage class requested by the claim.

*So the created PV matches the claim.*

* The provisioner MAY store any other information in the created PV as
annotations. It SHOULD save there any information that is needed to delete the
storage asset, as the appropriate StorageClass instance may no longer exist
when the volume is deleted. However, references to a Secret instance
or a direct username/password for a remote storage appliance MUST NOT be
stored there, see issue #34822.

* `pv.Labels` MUST be set to match `claim.Spec.Selector`. The provisioner
MAY add additional labels.

*So the created PV matches the claim.*

* `pv.Spec` MUST be set to match requirements in `claim.Spec`, especially
access mode and PV size. The provisioned volume size MUST NOT be smaller
than size requested in the claim, however it MAY be larger.

*So the created PV matches the claim.*

* `pv.Spec.PersistentVolumeSource` MUST be set to point to the created
storage asset.

* `pv.Spec.PersistentVolumeReclaimPolicy` SHOULD be set to `Delete` unless
the user manually configures another reclaim policy.

* `pv.Name` MUST be unique. Internal provisioners use a name based on
`claim.UID`, so that a name conflict occurs when two provisioners accidentally
provision a PV for the same claim; external provisioners, however, can use
any mechanism to generate a unique PV name.
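
The failure handling in step 5 (retry saving the PV a few times, and delete the storage asset if every attempt fails) can be sketched as below. The two callbacks are hypothetical hooks standing in for the real Kubernetes API call and the storage backend call; the retry count and the absence of a back-off delay are simplifications.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable is a made-up error used only by the demonstration in main.
var errUnavailable = errors.New("API server unavailable")

// savePVOrCleanUp calls createPV up to attempts times. If every attempt fails,
// it calls deleteAsset so that no storage asset is left behind without a PV.
func savePVOrCleanUp(attempts int, createPV, deleteAsset func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = createPV(); lastErr == nil {
			return nil
		}
	}
	if err := deleteAsset(); err != nil {
		return fmt.Errorf("saving PV failed (%v) and deleting the asset failed: %w", lastErr, err)
	}
	return fmt.Errorf("saving PV failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	err := savePVOrCleanUp(3,
		func() error { calls++; return errUnavailable },
		func() error { fmt.Println("storage asset deleted"); return nil },
	)
	fmt.Println(calls, err)
}
```

In a real provisioner, the returned error would also be reported as an event on the claim.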

Example of a claim that is to be provisioned by an external provisioner for
`foo.org/foo-volume`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-class: myClass
    volume.beta.kubernetes.io/storage-provisioner: foo.org/foo-volume
  name: fooclaim
  namespace: default
  resourceVersion: "53"
  uid: 5a294561-7e5b-11e6-a20e-0eb6048532a3
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
#  volumeName: must be empty!
```
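
An external provisioner named `foo.org/foo-volume` would pick up a claim like the one above. The checks from steps 1, 2 and 4 (matching annotation, unbound claim, selector refusal) might be combined as in this sketch. `Claim` is a simplified, hypothetical stand-in for the real claim type, and the selector is reduced to a plain map.

```go
package main

import (
	"errors"
	"fmt"
)

const provisionerAnnotation = "volume.beta.kubernetes.io/storage-provisioner"

// errSelector is returned for claims this sketch refuses to provision.
var errSelector = errors.New("claim selectors are not supported")

// Claim is a simplified stand-in for a PersistentVolumeClaim.
type Claim struct {
	Annotations map[string]string
	VolumeName  string            // stands in for claim.Spec.VolumeName
	Selector    map[string]string // nil when the claim has no selector
}

// shouldProvision returns (false, nil) for claims that must be ignored,
// (false, err) for claims that must be refused, and (true, nil) otherwise.
func shouldProvision(c Claim, provisionerName string) (bool, error) {
	if c.Annotations[provisionerAnnotation] != provisionerName {
		return false, nil // step 1: claim is meant for another provisioner
	}
	if c.VolumeName != "" {
		return false, nil // step 2: claim is already bound
	}
	if c.Selector != nil {
		return false, errSelector // step 4: simplest handling of selectors
	}
	return true, nil
}

func main() {
	claim := Claim{Annotations: map[string]string{provisionerAnnotation: "foo.org/foo-volume"}}
	ok, err := shouldProvision(claim, "foo.org/foo-volume")
	fmt.Println(ok, err)
}
```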

Example of the created PV:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: foo.org/foo-volume
    volume.beta.kubernetes.io/storage-class: myClass
    foo.org/provisioner: "any other annotations as needed"
  labels:
    foo.org/my-label: "any labels as needed"
  generateName: "foo-volume-"
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: ext4
    volumeID: aws://us-east-1d/vol-de401a79
  capacity:
    storage: 4Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: fooclaim
    namespace: default
    resourceVersion: "53"
    uid: 5a294561-7e5b-11e6-a20e-0eb6048532a3
  persistentVolumeReclaimPolicy: Delete
```
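
How a provisioner might assemble such a PV from the claim is sketched below. `Claim`, `Volume` and `newVolumeForClaim` are hypothetical, flattened stand-ins for the real API types; the `pvc-` name prefix is an assumption chosen for illustration, and the volume source and labels are omitted.

```go
package main

import "fmt"

// Claim and Volume are simplified stand-ins for the real API types.
type Claim struct {
	Namespace, Name, UID, Class string
	AccessModes                 []string
	RequestBytes                int64
}

type Volume struct {
	Name           string
	Annotations    map[string]string
	AccessModes    []string
	CapacityBytes  int64
	ClaimNamespace string // the three Claim* fields stand in for pv.Spec.ClaimRef
	ClaimName      string
	ClaimUID       string
	ReclaimPolicy  string
}

// newVolumeForClaim fills in the properties the proposal requires on a
// provisioned PV. It refuses a volume smaller than the claim requests.
func newVolumeForClaim(c Claim, provisioner string, provisionedBytes int64) (Volume, error) {
	if provisionedBytes < c.RequestBytes {
		return Volume{}, fmt.Errorf("provisioned %d bytes but the claim requests %d",
			provisionedBytes, c.RequestBytes)
	}
	return Volume{
		Name: "pvc-" + c.UID, // assumed naming scheme, unique per claim
		Annotations: map[string]string{
			"pv.kubernetes.io/provisioned-by":         provisioner,
			"volume.beta.kubernetes.io/storage-class": c.Class,
		},
		AccessModes:    append([]string(nil), c.AccessModes...),
		CapacityBytes:  provisionedBytes,
		ClaimNamespace: c.Namespace,
		ClaimName:      c.Name,
		ClaimUID:       c.UID,
		ReclaimPolicy:  "Delete",
	}, nil
}

func main() {
	claim := Claim{
		Namespace: "default", Name: "fooclaim",
		UID:   "5a294561-7e5b-11e6-a20e-0eb6048532a3",
		Class: "myClass", AccessModes: []string{"ReadWriteOnce"},
		RequestBytes: 4 << 30,
	}
	pv, err := newVolumeForClaim(claim, "foo.org/foo-volume", 4<<30)
	fmt.Println(pv.Name, pv.ClaimUID, err)
}
```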

As a result, Kubernetes has a PV that represents the storage asset and is bound
to the claim. If everything went well, Kubernetes completes the binding of the
claim to the PV.

Kubernetes was not blocked in any way during the provisioning, so it could
have bound the claim to another PV that was created by the user, or the
claim may even have been deleted by the user. In both cases, Kubernetes will mark
the PV to be deleted using the protocol below.

The external provisioner MAY save any annotations to the claim that is
provisioned, however the claim may be modified or even deleted by the user at
any time.


### Controller workflow for deleting volumes

When the controller decides that a volume should be deleted it performs these
steps:

1. The controller changes `pv.Status.Phase` to `Released`.

2. The controller looks for `pv.Annotations["pv.kubernetes.io/provisioned-by"]`.
If found, it uses this provisioner/deleter to delete the volume.

3. If the volume is not annotated by `pv.kubernetes.io/provisioned-by`, the
controller inspects `pv.Spec` and finds in-tree deleter for the volume.

4. If the deleter found by steps 2. or 3. is internal, it calls it and deletes
the storage asset together with the PV that represents it.

5. If the deleter is not known to Kubernetes, it does not do anything.

6. External deleters MUST watch for PV changes. When
`pv.Status.Phase == Released && pv.Annotations['pv.kubernetes.io/provisioned-by'] == <deleter name>`,
the deleter:

* It MUST check reclaim policy of the PV and ignore all PVs whose
`Spec.PersistentVolumeReclaimPolicy` is not `Delete`.

* It MUST delete the storage asset.

* Only after the storage asset was successfully deleted, it MUST delete the
PV object in Kubernetes.

* Any error SHOULD be sent as an event on the PV being deleted and the
deleter SHOULD retry to delete the volume periodically.

* The deleter SHOULD NOT use any information from the StorageClass instance
referenced by the PV. This differs from internal deleters, which
need the StorageClass instance to be present at the time of deletion to read
Secret instances (see the Gluster provisioner for an example); however, we would
like to phase out this behavior.
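
The external deleter rules in step 6 can be summarized in a short sketch. `Volume` is a simplified, hypothetical stand-in for the PV object, and `deleteAsset` and `deletePV` are hypothetical hooks for the storage backend and the Kubernetes API. The point illustrated is the ordering: the PV object is removed only after the storage asset is gone.

```go
package main

import "fmt"

// Volume is a simplified stand-in for a PersistentVolume.
type Volume struct {
	Name          string
	Phase         string // stands in for pv.Status.Phase
	ReclaimPolicy string // stands in for pv.Spec.PersistentVolumeReclaimPolicy
	Annotations   map[string]string
}

// reclaim deletes the storage asset and then the PV object, but only for
// Released PVs provisioned by deleterName whose reclaim policy is Delete.
// It reports whether the PV was deleted.
func reclaim(pv Volume, deleterName string, deleteAsset, deletePV func(Volume) error) (bool, error) {
	if pv.Phase != "Released" || pv.Annotations["pv.kubernetes.io/provisioned-by"] != deleterName {
		return false, nil // not ours, or not yet released
	}
	if pv.ReclaimPolicy != "Delete" {
		return false, nil // other reclaim policies are ignored
	}
	if err := deleteAsset(pv); err != nil {
		return false, fmt.Errorf("deleting storage asset for %s: %w", pv.Name, err)
	}
	if err := deletePV(pv); err != nil {
		return false, fmt.Errorf("deleting PV %s: %w", pv.Name, err)
	}
	return true, nil
}

func main() {
	pv := Volume{
		Name: "foo-volume-1", Phase: "Released", ReclaimPolicy: "Delete",
		Annotations: map[string]string{"pv.kubernetes.io/provisioned-by": "foo.org/foo-volume"},
	}
	noop := func(Volume) error { return nil }
	fmt.Println(reclaim(pv, "foo.org/foo-volume", noop, noop))
}
```

On error, a deleter following the proposal would send an event on the PV and call `reclaim` again later.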

Note that watching `pv.Status` has been frowned upon in the past, however in
this particular case we could use it quite reliably to trigger deletion.
It's not trivial to find out if a PV is not needed and should be deleted.
*Alternatively, an annotation could be used.*

### Security considerations

Both internal and external provisioners and deleters may need access to
credentials (e.g. username+password) of an external storage appliance to
provision and delete volumes.

* For internal provisioners, a Secret instance in a well secured namespace
should be used. A pointer to the Secret instance shall be a parameter of the
StorageClass and it MUST NOT be copied around the system, e.g. in annotations
of PVs. See issue #34822.

* External provisioners running in a pod should have appropriate credentials
mounted as a Secret inside the pods that run the provisioner. The namespace with the pods
and the Secret instance should be well secured.

### `StorageClass` API
@@ -253,7 +523,7 @@ parameters:
0. Annotation `volume.alpha.kubernetes.io/storage-class` is used instead of `claim.Spec.Class` and `volume.Spec.Class` during incubation.
1. `claim.Spec.Selector` and `claim.Spec.Class` are mutually exclusive. User can either match existing volumes with `Selector` XOR match existing volumes with `Class` and get dynamic provisioning by using `Class`. This simplifies initial PR and also provisioners.
1. `claim.Spec.Selector` and `claim.Spec.Class` are mutually exclusive for now (1.4). User can either match existing volumes with `Selector` XOR match existing volumes with `Class` and get dynamic provisioning by using `Class`. This simplifies initial PR and also provisioners. This limitation may be lifted in future releases.
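
The mutual exclusivity of `claim.Spec.Selector` and `claim.Spec.Class` could be enforced with a validation check along these lines. This is only a sketch: `validateClaim` is a hypothetical function, the class is passed as a plain string, and the selector is reduced to a map.

```go
package main

import (
	"errors"
	"fmt"
)

// errExclusive is returned when a claim sets both a class and a selector.
var errExclusive = errors.New("selector and class are mutually exclusive")

// validateClaim rejects claims that request a storage class and
// also carry a selector; either one alone is accepted.
func validateClaim(class string, selector map[string]string) error {
	if class != "" && selector != nil {
		return errExclusive
	}
	return nil
}

func main() {
	fmt.Println(validateClaim("myClass", nil))
	fmt.Println(validateClaim("myClass", map[string]string{"region": "east"}))
}
```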
# Cloud Providers