
Local storage plugin #44897

Merged
merged 7 commits into kubernetes:master from msau42:local-storage-plugin on May 31, 2017

Conversation

msau42 (Member) commented Apr 25, 2017

What this PR does / why we need it:
Volume plugin implementation for local persistent volumes. Scheduler predicate will direct already-bound PVCs to the node that the local PV is at. PVC binding still happens independently.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged):
Part of #43640

Release note:

Alpha feature: Local volume plugin allows local directories to be created and consumed as a Persistent Volume.  These volumes have node affinity and pods will only be scheduled to the node that the volume is at.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 25, 2017
k8s-reviewable commented

This change is Reviewable

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-label-needed labels Apr 25, 2017
msau42 (Member Author) commented Apr 25, 2017

/assign @saad-ali @jsafrane

cc @kubernetes/sig-storage-pr-reviews

@msau42 msau42 force-pushed the local-storage-plugin branch from 144afc3 to 2bf82d0 Compare April 25, 2017 04:06
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 26, 2017
return spec.PersistentVolume.Spec.LocalVolume, spec.ReadOnly, nil
}

return nil, false, fmt.Errorf("Spec does not reference an LocalVolume volume type")
Contributor

an -> a

}

// TODO: how to prevent multiple mounts with conflicting fsGroup?
return volume.SetVolumeOwnership(m, fsGroup)
Contributor

if !m.readOnly {
  return volume.SetVolumeOwnership(m, fsGroup)
}
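
As written, the suggestion leaves the read-only case without a return value; a completed sketch (assuming the surrounding SetUp path returns an error, as the snippet above implies) would be:

if !m.readOnly {
  return volume.SetVolumeOwnership(m, fsGroup)
}
// Read-only mounts skip ownership changes, but the mount itself still succeeds.
return nil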

func (m *localVolumeMounter) GetAttributes() volume.Attributes {
return volume.Attributes{
ReadOnly: m.readOnly,
Managed: false,
Contributor

Managed:   !m.readOnly

Member Author

What does Managed mean? It's not documented...

return volume.Attributes{
ReadOnly: m.readOnly,
Managed: false,
SupportsSELinux: false,
Contributor

why false?

Member Author

I don't have cycles to test and verify this. Anyone else is welcome to take this up :-)

Contributor

In an SELinux-enabled cluster, we should set this to true in order to share local volumes even between containers in a pod. It is also required to share a volume among different pods.

Users already manage and provide SELinux labels themselves right now, so setting this to true should suffice.

Also commented here: #45054
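
For reference, a minimal sketch of GetAttributes with both review suggestions applied (Managed tied to the read-only flag, SELinux support reported). This reflects the reviewers' proposals, not necessarily what the PR merges:

func (m *localVolumeMounter) GetAttributes() volume.Attributes {
	return volume.Attributes{
		ReadOnly: m.readOnly,
		// Suggested above: only report the volume as managed when it is writable.
		Managed: !m.readOnly,
		// Suggested above: report SELinux support so a local volume can be shared
		// between containers in a pod, and across pods, on SELinux-enabled clusters.
		SupportsSELinux: true,
	}
}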

}
}

// Checks prior to mount operations to verify that the required components (binaries, etc.)
Contributor

CanMount checks ...
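
For context, a minimal sketch of what that CanMount could look like for this plugin (the empty body is an assumption; local volumes only bind-mount a directory that already exists on the node):

// CanMount checks prior to mount operations that the required components
// (binaries, etc.) are available on the host.
func (m *localVolumeMounter) CanMount() error {
	// No extra host binaries are needed to bind-mount an existing directory.
	return nil
}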

localVolume: &localVolume{
podUID: pod.UID,
volName: spec.Name(),
mounter: plugin.host.GetMounter(),
Contributor

use SafeFormatAndMount to format the disk if not formatted?

Member Author

For now, the volume specified has to already have a filesystem. We will add block support in the future.
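
For illustration, a rough sketch of how the SafeFormatAndMount suggestion could look once block devices are supported. This is hypothetical and not part of this PR; the helper name, device path, and the 1.7-era util/mount field names are assumptions:

package main

import (
	"fmt"

	"k8s.io/kubernetes/pkg/util/exec"
	"k8s.io/kubernetes/pkg/util/mount"
)

// mountLocalDevice formats the device with the requested filesystem only if it
// does not already carry one, then mounts it at dir.
func mountLocalDevice(devicePath, dir, fstype string, options []string) error {
	diskMounter := &mount.SafeFormatAndMount{
		Interface: mount.New(""),
		Runner:    exec.New(),
	}
	if err := diskMounter.FormatAndMount(devicePath, dir, fstype, options); err != nil {
		return fmt.Errorf("format/mount of %s at %s failed: %v", devicePath, dir, err)
	}
	return nil
}

func main() {
	// Hypothetical usage with made-up paths.
	if err := mountLocalDevice("/dev/sdb", "/mnt/disks/vol1", "ext4", nil); err != nil {
		fmt.Println(err)
	}
}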

}

func (plugin *localVolumePlugin) GetAccessModes() []v1.PersistentVolumeAccessMode {
// The current meaning of AccessMode is how many nodes can attach to it, not how many pods can mount it
Contributor

Does Kubernetes support this semantic? I thought ReadWriteOnce is only used to tell how many pods can mount it.

The current meaning of AccessMode is how many nodes can attach to it

Member Author

This is the way that access modes work in the code right now, contrary to what most people assume. It's an issue with all volumes, not just local volumes. But perhaps local volumes will reveal this discrepancy more than other types.

I think there are two approaches to fix this: 1) change the semantics of the current AccessMode and hope that nobody was relying on the old behavior; or 2) add a new kind of AccessMode that is enforced at the pod level.

Contributor

I can't tell which approach is better at this moment. But from a user's perspective, this is definitely a problem if there is such a big misunderstanding. I'll see if I can help when I get there.
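
To make the node-level interpretation concrete, a minimal sketch of what GetAccessModes could return for local volumes (the single ReadWriteOnce entry is an assumption based on the discussion above):

func (plugin *localVolumePlugin) GetAccessModes() []v1.PersistentVolumeAccessMode {
	// Under the current semantics (how many nodes can attach, not how many pods
	// can mount), a local volume is only ever usable from the one node it lives on.
	return []v1.PersistentVolumeAccessMode{
		v1.ReadWriteOnce,
	}
}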

}, nil
}

func (plugin *localVolumePlugin) ConstructVolumeSpec(volumeName, mountPath string) (*volume.Spec, error) {
Member Author

@chakri-nelluri Can you check if this is correct? This volume can only be specified as a PV, so I am using the PV name as the volume's name.

Contributor

Hi, it looks good to me. Since PV Name is unique, it can be used.
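
A rough sketch of the approach being discussed, rebuilding the spec from the PV name alone (volume.NewSpecFromPersistentVolume is the existing helper; the minimal PV body and the metav1 import from the package's existing imports are assumptions):

func (plugin *localVolumePlugin) ConstructVolumeSpec(volumeName, mountPath string) (*volume.Spec, error) {
	// Local volumes can only be referenced through a PV, so the volume name
	// handed to the kubelet is the PV name itself.
	localVolume := &v1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{
			Name: volumeName,
		},
	}
	return volume.NewSpecFromPersistentVolume(localVolume, false), nil
}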

@msau42 msau42 force-pushed the local-storage-plugin branch from 2bf82d0 to e8bec8c Compare May 16, 2017 20:22
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 16, 2017
@msau42 msau42 force-pushed the local-storage-plugin branch 3 times, most recently from 38ce1de to 510dc85 Compare May 16, 2017 22:26
@k8s-github-robot k8s-github-robot removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels May 22, 2017
msau42 (Member Author) commented May 23, 2017

API is in, so I've rebased this PR to pick up those changes.

mikegrass left a comment

Minor nits.

}
}
os.Remove(dir)
glog.Errorf("Mount of volume %s failed: %v", dir, err)


Would it make sense to move this up to line 219, so that the error from m.mounter.Mount is logged for all paths?
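
A minimal sketch of the suggested restructuring (field names and the exact cleanup sequence are assumptions based on the snippets above): log the Mount error as soon as it happens, so every cleanup path below benefits from it.

if err := m.mounter.Mount(m.globalPath, dir, "", options); err != nil {
	glog.Errorf("Mount of volume %s failed: %v", dir, err)
	notMnt, mntErr := m.mounter.IsLikelyNotMountPoint(dir)
	if mntErr != nil {
		glog.Errorf("IsLikelyNotMountPoint check failed: %v", mntErr)
		return err
	}
	if !notMnt {
		if umountErr := m.mounter.Unmount(dir); umountErr != nil {
			glog.Errorf("Failed to unmount: %v", umountErr)
			return err
		}
	}
	os.Remove(dir)
	return err
}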

glog.Errorf("Failed to unmount: %v", mntErr)
return err
}
notMnt, mntErr := m.mounter.IsLikelyNotMountPoint(dir)


Nit: could use assignment (=) instead of short variable declaration (:=) here for notMnt, mntErr

@@ -76,11 +76,18 @@ type VolumeTestConfig struct {
ServerImage string
// Ports to export from the server pod. TCP only.
ServerPorts []int
// Commands to run in the comtainer image.


Nit: s/comtainer/container/

}

By("Removing the test directory")
writeCmd := fmt.Sprintf("rm -r %s", volume.containerDir)


Nit: maybe rename to removeCmd to match the intent?

msau42 (Member Author) commented May 26, 2017

@k8s-bot pull-kubernetes-unit test this

msau42 (Member Author) commented May 26, 2017

@k8s-bot pull-kubernetes-node-e2e test this

saad-ali (Member) commented

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 30, 2017
@saad-ali saad-ali added this to the v1.7 milestone May 30, 2017
k8s-github-robot commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: msau42, saad-ali
We suggest the following additional approver: dchen1107

Assign the PR to them by writing /assign @dchen1107 in a comment when ready.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@saad-ali saad-ali added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 30, 2017
msau42 (Member Author) commented May 30, 2017

@k8s-bot pull-kubernetes-unit test this

msau42 (Member Author) commented May 30, 2017

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

1 similar comment
msau42 (Member Author) commented May 30, 2017

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

k8s-github-robot commented

Automatic merge from submit-queue (batch tested with PRs 46076, 43879, 44897, 46556, 46654)

k8s-github-robot pushed a commit that referenced this pull request May 31, 2017
@k8s-github-robot k8s-github-robot merged commit 0aad9d3 into kubernetes:master May 31, 2017
sky-big commented Jun 24, 2017

Hi, when I use the local persistent volume plugin I have one question. This is my YAML configuration:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    volume.alpha.kubernetes.io/node-affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - 192.168.152.134
  name: my-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  local:
    path: /tmp/k8s-storage

This configuration doesn't run successfully; the main cause of the error is nodeAffinity. When I delete nodeAffinity, remove the nodeAffinity check code from validation.go, and flip PersistentLocalVolumes from false to true in kube_features.go, the configuration becomes:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  local:
    path: /tmp/k8s-storage

and then it succeeds. So I want to ask: how should nodeAffinity be configured for the local volume plugin? @msau42

msau42 (Member Author) commented Jun 24, 2017

@sky-big Try the example here

sky-big commented Jun 25, 2017

thanks @msau42

groundnuty commented Jul 13, 2017

@msau42 I am having success with the example you linked above, and I've integrated it into a simple nginx log-persistence setup:

kind: Pod
apiVersion: v1
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
       claimName: example-local-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
      - mountPath: "/var/log/nginx/"
        name: task-pv-storage
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: example-local-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-storage
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  annotations:
    "volume.alpha.kubernetes.io/node-affinity": '{
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                { "matchExpressions": [
                    { "key": "kubernetes.io/hostname",
                      "operator": "In",
                      "values": ["k8s-cyfronet-1-worker10"]
                    }
                ]}
             ]}
          }'
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/storage/k8s_local_volumes/

The small issue is persistentVolumeReclaimPolicy: when it is set to Delete and I delete pv/task-pv-volume, the PV enters the Failed state with the message:

Error getting deleter volume plugin for volume "task-pv-volume": no deletable volume plugin matched

The desired behavior would be rm -rf /mnt/storage/k8s_local_volumes/*.

Am I doing something wrong, or is this normal in alpha?

msau42 (Member Author) commented Jul 13, 2017

@groundnuty currently the volume deletion is handled by the external provisioner, so if you're not using that, then the cleanup will fail. The idea was that you may want to customize your deletion method (zero out the volume, write random bytes x times, different methods for hdds vs ssds), so you could configure the provisioner to do whatever you wanted. But for the case where you're not using the external provisioner, I can look into providing a default deletion routine that just removes the contents.

msau42 (Member Author) commented Jul 13, 2017

@groundnuty sorry I remember the challenge now. Delete is called by the master controller and not the kubelet, so we are not able to access the local volume to do any cleanup. That is why we had to implement it in the external provisioner, which runs as a daemonset on every node.

groundnuty commented

@msau42 the idea is very good indeed :) Though in the simplest use case, this feature can work just like hostPath with smart scheduling of the pod to the node where the volume is. Then the same simple reclaim/delete strategies that apply to hostPath and NFS would apply here.

Also, with a simple setup like this (no external provisioner):

kind: Pod
apiVersion: v1
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
       claimName: example-local-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
      - mountPath: "/var/log/nginx/"
        name: task-pv-storage
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: example-local-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-storage
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  annotations:
    "volume.alpha.kubernetes.io/node-affinity": '{
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                { "matchExpressions": [
                    { "key": "beta.kubernetes.io/os",
                      "operator": "In",
                      "values": ["linux"]
                    }
                ]}
             ]}
          }'
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/storage/k8s_local_volumes/

The command kubectl describe pvc/example-local-claim says:

persistentvolume-controller                     Warning         ProvisioningFailed      storageclass.storage.k8s.io "local-storage" not found

As a consequence (I suppose), the nginx pod is stuck in the ContainerCreating state for an exceptionally long time (with the image already present on the node); then it fails, gets recreated, and all is fine. Detailed log here.

groundnuty commented

@msau42 Regarding the challenge: if you had time, I would be grateful if you could outline what steps one should take to make the above use case work.

msau42 (Member Author) commented Jul 13, 2017

The simplest way is to use the external provisioner daemonset.

Changing the pv controller to call Delete() on the nodes instead of the master is a big architectural change that only makes sense for local volumes, and no other volume plugin.

groundnuty commented Jul 14, 2017

@msau42 The provisioner is working nicely: I've created one dir on a node and it created a volume for it.

I am trying to use your StatefulSet example with replicas set to 1 (the volumeClaimTemplate is convenient). Unfortunately, despite the claim reporting that it is bound to the proper volume, and the volume in turn reporting that it is bound to that claim, I get errors in the StatefulSet pod saying the volume is not bound, and the pod ends up scheduled to the wrong node:
https://nopaste.me/view/834b80d7

Any ideas?

Every 2.0s: kubectl describe pvc/local-vol-local-test-0                                                                                                                                              orzechpro.local: Fri Jul 14 13:28:59 2017

Name:           local-vol-local-test-0
Namespace:      default
StorageClass:   local-storage
Status:         Bound
Volume:         local-pv-34ad63dc
Labels:         app=local-test
Annotations:    pv.kubernetes.io/bind-completed=yes
                pv.kubernetes.io/bound-by-controller=yes
Capacity:       97821984Ki
Access Modes:   RWO
Events:         <none>
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: local-volume-provisioner-k8s-cyfronet-1-worker10-4007d0ef-660a-11e7-b921-fa163eecdaa1
    volume.alpha.kubernetes.io/node-affinity: '{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"kubernetes.io/hostname","operator":"In","values":["k8s-cyfronet-1-worker10"]}]}]}}'
  creationTimestamp: 2017-07-14T11:17:28Z
  name: local-pv-34ad63dc
  resourceVersion: "13355717"
  selfLink: /api/v1/persistentvolumes/local-pv-34ad63dc
  uid: 05337da2-6886-11e7-8527-fa163eecdaa1
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 97821984Ki
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: local-vol-local-test-0
    namespace: default
    resourceVersion: "13355709"
    uid: 199c7d3c-6886-11e7-8527-fa163eecdaa1
  local:
    path: /mnt/disks/k8s-cyfronet-1-worker10-1
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
status:
  phase: Bound

@msau42 msau42 deleted the local-storage-plugin branch August 15, 2017 01:40