Local storage plugin #44897
Conversation
Force-pushed from 144afc3 to 2bf82d0
return spec.PersistentVolume.Spec.LocalVolume, spec.ReadOnly, nil
}

return nil, false, fmt.Errorf("Spec does not reference an LocalVolume volume type")
an -> a
}

// TODO: how to prevent multiple mounts with conflicting fsGroup?
return volume.SetVolumeOwnership(m, fsGroup)
if !m.readOnly {
	return volume.SetVolumeOwnership(m, fsGroup)
}
func (m *localVolumeMounter) GetAttributes() volume.Attributes {
	return volume.Attributes{
		ReadOnly: m.readOnly,
		Managed: false,
Managed: !m.readOnly
What does Managed mean? It's not documented...
	return volume.Attributes{
		ReadOnly: m.readOnly,
		Managed: false,
		SupportsSELinux: false,
why false?
I don't have cycles to test and verify this. Anyone else is welcome to take this up :-)
In an SELinux-enabled cluster, we should have this set to true in order to share local volumes even among containers in a pod. This is also required to share a volume among different pods.
We already have the user manage and provide SELinux labels right now, so setting this to true should suffice.
Also commented here: #45054
	}
}

// Checks prior to mount operations to verify that the required components (binaries, etc.)
CanMount checks ...
	localVolume: &localVolume{
		podUID:  pod.UID,
		volName: spec.Name(),
		mounter: plugin.host.GetMounter(),
use SafeFormatAndMount to format the disk if not formatted?
For now, the volume specified has to already have a filesystem. We will add block support in the future.
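If block support is added later, the reviewer's suggestion would presumably look something like the sketch below; mount.SafeFormatAndMount's FormatAndMount formats the source device only when it has no filesystem, then mounts it. The helper name, fstype, and parameters are invented for illustration, and construction of the SafeFormatAndMount value is left to the caller (its field names have varied across releases).

```go
package local

import (
	"k8s.io/kubernetes/pkg/util/mount"
)

// formatAndMountIfNeeded is a hypothetical sketch, not code from this PR:
// FormatAndMount formats the device with the requested filesystem only if
// no filesystem is present, and then mounts it at the target path.
func formatAndMountIfNeeded(diskMounter *mount.SafeFormatAndMount, devicePath, mountPath, fstype string, options []string) error {
	return diskMounter.FormatAndMount(devicePath, mountPath, fstype, options)
}
```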
}

func (plugin *localVolumePlugin) GetAccessModes() []v1.PersistentVolumeAccessMode {
	// The current meaning of AccessMode is how many nodes can attach to it, not how many pods can mount it
Does Kubernetes support this semantic? I thought ReadWriteOnce is only used to tell how many pods can mount it.
The current meaning of AccessMode is how many nodes can attach to it
This is the way that access modes work in the code right now, contrary to what most people assume. It's an issue with all volumes, not just local volumes. But perhaps local volumes will reveal this discrepancy more than other types.
I think there are two approaches to fix this: 1) Change the semantic of current AccessMode, and hopefully nobody was relying on the old behavior. 2) Add a new kind of AccessMode to enforce at the pod level.
I can't tell which approach is better at this moment. But from a user's perspective, this is definitely a problem if there is such a big misunderstanding. I'll see if I can help when I get there.
	}, nil
}

func (plugin *localVolumePlugin) ConstructVolumeSpec(volumeName, mountPath string) (*volume.Spec, error) {
@chakri-nelluri Can you check if this is correct? This volume can only be specified as a PV, so I am using the PV name as the volume's name.
Hi, it looks good to me. Since PV Name is unique, it can be used.
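Concretely, the idea reads roughly like the sketch below: rebuild a minimal PV spec whose name is the mount's volume name. This is a paraphrase of the discussion, not the PR's exact code; it assumes the merged API's Local/LocalVolumeSource field names and the plugin file's usual imports (v1, metav1, and the volume package).

```go
// Sketch: reconstruct a minimal PV spec for this mount, reusing the PV name
// (unique cluster-wide) as the volume name, since a local volume can only be
// specified through a PV.
func (plugin *localVolumePlugin) ConstructVolumeSpec(volumeName, mountPath string) (*volume.Spec, error) {
	localPV := &v1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{
			Name: volumeName,
		},
		Spec: v1.PersistentVolumeSpec{
			PersistentVolumeSource: v1.PersistentVolumeSource{
				Local: &v1.LocalVolumeSource{},
			},
		},
	}
	return volume.NewSpecFromPersistentVolume(localPV, false), nil
}
```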
Force-pushed from 2bf82d0 to e8bec8c
Force-pushed from 38ce1de to 510dc85
API is in, so I've rebased this PR to pick up those changes.
Minor nits.
pkg/volume/local/local.go (outdated)
}
}
os.Remove(dir)
glog.Errorf("Mount of volume %s failed: %v", dir, err)
Would it make sense to move this up to line 219, so that the error from m.mounter.Mount is logged for all paths?
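Roughly, the suggestion amounts to logging at the point of failure rather than only on the final path; a hypothetical standalone helper (not code from this PR; the name and parameters are made up for illustration) could look like:

```go
package local

import (
	"github.com/golang/glog"

	"k8s.io/kubernetes/pkg/util/mount"
)

// mountAndLog illustrates the suggestion above: log the mount error once,
// where it happens, so every cleanup path that follows a failed Mount call
// is covered by the same log line.
func mountAndLog(mounter mount.Interface, source, target, fstype string, options []string) error {
	err := mounter.Mount(source, target, fstype, options)
	if err != nil {
		glog.Errorf("Mount of volume %s failed: %v", target, err)
	}
	return err
}
```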
pkg/volume/local/local.go (outdated)
glog.Errorf("Failed to unmount: %v", mntErr) | ||
return err | ||
} | ||
notMnt, mntErr := m.mounter.IsLikelyNotMountPoint(dir) |
Nit: could use assignment (=) instead of short variable declaration (:=) here for notMnt, mntErr.
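The distinction, shown in a self-contained form (os.Stat stands in for IsLikelyNotMountPoint; nothing here is from the PR):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// First occurrence: short variable declaration introduces info and err.
	info, err := os.Stat("/tmp")
	fmt.Println(info, err)

	// Later occurrence: plain assignment reuses the already-declared
	// variables instead of re-declaring them, which is the review nit.
	info, err = os.Stat("/var")
	fmt.Println(info, err)
}
```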
test/e2e/framework/volume_util.go (outdated)
@@ -76,11 +76,18 @@ type VolumeTestConfig struct {
	ServerImage string
	// Ports to export from the server pod. TCP only.
	ServerPorts []int
	// Commands to run in the comtainer image.
Nit: s/comtainer/container/
}

By("Removing the test directory")
writeCmd := fmt.Sprintf("rm -r %s", volume.containerDir)
Nit: maybe rename to removeCmd to match the intent?
@k8s-bot pull-kubernetes-unit test this
@k8s-bot pull-kubernetes-node-e2e test this
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: msau42, saad-ali. Assign the PR to them by writing
Needs approval from an approver in each of these OWNERS files:
You can indicate your approval by writing
@k8s-bot pull-kubernetes-unit test this
@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this
Automatic merge from submit-queue (batch tested with PRs 46076, 43879, 44897, 46556, 46654)
Hi, when I use the local persistent plugin I have one question. This is my yaml configuration: apiVersion: v1
This configuration can't run successfully; the main error is about nodeAffinity. When I delete nodeAffinity, delete the nodeAffinity check code in validation.go, and change PersistentLocalVolumes from false to true in kube_features.go, then the configuration is: apiVersion: v1
and it succeeds. So I want to ask: how do I configure nodeAffinity for the local volume plugin?
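For context, a minimal local PV manifest of that era would look roughly like the sketch below, assuming the alpha volume.alpha.kubernetes.io/node-affinity annotation format with the PersistentLocalVolumes feature gate enabled; the node name (my-node), path, capacity, and storage class are placeholders, not values from this thread.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
  annotations:
    # Alpha mechanism (assumed format): node affinity is expressed as a
    # JSON-encoded NodeAffinity inside this annotation.
    "volume.alpha.kubernetes.io/node-affinity": '{
      "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [
          { "matchExpressions": [
            { "key": "kubernetes.io/hostname",
              "operator": "In",
              "values": ["my-node"]
            }
          ]}
        ]
      }
    }'
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/vol1
```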
thanks @msau42
@msau42 I am having success with the example you linked above, and I've integrated it with a simple nginx log persistence setup:
The small issue is: the PV enters the
The desired behavior would be
Am I doing something wrong, or is this normal in alpha?
@groundnuty currently the volume deletion is handled by the external provisioner, so if you're not using that, then the cleanup will fail. The idea was that you may want to customize your deletion method (zero out the volume, write random bytes x times, different methods for hdds vs ssds), so you could configure the provisioner to do whatever you wanted. But for the case where you're not using the external provisioner, I can look into providing a default deletion routine that just removes the contents.
@groundnuty sorry I remember the challenge now. Delete is called by the master controller and not the kubelet, so we are not able to access the local volume to do any cleanup. That is why we had to implement it in the external provisioner, which runs as a daemonset on every node.
@msau42 the idea is very good indeed :) though in the simplest use case this feature can work just like also with a simple setup like this (no external prov.):
The command
as a consequence (I suppose) the nginx pod is stuck in a
@msau42 regarding the challenge, if you had time I would be grateful if you could outline what steps one should undertake to make the above use case work.
The simplest way is to use the external provisioner daemonset. Changing the pv controller to call Delete() on the nodes instead of the master is a big architectural change that only makes sense for local volumes, and no other volume plugin.
@msau42 The provisioner is working nicely: I've created one dir on a node and it created a volume for it. I am trying to use your stateful set example with replicas set to 1 (the volumeClaimTemplate is convenient). Unfortunately, despite the volume claim reporting that it is bound to a proper volume, and the volume in turn reporting that it's used by that claim, I get errors in the stateful set pod saying that a volume is not bound, and the pod ends up scheduled to the wrong node: Any ideas?
What this PR does / why we need it:
Volume plugin implementation for local persistent volumes. The scheduler predicate will direct already-bound PVCs to the node that the local PV is at. PVC binding still happens independently.
Which issue this PR fixes: Part of #43640
Release note: Alpha feature: Local volume plugin allows local directories to be created and consumed as a Persistent Volume. These volumes have node affinity and pods will only be scheduled to the node that the volume is at.