UXP: kubelet volume mount errors are not surfacing properly for true root cause in 'kubectl describe' leading to confusing generic error message on failed mounts
#22992 · Closed · screeley44 opened this issue Mar 15, 2016 · 4 comments · Fixed by #23122
fyi @smarterclayton @pweil @pmorie
When using PV/PVC for persistent volume mounts, if something goes wrong in the builder, a generic error is surfaced in 'kubectl describe' with the following event error description, "unsupported volume type":
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
14s 14s 1 {default-scheduler } Normal Scheduled Successfully assigned bb-gluster-pod2 to 127.0.0.1
14s 14s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(3ff651a5-eabf-11e5-b3b1-52540092b5fb)": unsupported volume type
14s 14s 1 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: unsupported volume type
This only occurs when using the PV/PVC abstraction and not when using a volume plugin directly. These types of errors are common when using volume plugins for mounts, and it would be useful for users to have an easy way to identify them rather than searching the logs for the root cause.
I think there are two issues:
Why is the original error surfaced from the volume plugins (glusterfs.go, rbd.go, etc.) getting eaten when it returns back to volumes.go? I can see in the logs that two volume types are processed for any pod: the first is the actual underlying type (i.e. kubernetes.io/glusterfs), and this is where the real error is surfaced; immediately after that, the second type is processed, which is the general persistent claim type (i.e. kubernetes.io/persistent-claim), and this is where the first type's error is eaten (see the toy sketch after these two points).
A simple workaround would be to add a recorder event in volumes.go when there is a builder failure. This ensures the error is recorded and displayed as an event for the describer, and it will handle similar errors from the other volume plugin types.
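To make the first issue concrete, here is a minimal, self-contained Go sketch of the failure mode being described. The names and types are hypothetical stand-ins, not the actual kubelet code path: the underlying plugin produces the real root cause, but after the generic persistent-claim handling runs, the only message the user sees is the generic one.

// Toy illustration only - not the real kubelet code path.
package main

import (
	"errors"
	"fmt"
)

// underlyingBuilder mimics a concrete plugin builder such as glusterfs:
// construction fails with the real root cause (missing endpoints).
func underlyingBuilder() (interface{}, error) {
	return nil, errors.New(`endpoints "glusterfs-cluster" not found`)
}

// persistentClaimBuilder mimics the generic persistent-claim handling: it
// delegates to the underlying plugin, but only logs the delegated error and
// reports nothing useful back to the caller.
func persistentClaimBuilder() (interface{}, error) {
	b, err := underlyingBuilder()
	if err != nil {
		fmt.Println("log only:", err) // root cause ends up in the kubelet log...
		return nil, nil               // ...and the caller sees "no builder" instead
	}
	return b, nil
}

func main() {
	b, err := persistentClaimBuilder()
	if b == nil && err == nil {
		// ...so the only message attached to the pod's events is generic.
		fmt.Println("event: unsupported volume type")
	}
}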
Full modified function from volumes.go, for reference:
func (kl *Kubelet) newVolumeBuilderFromPlugins(spec *volume.Spec, pod *api.Pod, opts volume.VolumeOptions) (volume.Builder, error) {
	plugin, err := kl.volumePluginMgr.FindPluginBySpec(spec)
	if err != nil {
		return nil, fmt.Errorf("can't use volume plugins for %s: %v", spec.Name(), err)
	}
	if plugin == nil {
		// Not found but not an error
		return nil, nil
	}
	builder, err := plugin.NewBuilder(spec, pod, opts)
	if err != nil {
		// Add kubelet recorder event so the real error shows in the 'kubectl describe <pod>' event list
		ref, errGetRef := api.GetReference(pod)
		if errGetRef == nil && ref != nil {
			kl.recorder.Eventf(ref, api.EventTypeWarning, kubecontainer.FailedMountVolume, "Unable to mount volumes for pod %q: %v", format.Pod(pod), err)
			glog.Errorf("Unable to mount volumes for pod %q: %v; skipping pod", format.Pod(pod), err)
		}
		return nil, fmt.Errorf("failed to instantiate volume plugin for %s: %v", spec.Name(), err)
	}
	glog.V(10).Infof("Used volume plugin %q to mount %s", plugin.Name(), spec.Name())
	return builder, nil
}
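A note on the placement: recording the event in newVolumeBuilderFromPlugins rather than in each plugin means that whatever error a plugin's NewBuilder returns (missing glusterfs endpoints, missing rbd secret, etc.) is attached to the pod's event stream at a single choke point, so 'kubectl describe' shows the root cause alongside the generic message without each plugin needing its own recorder wiring.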
This produces a better user experience when trying to see what went wrong. In this example I am using a glusterfs PV/PVC but am missing endpoints, which you can now clearly see in the events of the describe:
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
12s 12s 1 {default-scheduler } Normal Scheduled Successfully assigned bb-gluster-pod2 to 127.0.0.1
12s 12s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": endpoints "glusterfs-cluster" not found
12s 12s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": unsupported volume type
12s 12s 1 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: unsupported volume type
rbd example (missing secret, surfaced as "Couldn't get secret"):
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
12s 12s 1 {default-scheduler } Normal Scheduled Successfully assigned bb-gluster-pod2 to 127.0.0.1
12s 12s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-rbd-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": Couldn't get secret
12s 12s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-rbd-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": unsupported volume type
12s 12s 1 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: unsupported volume type
@screeley44 another one is when gluster endpoints are missing, we get 'unsupported volume type'. @pmorie with your work extrapolating the permissions for NFS to automatically set supplemental groups, is there also a way to indicate more clearly that it's a permissions issue when the mount fails?
@erinboyd - the discussion above highlights the missing endpoints as the first example, and it also covers a similar rbd issue (missing secret file). The point is that any of these volume plugin errors would be surfaced by adding the event in volumes.go.