
UXP: kubelet volume mount errors do not surface the true root cause in 'kubectl describe', leading to a confusing generic error message on failed mounts #22992

Closed
screeley44 opened this issue Mar 15, 2016 · 4 comments · Fixed by #23122

Comments

@screeley44
Contributor

fyi @smarterclayton @pweil @pmorie
When using a PV/PVC for persistent volume mounts, if something goes wrong in the builder, a generic error is surfaced in 'kubectl describe' with the following event error description, "unsupported volume type":

Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  14s       14s     1   {default-scheduler }            Normal      Scheduled   Successfully assigned bb-gluster-pod2 to 127.0.0.1
  14s       14s     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(3ff651a5-eabf-11e5-b3b1-52540092b5fb)": unsupported volume type
  14s       14s     1   {kubelet 127.0.0.1}         Warning     FailedSync  Error syncing pod, skipping: unsupported volume type

This only occurs when using the PV/PVC abstraction, not when using a volume plugin directly. These types of errors are common when using volume plugins for mounts, and it would be useful for users to have an easy way to identify them rather than searching the logs for the root cause.

I think there are two issues:

  1. Why is the original error surfaced by the volume plugins (glusterfs.go, rbd.go, etc.) being eaten when it returns to volumes.go? I can see in the logs that two volume types are processed for any pod: the first is the actual underlying type (i.e. kubernetes.io/glusterfs), and this is where the real error is surfaced; immediately after, the second type is processed, which is the general persistent claim type (i.e. kubernetes.io/persistent-claim), and this is where the first error is eaten (see the sketch after this list).
  2. A simple workaround would be to add a recorder event in volumes.go when there is a builder failure. This ensures the error is recorded and displayed as an event for the describer, and it will handle all similar errors from other volume plugin types.
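For illustration only, here is a minimal, self-contained Go sketch of the error-swallowing pattern described in point 1. The function names (newGlusterfsBuilder, newPersistentClaimBuilder) are hypothetical stand-ins, not the actual kubelet or plugin code; the sketch only shows how a wrapping layer that drops the inner plugin's error leaves the caller with nothing but the generic "unsupported volume type" message.

package main

import (
    "errors"
    "fmt"
)

// This is the only error the caller ends up reporting.
var errUnsupportedVolumeType = errors.New("unsupported volume type")

// newGlusterfsBuilder plays the role of the underlying plugin
// (kubernetes.io/glusterfs) and fails with the real root cause.
func newGlusterfsBuilder() (interface{}, error) {
    return nil, fmt.Errorf("endpoints %q not found", "glusterfs-cluster")
}

// newPersistentClaimBuilder plays the role of kubernetes.io/persistent-claim.
// It delegates to the underlying plugin but drops the returned error, so the
// caller only ever sees a nil builder with no explanation.
func newPersistentClaimBuilder() (interface{}, error) {
    builder, err := newGlusterfsBuilder()
    if err != nil {
        return nil, nil // root cause swallowed here
    }
    return builder, nil
}

func main() {
    builder, err := newPersistentClaimBuilder()
    if builder == nil && err == nil {
        // All the caller can report is the generic error.
        err = errUnsupportedVolumeType
    }
    fmt.Println(err) // prints: unsupported volume type
}

With the recorder event added in the modified function below, the underlying error is recorded at the point of failure, so it shows up in the describe output alongside the generic message (as in the glusterfs and rbd examples further down).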

Full modified function from volumes.go, for reference:

func (kl *Kubelet) newVolumeBuilderFromPlugins(spec *volume.Spec, pod *api.Pod, opts volume.VolumeOptions) (volume.Builder, error) {
    plugin, err := kl.volumePluginMgr.FindPluginBySpec(spec)
    if err != nil {
        return nil, fmt.Errorf("can't use volume plugins for %s: %v", spec.Name(), err)
    }
    if plugin == nil {
        // Not found but not an error
        return nil, nil
    }
    builder, err := plugin.NewBuilder(spec, pod, opts)
    if err != nil {
        // Add a kubelet recorder event so the real error shows in the 'kubectl describe <pod>' event list
        ref, errGetRef := api.GetReference(pod)
        if errGetRef == nil && ref != nil {
            kl.recorder.Eventf(ref, api.EventTypeWarning, kubecontainer.FailedMountVolume, "Unable to mount volumes for pod %q: %v", format.Pod(pod), err)
            glog.Errorf("Unable to mount volumes for pod %q: %v; skipping pod", format.Pod(pod), err)
        }
        return nil, fmt.Errorf("failed to instantiate volume plugin for %s: %v", spec.Name(), err)
    }

    glog.V(10).Infof("Used volume plugin %q to mount %s", plugin.Name(), spec.Name())
    return builder, nil
}

This will produce a better user experience when trying to see what went wrong. In this example, I am using a glusterfs PV/PVC but am missing endpoints, which you can now clearly see in the events of the describe:

Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  12s       12s     1   {default-scheduler }            Normal      Scheduled   Successfully assigned bb-gluster-pod2 to 127.0.0.1
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": endpoints "glusterfs-cluster" not found
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": unsupported volume type
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedSync  Error syncing pod, skipping: unsupported volume type

rbd example (missing secret, surfaced as "Couldn't get secret"):

Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  12s       12s     1   {default-scheduler }            Normal      Scheduled   Successfully assigned bb-gluster-pod2 to 127.0.0.1
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-rbd-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": Couldn't get secret
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-rbd-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": unsupported volume type
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedSync  Error syncing pod, skipping: unsupported volume type
@pmorie
Member

pmorie commented Mar 15, 2016

I haven't had a chance to look at your proposal in detail yet, but I agree this is one of the most confusing things new users hit.

@erinboyd

@screeley44 another case is when gluster endpoints are missing; we get 'unsupported volume type'.
@pmorie with your work extrapolating the permissions for NFS to automatically set supplemental groups, is there also a way to indicate more clearly that it is a permissions issue when the mount fails?

@screeley44
Contributor Author

@erinboyd - the discussion above highlights missing endpoints as the first example, and it also covers a similar rbd issue (missing secret file). The point is that any of these volume plugin errors would be surfaced by adding the event in volumes.go.

@jeffvance
Contributor

I like it! Is there a PR?
