
UXP: kubelet volume mount errors do not surface the true root cause in 'kubectl describe', leading to a confusing generic error message on failed mounts #22992

Closed
screeley44 opened this issue Mar 15, 2016 · 4 comments · Fixed by #23122

Comments

@screeley44
Contributor

fyi @smarterclayton @pweil @pmorie
When using a PV/PVC for persistent volume mounts, if something goes wrong in the builder, a generic error is surfaced in 'kubectl describe' with the following event error description, "unsupported volume type":

Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  14s       14s     1   {default-scheduler }            Normal      Scheduled   Successfully assigned bb-gluster-pod2 to 127.0.0.1
  14s       14s     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(3ff651a5-eabf-11e5-b3b1-52540092b5fb)": unsupported volume type
  14s       14s     1   {kubelet 127.0.0.1}         Warning     FailedSync  Error syncing pod, skipping: unsupported volume type

This only occurs when using the PV/PVC abstraction, not when using a volume plugin directly. These types of errors are common when using volume plugins for mounts, and it would be useful for users to have an easy way to identify them rather than searching the logs for the root cause.

I think there are two issues:

  1. Why is the original error surfaced by the volume plugins (glusterfs.go, rbd.go, etc.) being eaten when it returns to volumes.go? I can see in the logs that two volume types are processed for any pod: the first is the actual underlying type (i.e. kubernetes.io/glusterfs), and this is where the real error is surfaced; immediately after, the second type is processed, which is the general persistent claim type (i.e. kubernetes.io/persistent-claim), and this is where the first error is eaten (see the sketch after this list).
  2. A simple workaround would be to add a recorder event in volumes.go when there is a builder failure. This ensures the error is recorded and displayed as an event for the describer, and it will handle all similar errors from other volume plugin types.
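For illustration only, here is a minimal, self-contained Go sketch of the error-swallowing pattern described in point 1. The function names (newGlusterfsBuilder, newPersistentClaimBuilder) are hypothetical stand-ins, not the actual kubelet or plugin code; the sketch only shows how a wrapping layer that drops the inner plugin's error leaves the caller with nothing but the generic "unsupported volume type" message.

package main

import (
    "errors"
    "fmt"
)

// This is the only error the caller ends up reporting.
var errUnsupportedVolumeType = errors.New("unsupported volume type")

// newGlusterfsBuilder plays the role of the underlying plugin
// (kubernetes.io/glusterfs) and fails with the real root cause.
func newGlusterfsBuilder() (interface{}, error) {
    return nil, fmt.Errorf("endpoints %q not found", "glusterfs-cluster")
}

// newPersistentClaimBuilder plays the role of kubernetes.io/persistent-claim.
// It delegates to the underlying plugin but drops the returned error, so the
// caller only ever sees a nil builder with no explanation.
func newPersistentClaimBuilder() (interface{}, error) {
    builder, err := newGlusterfsBuilder()
    if err != nil {
        return nil, nil // root cause swallowed here
    }
    return builder, nil
}

func main() {
    builder, err := newPersistentClaimBuilder()
    if builder == nil && err == nil {
        // All the caller can report is the generic error.
        err = errUnsupportedVolumeType
    }
    fmt.Println(err) // prints: unsupported volume type
}

With the recorder event added in the modified function below, the underlying error is recorded at the point of failure, so it shows up in the describe output alongside the generic message (as in the glusterfs and rbd examples further down).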

Full modified function from volumes.go, for reference:

func (kl *Kubelet) newVolumeBuilderFromPlugins(spec *volume.Spec, pod *api.Pod, opts volume.VolumeOptions) (volume.Builder, error) {
    plugin, err := kl.volumePluginMgr.FindPluginBySpec(spec)
    if err != nil {
        return nil, fmt.Errorf("can't use volume plugins for %s: %v", spec.Name(), err)
    }
    if plugin == nil {
        // Not found but not an error
        return nil, nil
    }
    builder, err := plugin.NewBuilder(spec, pod, opts)
    if err != nil {
        // Add a kubelet recorder event so the real error shows in the 'kubectl describe <pod>' event list
        ref, errGetRef := api.GetReference(pod)
        if errGetRef == nil && ref != nil {
            kl.recorder.Eventf(ref, api.EventTypeWarning, kubecontainer.FailedMountVolume, "Unable to mount volumes for pod %q: %v", format.Pod(pod), err)
            glog.Errorf("Unable to mount volumes for pod %q: %v; skipping pod", format.Pod(pod), err)
        }
        return nil, fmt.Errorf("failed to instantiate volume plugin for %s: %v", spec.Name(), err)
    }

    glog.V(10).Infof("Used volume plugin %q to mount %s", plugin.Name(), spec.Name())
    return builder, nil
}

This will produce a better user experience when trying to see what went wrong. In this example, I am using a glusterfs PV/PVC but am missing endpoints, which you can now clearly see in the events of the describe:

Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  12s       12s     1   {default-scheduler }            Normal      Scheduled   Successfully assigned bb-gluster-pod2 to 127.0.0.1
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": endpoints "glusterfs-cluster" not found
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": unsupported volume type
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedSync  Error syncing pod, skipping: unsupported volume type

rbd example (missing secret, surfaced as "Couldn't get secret"):

Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  12s       12s     1   {default-scheduler }            Normal      Scheduled   Successfully assigned bb-gluster-pod2 to 127.0.0.1
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-rbd-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": Couldn't get secret
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-rbd-pod2_default(2b10a2c3-eac7-11e5-ac70-52540092b5fb)": unsupported volume type
  12s       12s     1   {kubelet 127.0.0.1}         Warning     FailedSync  Error syncing pod, skipping: unsupported volume type
@pmorie
Member

pmorie commented Mar 15, 2016

I haven't had a chance to look at your proposal in detail yet, but I agree this is one of the most confusing things new users hit.

@erinboyd

@screeley44 another case is when gluster endpoints are missing; we get 'unsupported volume type'.
@pmorie with your work extrapolating the permissions for NFS to automatically set supplemental groups, is there also a way to indicate more clearly that it is a permissions issue when the mount fails?

@screeley44
Contributor Author

@erinboyd - the discussion above highlights missing endpoints as the first example, and it also covers a similar rbd issue (missing secret file). The point is that any of these volume plugin errors would be surfaced by adding the event in volumes.go.

@jeffvance
Contributor

I like it! Is there a PR?
