[WIP] Add resolution tips for volume mount errors via describe events #26366

screeley44 · 2016-05-26T17:37:29Z

This is a user-experience (UXP) improvement: it offers users tips and hints for resolving common volume mount errors. I've tossed around several implementation ideas for this, but I think this is the best approach because it centralizes the logic within volumes.mountExternalVolumes.

There is also some additional discussion of this in issue #23982.

For Glusterfs, at least, this also depends on PR #24808 merging.

I wanted to start some discussion on this. Below are some examples of the output a user would see with this PR:

Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  12s       12s     1   {default-scheduler }            Normal      Scheduled   Successfully assigned bb-gluster-pod1 to 127.0.0.1
  12s       2s      2   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-gluster-pod1_default(86253c16-2282-11e6-a845-525400495970)": failed to instantiate mounter for volume: glustervol using plugin: kubernetes.io/glusterfs with a root cause: endpoints "glusterfs-cluster" not found
Resolution hint: (glustervol) Make sure the above endpoint exists. To persist endpoints, they should be created as a service.

 14s    14s 1   {kubelet 127.0.0.1}     Warning FailedMount Unable to mount volumes for pod "bb-gluster-pod1_default(86253c16-2282-11e6-a845-525400495970)": glusterfs: mount failed: Mount failed: exit status 32
Mounting arguments: 192.168.122.222:myVol2 /var/lib/kubelet/pods/86253c16-2282-11e6-a845-525400495970/volumes/kubernetes.io~glusterfs/glustervol glusterfs [log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/glustervol/glusterfs.log]
Output: mount: unknown filesystem type 'glusterfs'
Resolution hint: (glustervol) Check and make sure the glusterfs-client package is installed (rpm -qa 'gluster*') on your nodes.
If not, install the client package on your nodes (i.e. yum install glusterfs-client -y).

  6s        5s      2   {kubelet 127.0.0.1}                 Warning     FailedMount Unable to mount volumes for pod "nfs-bb-pod1_default(de44ed68-21ea-11e6-be31-525400495970)": lstat /var/lib/kubelet/pods/de44ed68-21ea-11e6-be31-525400495970/volumes/kubernetes.io~nfs/nfsvol/..: permission denied
Resolution hint: (nfsvol) The pod is running and the mount succeeded, but the mount is not accessible due to permissions.  
Check the POSIX-based permissions (owner, groups and others) on your mounted directory.  
If needed, containers and pods can pass in a securityContext specifying runAsUser (uid/owner), or additional Linux groups such as fsGroup (for block) or SupplementalGroups (for shared).
Work with the storage administrator to properly set up access.

Events:
  FirstSeen LastSeen    Count   From                            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----                            -------------   --------    ------      -------
  1m        1m      1   {default-scheduler }                            Normal      Scheduled   Successfully assigned aws-ebs-bb-pod2 to ip-172-30-0-215.us-west-2.compute.internal
  19s       19s     1   {kubelet ip-172-30-0-215.us-west-2.compute.internal}            Warning     FailedMount Unable to mount volumes for pod "aws-ebs-bb-pod2_default(fb68166a-1ea6-11e6-88cc-06155fc6b4db)": Could not attach EBS Disk "vol-5634f7f2": Error attaching EBS volume: InvalidVolume.NotFound: The volume 'vol-5634f7f2' does not exist.
        status code: 400, request id: 
Resolution hint: (ebsvol) Check AWS available volumes for the appropriate availability zone, and make sure the specified volumeID exists and is spelled correctly.

Events:
  FirstSeen LastSeen    Count   From                            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----                            -------------   --------    ------      -------
  1m        1m      1   {default-scheduler }                            Normal      Scheduled   Successfully assigned aws-ebs-bb-pod1 to ip-172-30-0-113.us-west-2.compute.internal
  52s       52s     1   {kubelet ip-172-30-0-113.us-west-2.compute.internal}            Warning     FailedMount Unable to mount volumes for pod "aws-ebs-bb-pod1_default(1b7c79a2-1ea7-11e6-802a-06cfea9d6949)": Could not attach EBS Disk "vol-b877020a": Error attaching EBS volume: VolumeInUse: vol-b877020a is already attached to an instance
        status code: 400, request id: 
Resolution hint: (ebsvol) The AWS volume is already attached to another instance and only one node per volume is allowed for EBS block devices (can not share across nodes). Another volume will need to be provisioned for use with this pod

If no match is found, the normal error is returned without any additional hints.

Alternative approaches could be:

- Add the resolution hint at each error point in the plugins (nfs.go, aws_ebs.go, etc.). I didn't take this approach because it would result in more code and more files being touched, as opposed to catching the error in the centralized mountExternalVolumes.
- Rather than keep the logic in code, externalize it into a file/resource that admins could add to, edit, and customize. This seemed like overkill at this point, but it might be a good future direction.
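For the externalized alternative, a minimal sketch might load the rules from JSON that an admin could edit. Everything here is assumed for illustration: the JSON shape, the `hintRule` type, and the `parseHintRules` and `lookupHint` helpers do not come from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// hintRule is a hypothetical, admin-editable rule: if Match occurs in the
// error message, Hint is shown to the user.
type hintRule struct {
	Match string `json:"match"`
	Hint  string `json:"hint"`
}

// parseHintRules decodes a JSON array of rules (an assumed file format).
func parseHintRules(data []byte) ([]hintRule, error) {
	var rules []hintRule
	if err := json.Unmarshal(data, &rules); err != nil {
		return nil, fmt.Errorf("parsing hint rules: %v", err)
	}
	return rules, nil
}

// lookupHint returns the hint of the first rule whose pattern occurs in msg.
// Rules with an empty pattern are skipped so they cannot match everything.
func lookupHint(rules []hintRule, msg string) (string, bool) {
	for _, r := range rules {
		if r.Match != "" && strings.Contains(msg, r.Match) {
			return r.Hint, true
		}
	}
	return "", false
}

func main() {
	data := []byte(`[{"match": "permission denied", "hint": "Check the POSIX permissions on the mounted directory."}]`)
	rules, err := parseHintRules(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	if hint, ok := lookupHint(rules, "lstat /some/path: permission denied"); ok {
		fmt.Println("Resolution hint:", hint)
	}
}
```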

@pmorie @erinboyd

k8s-bot · 2016-05-26T17:38:15Z

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

pmorie · 2016-05-26T17:40:51Z

@k8s-bot ok to test

pmorie · 2016-05-26T17:41:09Z

cc @kubernetes/sig-storage

pmorie · 2016-05-27T02:59:09Z

pkg/kubelet/volumes_util.go

+// AddMountErrorHint performs some basic analysis
+// on the current mount error returned and will
+// add a user hint or resolution tip for enhanced UXP
+func (kl *Kubelet) AddMountErrorHint(volpath string, volname string, inerr error) error{


Not sure how I feel about this logic being centralized like this. It feels very much like a cross-cut, reading this logic for different volumes all in the same place. I don't think it's so bad to keep the logic at the call site it's relevant to.

For the record I can totally understand the desire to keep this orthogonal from the volume plugins, but I think it's a fine start (and likely to work better in the Kubelet we have today) to keep the logic about possible error causes at the sites where they occur. I think that is the simplest way to start, and maybe a better pattern will become evident.
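As a rough illustration of keeping the hint at the call site, a plugin could attach the advice at the point where the error is produced. The names below (`lookupEndpoints`, `newGlusterMounter`) are hypothetical stand-ins, not the real glusterfs plugin API.

```go
package main

import "fmt"

// lookupEndpoints is a hypothetical stand-in for the plugin's endpoint
// lookup; known simulates the set of endpoints that exist.
func lookupEndpoints(known map[string]bool, name string) error {
	if !known[name] {
		return fmt.Errorf("endpoints %q not found", name)
	}
	return nil
}

// newGlusterMounter is a hypothetical call site. The hint is added here,
// where the cause is known, rather than by matching error text later.
func newGlusterMounter(known map[string]bool, volName, epName string) error {
	if err := lookupEndpoints(known, epName); err != nil {
		return fmt.Errorf("%v\nResolution hint: (%s) Make sure the above endpoint exists", err, volName)
	}
	return nil
}

func main() {
	err := newGlusterMounter(map[string]bool{}, "glustervol", "glusterfs-cluster")
	fmt.Println(err)
}
```

With this shape no string matching is needed, since each plugin already knows which failure it just hit; the trade-off is that hint text is spread across the plugin files.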

pmorie · 2016-05-27T03:02:19Z

I like the concept here but I think this can just be additional information at the sites where these errors occur to start with. See #26366 (comment)

screeley44 · 2016-05-31T15:00:45Z

Based on the comments above, I'm going to create a second PR that implements the logic in each plugin.

k8s-github-robot · 2016-06-03T06:38:24Z

@screeley44 PR needs rebase

eparis · 2016-06-24T02:16:42Z

ok to test

k8s-bot · 2016-06-24T03:26:21Z

GCE e2e build/test passed for commit ff03d3f.

k8s-github-robot · 2016-07-24T02:18:32Z

This PR hasn't been active in 30 days. It will be closed in 59 days (Sep 22, 2016).

cc @screeley44 @pmorie

You can add 'keep-open' label to prevent this from happening, or add a comment to keep it open another 90 days

googlebot added the cla: yes label May 26, 2016

Add resolution tips for volume mount errors via describe events

ff03d3f

k8s-github-robot assigned dchen1107 May 26, 2016

k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels May 26, 2016

pmorie added the release-note-none Denotes a PR that doesn't merit a release note. label May 26, 2016

k8s-github-robot removed the release-note-label-needed label May 26, 2016

pmorie reviewed May 27, 2016

pmorie added area/volumes sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels May 27, 2016

pmorie assigned pmorie and unassigned dchen1107 May 27, 2016

k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 3, 2016

screeley44 closed this Jul 25, 2016
