
read gluster log to surface glusterfs plugin errors properly in describe events #24808

Merged - 1 commit merged into kubernetes:master on Jun 2, 2016

Conversation

@screeley44
Contributor

screeley44 commented Apr 26, 2016

glusterfs.go does not properly expose errors, since all mount errors go to a log file. I propose we read the log file to expose the errors instead of asking users to 'go look at this log'.

This PR does the following (a sketch of the combined flow follows the list):

  1. adds a gluster option, log-level=ERROR, to remove all noise from the log file
  2. changes the log file name and path based on PV + Pod name, so the log is specific per PV and Pod
  3. creates a utility to read the last two lines of the log file when a failure occurs
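Concretely, the combined flow looks roughly like the sketch below. This is a simplified illustration rather than the exact diff: the function name, parameter names, and the mount.Interface wiring are stand-ins, and readGlusterLog is the utility from item 3.

package glusterfs // simplified sketch, not the exact diff

import (
	"fmt"
	"path"

	"k8s.io/kubernetes/pkg/util/mount"
)

// setUpSketch illustrates the three steps above: a per-PV+Pod log file,
// log-level=ERROR to cut noise, and a best-effort read of the log tail
// when the mount fails so the real cause lands in describe events.
func setUpSketch(m mount.Interface, server, volume, targetDir, pluginDir, podName string) error {
	logFile := path.Join(pluginDir, podName+"-glusterfs.log") // (2) per PV + Pod
	options := []string{
		"log-level=ERROR",     // (1) keep only errors in the log
		"log-file=" + logFile, // mount.glusterfs writes details here
	}
	errs := m.Mount(server+":"+volume, targetDir, "glusterfs", options)
	if errs == nil {
		return nil
	}
	// (3) mount.glusterfs only says "check the log file"; pull the last
	// lines out of the log so the kubelet can expose them in the event.
	if logerror := readGlusterLog(logFile, podName); logerror != nil {
		return fmt.Errorf("glusterfs: mount failed: %v the following error information was pulled from the log to help resolve this issue: %v", errs, logerror)
	}
	return fmt.Errorf("glusterfs: mount failed: %v", errs)
}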

old behavior:

  13s   13s 1   {kubelet 127.0.0.1}     Warning FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(34b18c6b-070d-11e6-8e95-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.234.147:myVol2 /var/lib/kubelet/pods/34b18c6b-070d-11e6-8e95-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster glusterfs [log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster/glusterfs.log]
Output: Mount failed. Please check the log file for more details.

improved behavior (updated after suggestions from the community):

  34m       34m     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-multi-pod1_default(e7d7f790-0d4b-11e6-a275-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.123.222:myVol2 /var/lib/kubelet/pods/e7d7f790-0d4b-11e6-a275-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster2 glusterfs [log-level=ERROR log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster2/bb-multi-pod1-glusterfs.log]
Output: Mount failed. Please check the log file for more details.

 the following error information was pulled from the log to help resolve this issue: 
[2016-04-28 14:21:29.109697] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 192.168.123.222:24007 failed (Connection timed out)
[2016-04-28 14:21:29.109767] E [glusterfsd-mgmt.c:1819:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 192.168.123.222 (Transport endpoint is not connected)

also this PR is alternate approach to : #24624

@k8s-bot

k8s-bot commented Apr 26, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in hack/jenkins/job-configs/kubernetes-jenkins-pull/ instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

(2 similar comments)

@k8s-github-robot added the size/M (denotes a PR that changes 30-99 lines, ignoring generated files) and release-note-label-needed labels Apr 26, 2016
@pmorie
Member

pmorie commented Apr 26, 2016

@screeley44 the new message is a lot more helpful -- how do you feel about this vs. your other patch?

@screeley44
Contributor Author

@pmorie - I like this approach much better; it's cleaner and gives a better error message

// adding log-level ERROR to remove noise
// and more specific log path so each pod has
// its own log based on PV + Pod
log := path.Join(p, b.pod.Name + "-glusterfs.log")
Member

the log file is now computed in two places, perhaps this should be a property of the mounter
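For illustration only (hypothetical type and constructor names, not the PR's code), hoisting the path into the mounter might look like:

package glusterfs // illustrative sketch only

import "path"

// glusterfsMounter here is a stand-in showing the suggestion: compute the
// log path once and carry it as a property of the mounter, so every use
// site reads the same value.
type glusterfsMounter struct {
	logFile string // per-PV+Pod glusterfs log, computed once
}

func newGlusterfsMounter(pluginDir, podName string) *glusterfsMounter {
	return &glusterfsMounter{
		logFile: path.Join(pluginDir, podName+"-glusterfs.log"),
	}
}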

Contributor Author

@pmorie - not sure I understand what you mean by 'computed in two places' - can you explain a bit more?

Contributor

adding Pod.Name could raise the risk of exceeding the PATH_MAX limit; see commit 8bbc86d

Contributor Author

@rootfs - yeah, I was thinking of those cases. Although probably rare, multiple pods could use the same PV, which would clobber the log and maybe even make it hard to pinpoint specific errors - so I was thinking of a way to better isolate the log while still letting the user find it easily. Maybe it's overkill? The pod name cannot be longer than 253 characters, so what if I substringed the pod name to its first 32 characters when it's over a certain length?
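For illustration, the truncation idea floated here (not adopted; the reply below settles on leaving names as-is) would slot into the snippet above roughly like this, reusing its b and p variables:

// Hypothetical, not adopted by this PR: bound the log file name by
// truncating long pod names (pod names are ASCII DNS labels, so byte
// slicing is safe here).
name := b.pod.Name
if len(name) > 32 {
	name = name[:32]
}
log := path.Join(p, name+"-glusterfs.log")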

Contributor

yes, this could be rare, let's keep it that way then

Contributor Author

+1

@pmorie
Member

pmorie commented Apr 27, 2016

@kubernetes/sig-storage

// Since gluster does not return error text
// it all goes in a log file, we will read the log file
logerror := readGlusterLog(log, b.pod.Name)
if logerror != nil {
	glog.Errorf("glusterfs: mount failed: %v", logerror)
Contributor

no need to double-log here - the error is propagated to the caller

@ghost

ghost commented Apr 27, 2016

Hmm.. this log parsing business makes me very uncomfortable. Have we really exhausted all options? If so, I think we should file a bug (or post a patch?) against the appropriate glusterfs util.

@screeley44
Contributor Author

@swagiaal - yes, from what I can tell, with the current functionality and options available there are no mount parameters that return the errors to the caller. The default behavior is to write them to a log file: you either specify the log file as a parameter or it creates a default one for you.

// Since gluster does not return error text
// it all goes in a log file, we will read the log file
logerror := readGlusterLog(log, b.pod.Name)
if logerror != nil {
Contributor

if /sbin/mount.glusterfs is not on your node, the mount will fail (as reported by errs) and readGlusterLog will return an error (failed to open log file), but since errs is not returned, the real cause is not reported.

Contributor Author

good point... added it back in @rootfs and updated the new describe event messages above

Member

It's a convention in Go to re-use an error in a case like this, where a list of errors is being accumulated.

Also, I think reading the gluster log has to be a best-effort attempt here. There are legitimate reasons the log file might not be present when this fails.

@thockin @saad-ali any thoughts on this?
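For readers unfamiliar with that convention, one common shape of it in the kubernetes codebase is accumulating errors into one list and aggregating them into a single returned error via pkg/util/errors. A generic sketch, not this PR's code:

package glusterfs // generic sketch

import (
	utilerrors "k8s.io/kubernetes/pkg/util/errors"
)

// runSteps reuses a single error list across several attempts and hands
// the caller one aggregated error, rather than logging at each site.
func runSteps(steps []func() error) error {
	var errs []error
	for _, step := range steps {
		if err := step(); err != nil {
			errs = append(errs, err)
		}
	}
	// NewAggregate returns nil when errs is empty.
	return utilerrors.NewAggregate(errs)
}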

@screeley44
Contributor Author

There is some work being done upstream to add a --verbose flag, but it is still unclear at this time whether it will be able to send the error-specific log entries back to the caller and/or give us what we are looking for.

http://review.gluster.org/#/c/11469/ - it might be worthwhile to follow this PR's approach until the upstream solution is fully worked out, and then change over at that time

@saad-ali
Member

http://review.gluster.org/#/c/11469/ - it might be worthwhile to follow this PR's approach until the upstream solution is fully worked out, and then change over at that time

That would be fine, just make sure to open a tracking issue to remove this code and use the verbose flag once that option is available.

@ghost

ghost commented Apr 28, 2016

it might be worthwhile to follow this PR's approach until the upstream solution is fully worked out, and then change over at that time

+1

@screeley44 if you are unsure, file an issue upstream asking for exactly what you need. If it is covered by that patch, they will point that out.

@screeley44
Contributor Author

@swagiaal - yep, @rootfs and I have been in conversations with them today and will continue to follow up and create necessary issues when needed. thanks

@ghost

ghost commented Apr 29, 2016

That is great.. thanks guys!

// readGlusterLog will take the last two lines of the log file
// on failure of gluster SetUp and return those so kubelet can
// properly expose them
// return error on any failure
func readGlusterLog(path string, podName string) error {
Contributor Author

@pmorie - I tested with hpcloud/tail, and although it did tail the log, it was a bit clunky in how it acted with our volumes and pretty limited in overall functionality; it seems overkill for what we are attempting to accomplish. I would argue it's not worth integrating another dependency for what is hopefully a short-term fix until mount.glusterfs does what we want. Thoughts?
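A dependency-free read of the last two lines is small enough to inline, which is the route the PR takes. Below is a minimal sketch of readGlusterLog under that approach; the signature matches the snippet above, but the body is illustrative rather than the exact merged code:

package glusterfs // illustrative sketch

import (
	"bufio"
	"fmt"
	"os"
)

// readGlusterLog reads the last two lines of the glusterfs log and hands
// them back as an error value so the kubelet can surface them in events.
func readGlusterLog(path string, podName string) error {
	file, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("glusterfs: could not open log file for pod %s: %v", podName, err)
	}
	defer file.Close()

	// Rolling two-line window: after the scan, last holds the final lines.
	var last [2]string
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		last[0], last[1] = last[1], scanner.Text()
	}
	if err := scanner.Err(); err != nil {
		return fmt.Errorf("glusterfs: failed reading log file for pod %s: %v", podName, err)
	}
	if last[1] == "" {
		return nil // empty log: nothing useful to surface
	}
	return fmt.Errorf("%s %s", last[0], last[1])
}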

@screeley44
Contributor Author

Added reminder issue #26367 to revert this PR once the new functionality is added to the mount.glusterfs call

logerror := readGlusterLog(log, b.pod.Name)
if logerror != nil {
	// return fmt.Errorf("glusterfs: mount failed: %v", logerror)
	return fmt.Errorf("glusterfs: mount failed: %v the following error information was pulled from the log to help resolve this issue: %v", errs, logerror)
Member

Clarify this is the glusterfs log and I will tag this LGTM.

Contributor Author

@pmorie - does that look better?

@pmorie
Member

pmorie commented May 27, 2016

@screeley44 one last nit and I will tag. Sorry for the delay; I think this is a great patch.

@pmorie added the lgtm ("looks good to me"; indicates that a PR is ready to be merged) label May 31, 2016
@pmorie
Member

pmorie commented May 31, 2016

@k8s-bot test this issue: #IGNORE

@pmorie
Member

pmorie commented May 31, 2016

I couldn't determine the root cause of the failure and couldn't find a matching flake.

@screeley44
Contributor Author

@k8s-bot test this issue: #25322

1 similar comment
@pmorie
Member

pmorie commented Jun 1, 2016

@k8s-bot test this issue: #25322

@pmorie added this to the v1.3 milestone Jun 1, 2016
@pmorie
Member

pmorie commented Jun 1, 2016

@k8s-bot test this issue: #26644

@jayunit100
Member

@screeley44 need to gofmt

SUCCESS  ./hack/../hack/verify-godeps.sh    0s
Verifying ./hack/../hack/verify-gofmt.sh
!!! 'gofmt -s -w' needs to be run on the following files: 
./pkg/volume/glusterfs/glusterfs.go
./pkg/volume/glusterfs/glusterfs_util.go

@jayunit100
Member

@screeley44 see the build log buttons (i.e. the build log above: http://pr-test.k8s.io/24808/kubernetes-pull-build-test-e2e-gce/42856/build-log.txt)

@k8s-github-robot removed the lgtm label Jun 1, 2016
@screeley44
Contributor Author

@pmorie can you tag again? I ran gofmt and had to change a few things

@pmorie added and removed the lgtm label Jun 2, 2016
@pmorie
Member

pmorie commented Jun 2, 2016

@screeley44 verify-boilerplate failed:

FAILED   ./hack/../hack/verify-boilerplate.sh   0s

@pmorie added the lgtm and ok-to-merge labels and removed the sig/storage (categorizes an issue or PR as relevant to SIG Storage) label Jun 2, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Jun 2, 2016

GCE e2e build/test passed for commit a36cd3d.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot merged commit 0b7f8e5 into kubernetes:master Jun 2, 2016