
read gluster log to surface glusterfs plugin errors properly in describe events #24808

Merged - 1 commit merged into kubernetes:master on Jun 2, 2016

Conversation

@screeley44
Contributor

screeley44 commented Apr 26, 2016

glusterfs.go does not properly expose errors, since all mount errors go to a log file. I propose we read the log file to expose the errors instead of asking users to 'go look at this log'.

This PR does the following (a sketch of the combined flow follows the list):

  1. adds a gluster option, log-level=ERROR, to remove all noise from the log file
  2. changes the log file name and path based on PV + Pod name, so the log is specific per PV and Pod
  3. creates a utility to read the last two lines of the log file when a failure occurs
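Concretely, the combined flow looks roughly like the sketch below. This is a simplified illustration rather than the exact diff: the function name, parameter names, and the mount.Interface wiring are stand-ins, and readGlusterLog is the utility from item 3.

package glusterfs // simplified sketch, not the exact diff

import (
	"fmt"
	"path"

	"k8s.io/kubernetes/pkg/util/mount"
)

// setUpSketch illustrates the three steps above: a per-PV+Pod log file,
// log-level=ERROR to cut noise, and a best-effort read of the log tail
// when the mount fails so the real cause lands in describe events.
func setUpSketch(m mount.Interface, server, volume, targetDir, pluginDir, podName string) error {
	logFile := path.Join(pluginDir, podName+"-glusterfs.log") // (2) per PV + Pod
	options := []string{
		"log-level=ERROR",     // (1) keep only errors in the log
		"log-file=" + logFile, // mount.glusterfs writes details here
	}
	errs := m.Mount(server+":"+volume, targetDir, "glusterfs", options)
	if errs == nil {
		return nil
	}
	// (3) mount.glusterfs only says "check the log file"; pull the last
	// lines out of the log so the kubelet can expose them in the event.
	if logerror := readGlusterLog(logFile, podName); logerror != nil {
		return fmt.Errorf("glusterfs: mount failed: %v the following error information was pulled from the log to help resolve this issue: %v", errs, logerror)
	}
	return fmt.Errorf("glusterfs: mount failed: %v", errs)
}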

old behavior:

  13s   13s 1   {kubelet 127.0.0.1}     Warning FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(34b18c6b-070d-11e6-8e95-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.234.147:myVol2 /var/lib/kubelet/pods/34b18c6b-070d-11e6-8e95-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster glusterfs [log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster/glusterfs.log]
Output: Mount failed. Please check the log file for more details.

improved behavior (updated after suggestions from the community):

  34m       34m     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "bb-multi-pod1_default(e7d7f790-0d4b-11e6-a275-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.123.222:myVol2 /var/lib/kubelet/pods/e7d7f790-0d4b-11e6-a275-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster2 glusterfs [log-level=ERROR log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster2/bb-multi-pod1-glusterfs.log]
Output: Mount failed. Please check the log file for more details.

 the following error information was pulled from the log to help resolve this issue: 
[2016-04-28 14:21:29.109697] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 192.168.123.222:24007 failed (Connection timed out)
[2016-04-28 14:21:29.109767] E [glusterfsd-mgmt.c:1819:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 192.168.123.222 (Transport endpoint is not connected)

also this PR is alternate approach to : #24624

@k8s-bot

k8s-bot commented Apr 26, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in hack/jenkins/job-configs/kubernetes-jenkins-pull/ instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

(2 similar comments)

@k8s-github-robot added the size/M (denotes a PR that changes 30-99 lines, ignoring generated files) and release-note-label-needed labels Apr 26, 2016
@pmorie
Member

pmorie commented Apr 26, 2016

@screeley44 the new message is a lot more helpful -- how do you feel about this vs. your other patch?

@screeley44
Contributor Author

@pmorie - I like this approach much better; it's cleaner and gives a better error message

// adding log-level ERROR to remove noise
// and more specific log path so each pod has
// its own log based on PV + Pod
log := path.Join(p, b.pod.Name + "-glusterfs.log")
Member

the log file is now computed in two places, perhaps this should be a property of the mounter
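For illustration only (hypothetical type and constructor names, not the PR's code), hoisting the path into the mounter might look like:

package glusterfs // illustrative sketch only

import "path"

// glusterfsMounter here is a stand-in showing the suggestion: compute the
// log path once and carry it as a property of the mounter, so every use
// site reads the same value.
type glusterfsMounter struct {
	logFile string // per-PV+Pod glusterfs log, computed once
}

func newGlusterfsMounter(pluginDir, podName string) *glusterfsMounter {
	return &glusterfsMounter{
		logFile: path.Join(pluginDir, podName+"-glusterfs.log"),
	}
}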

Contributor Author

@pmorie - not sure I understand what you mean by 'computed in two places' - can you explain a bit more?

Contributor

adding Pod.Name could raise the risk of exceeding the PATH_MAX limit; see commit 8bbc86d

Contributor Author

@rootfs - yeah, I was thinking of those cases. Although probably rare, multiple pods could use the same PV, which would clobber the log and maybe even make it hard to pinpoint specific errors - so I was thinking of a way to better isolate the log while still letting the user find it easily. Maybe it's overkill? The pod name cannot be longer than 253 characters, so what if I substringed the pod name to its first 32 characters when it's over a certain length?
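For illustration, the truncation idea floated here (not adopted; the reply below settles on leaving names as-is) would slot into the snippet above roughly like this, reusing its b and p variables:

// Hypothetical, not adopted by this PR: bound the log file name by
// truncating long pod names (pod names are ASCII DNS labels, so byte
// slicing is safe here).
name := b.pod.Name
if len(name) > 32 {
	name = name[:32]
}
log := path.Join(p, name+"-glusterfs.log")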

Contributor

yes, this could be rare, let's keep it that way then

Contributor Author

+1

@pmorie
Member

pmorie commented Apr 27, 2016

@kubernetes/sig-storage

// Since gluster does not return error text
// it all goes in a log file, we will read the log file
logerror := readGlusterLog(log, b.pod.Name)
if logerror != nil {
	glog.Errorf("glusterfs: mount failed: %v", logerror)
Contributor

no need to double-log here - the error is propagated to the caller

@ghost

ghost commented Apr 27, 2016

Hmm.. this log parsing business makes me very uncomfortable. Have we really exhausted all options? If so, I think we should file a bug (or post a patch?) against the appropriate glusterfs util.

@screeley44
Contributor Author

@swagiaal - yes, from what I can tell, with the current functionality and options available there are no mount parameters that return the errors to the caller. The default behavior is to write them to a log file: you either specify the log file as a parameter or it creates a default one for you.

// Since gluster does not return error text
// it all goes in a log file, we will read the log file
logerror := readGlusterLog(log, b.pod.Name)
if logerror != nil {
Contributor

if /sbin/mount.glusterfs is not on your node, the mount will fail (as reported by errs) and readGlusterLog will return an error (failed to open log file), but since errs is not returned, the real cause is not reported.

Contributor Author

good point... added it back in @rootfs and updated the new describe event messages above

Member

It's a convention in Go to re-use an error in a case like this, where a list of errors is being accumulated.

Also, I think reading the gluster log has to be a best-effort attempt here. There are legitimate reasons the log file might not be present when this fails.

@thockin @saad-ali any thoughts on this?
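For readers unfamiliar with that convention, one common shape of it in the kubernetes codebase is accumulating errors into one list and aggregating them into a single returned error via pkg/util/errors. A generic sketch, not this PR's code:

package glusterfs // generic sketch

import (
	utilerrors "k8s.io/kubernetes/pkg/util/errors"
)

// runSteps reuses a single error list across several attempts and hands
// the caller one aggregated error, rather than logging at each site.
func runSteps(steps []func() error) error {
	var errs []error
	for _, step := range steps {
		if err := step(); err != nil {
			errs = append(errs, err)
		}
	}
	// NewAggregate returns nil when errs is empty.
	return utilerrors.NewAggregate(errs)
}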

@screeley44
Contributor Author

There is some work being done upstream to add a --verbose flag, but it is still unclear at this time whether it will be able to send the error-specific log entries back to the caller and/or give us what we are looking for.

http://review.gluster.org/#/c/11469/ - it might be worthwhile to follow this PR's approach until the upstream solution is fully worked out, and then change over at that time

@saad-ali
Member

http://review.gluster.org/#/c/11469/ - it might be worthwhile to follow this PR's approach until the upstream solution is fully worked out, and then change over at that time

That would be fine, just make sure to open a tracking issue to remove this code and use the verbose flag once that option is available.

@ghost

ghost commented Apr 28, 2016

it might be worthwhile to follow this PR's approach until the upstream solution is fully worked out, and then change over at that time

+1

@screeley44 if you are unsure, file an issue upstream asking for exactly what you need. If it is covered by that patch, they will point that out.

@screeley44
Contributor Author

@swagiaal - yep, @rootfs and I have been in conversations with them today and will continue to follow up and create necessary issues when needed. thanks

@ghost

ghost commented Apr 29, 2016

That is great.. thanks guys!

// readGlusterLog will take the last two lines of the log file
// on failure of gluster SetUp and return those so kubelet can
// properly expose them
// return error on any failure
func readGlusterLog(path string, podName string) error {
Contributor Author

@pmorie - I tested with hpcloud/tail, and although it did tail the log, it was a bit clunky in how it acted with our volumes and pretty limited in overall functionality; it seems overkill for what we are attempting to accomplish. I would argue it's not worth integrating another dependency for what is hopefully a short-term fix until mount.glusterfs does what we want. Thoughts?
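A dependency-free read of the last two lines is small enough to inline, which is the route the PR takes. Below is a minimal sketch of readGlusterLog under that approach; the signature matches the snippet above, but the body is illustrative rather than the exact merged code:

package glusterfs // illustrative sketch

import (
	"bufio"
	"fmt"
	"os"
)

// readGlusterLog reads the last two lines of the glusterfs log and hands
// them back as an error value so the kubelet can surface them in events.
func readGlusterLog(path string, podName string) error {
	file, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("glusterfs: could not open log file for pod %s: %v", podName, err)
	}
	defer file.Close()

	// Rolling two-line window: after the scan, last holds the final lines.
	var last [2]string
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		last[0], last[1] = last[1], scanner.Text()
	}
	if err := scanner.Err(); err != nil {
		return fmt.Errorf("glusterfs: failed reading log file for pod %s: %v", podName, err)
	}
	if last[1] == "" {
		return nil // empty log: nothing useful to surface
	}
	return fmt.Errorf("%s %s", last[0], last[1])
}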

@screeley44
Contributor Author

Added reminder issue #26367 to revert this PR once the new functionality is added to the mount.glusterfs call

logerror := readGlusterLog(log, b.pod.Name)
if logerror != nil {
	// return fmt.Errorf("glusterfs: mount failed: %v", logerror)
	return fmt.Errorf("glusterfs: mount failed: %v the following error information was pulled from the log to help resolve this issue: %v", errs, logerror)
Member

Clarify this is the glusterfs log and I will tag this LGTM.

Contributor Author

@pmorie - does that look better?

@pmorie
Member

pmorie commented May 27, 2016

@screeley44 one last nit and I will tag. Sorry for the delay; I think this is a great patch.

@pmorie added the lgtm ("looks good to me"; indicates that a PR is ready to be merged) label May 31, 2016
@pmorie
Member

pmorie commented May 31, 2016

@k8s-bot test this issue: #IGNORE

@pmorie
Member

pmorie commented May 31, 2016

I couldn't determine the root cause of the failure and couldn't find a matching flake.

@screeley44
Contributor Author

@k8s-bot test this issue: #25322

1 similar comment
@pmorie
Member

pmorie commented Jun 1, 2016

@k8s-bot test this issue: #25322

@pmorie added this to the v1.3 milestone Jun 1, 2016
@pmorie
Member

pmorie commented Jun 1, 2016

@k8s-bot test this issue: #26644

@jayunit100
Member

@screeley44 need to gofmt

SUCCESS  ./hack/../hack/verify-godeps.sh    0s
Verifying ./hack/../hack/verify-gofmt.sh
!!! 'gofmt -s -w' needs to be run on the following files: 
./pkg/volume/glusterfs/glusterfs.go
./pkg/volume/glusterfs/glusterfs_util.go

@jayunit100
Member

@screeley44 see the build log buttons (i.e. the build log above: http://pr-test.k8s.io/24808/kubernetes-pull-build-test-e2e-gce/42856/build-log.txt)

@k8s-github-robot removed the lgtm label Jun 1, 2016
@screeley44
Contributor Author

@pmorie can you tag again? I ran gofmt and had to change a few things

@pmorie added and removed the lgtm label Jun 2, 2016
@pmorie
Member

pmorie commented Jun 2, 2016

@screeley44 verify-boilerplate failed:

FAILED   ./hack/../hack/verify-boilerplate.sh   0s

@pmorie added the lgtm and ok-to-merge labels and removed the sig/storage (categorizes an issue or PR as relevant to SIG Storage) label Jun 2, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Jun 2, 2016

GCE e2e build/test passed for commit a36cd3d.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot merged commit 0b7f8e5 into kubernetes:master Jun 2, 2016