UXP Expose volume lstat errors in describe event #24624
Conversation
@kubernetes/sig-storage
Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test". This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry. Otherwise, if this message is too spammy, please complain to ixdy.
@swagiaal @pmorie @erinboyd @jeffvance - any thoughts on this one?
@screeley44
glog.Errorf("Could not read directory %s: %v", podVolDir, volumeNameDirsStat[i]) | ||
// failed syscall type error, expose it | ||
glog.Errorf("Failure: Could not read directory %s: %v", podVolDir, volumeNameDirsStat[i]) | ||
return volumes, fmt.Errorf("Failure: Could not read directory: \n Path: %s \n Error: %v \n", volumeKindPath, volumeNameDirsStat[i]) |
remove "Failure"
error messages must start with a lowercase letter
remove the newlines ("error reading directory %q: %v", volumeKindPath, volumeNameDirsStat[i])
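A minimal sketch of what the reviewer is asking for - lowercase message, %q for the path, no embedded newlines - using the variable name from the snippet above; this is an illustration, not the actual kubelet code:

```go
package volumes

import (
	"fmt"
	"os"
)

// readVolumeDir surfaces the underlying read error together with the failing
// path, in the error format suggested in the review comments above.
func readVolumeDir(volumeKindPath string) error {
	if _, err := os.ReadDir(volumeKindPath); err != nil {
		return fmt.Errorf("error reading directory %q: %v", volumeKindPath, err)
	}
	return nil
}
```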
@screeley44 so, the motivation for this change is that the mount call doesn't return enough context to provide meaningful feedback?
@pmorie - yes. In the test case I mention above, the syscall returns an error that I think is useful for the user: it says "transport endpoint is not connected", which is a good indication that your endpoints are wrong and you are not able to communicate with the server. So I'm trying to expose as much as possible in the describe events, with the long-term goal of taking those errors and analyzing/massaging them into meaningful messages and possible resolution tips for the user experience. But the first step is making sure we expose as much useful error information as possible.
@screeley44 and mount has no info about this?
@pmorie - mount does not expose the underlying syscall failure - that will be in the logs, but it's not carried up through the code and exposed to the user in the describe event... if you see the examples above (first comment), what eventually gets exposed with the old behavior is a generic error:
So I noticed that there was some more useful info in the logs from trying to read the disks, and I wanted to expose that as well... so the new behavior would basically give this:
```
if ok {
    ref, errGetRef := api.GetReference(pod)
    if errGetRef == nil && ref != nil {
        kl.recorder.Event(ref, api.EventTypeWarning, kubecontainer.FailedMountVolume, err.Error())
```
This isn't the right place to create an event of this type
@screeley44 hmm... why is this not failing during the SetUp call of the glusterfs plugin? Why do we have to wait until the kubelet is inspecting pods' volumes?
@swagiaal - yeah, that is what I was trying to figure out as well. It does show up in glusterfs.go, but the error is hidden in a log file for glusterfs. I spoke with Paul on Friday and we are going to put this one on the back burner for now; he wants me to try another approach where maybe we can pull the relevant error out of the glusterfs.log and return that, instead of waiting for the kubelet to inspect the pods' volumes.
+1 to not doing this in kubelet code, but is there nothing we can do to get the mount call to return a good error? Is there a -v option? Does the error also not get printed when you do this manually on the command line? Also, it looks like the glusterfs call is passing a log-file option. Could that be acting to suppress error output and send it to the logfile? If parsing the logfile is our only option then so be it, but I would like to avoid that if possible.
@swagiaal - yep, agreed. I will look into it and report status here. Thanks for all the input.
@screeley44 I think this turned into something else -- can we close this one?
Automatic merge from submit-queue

read gluster log to surface glusterfs plugin errors properly in describe events

glusterfs.go does not properly expose errors as all mount errors go to a log file; I propose we read the log file to expose the errors without asking the users to 'go look at this log'.

This PR does the following:
1. adds a gluster option for log-level=ERROR to remove all noise from log file
2. change log file name and path based on PV + Pod name - so specific per PV and Pod
3. create a utility to read the last two lines of the log file when failure occurs

old behavior:
```
13s 13s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(34b18c6b-070d-11e6-8e95-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.234.147:myVol2 /var/lib/kubelet/pods/34b18c6b-070d-11e6-8e95-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster glusterfs [log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster/glusterfs.log]
Output: Mount failed. Please check the log file for more details.
```

improved behavior: (updated after suggestions from community)
```
34m 34m 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-multi-pod1_default(e7d7f790-0d4b-11e6-a275-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.123.222:myVol2 /var/lib/kubelet/pods/e7d7f790-0d4b-11e6-a275-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster2 glusterfs [log-level=ERROR log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster2/bb-multi-pod1-glusterfs.log]
Output: Mount failed. Please check the log file for more details.

the following error information was pulled from the log to help resolve this issue:
[2016-04-28 14:21:29.109697] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 192.168.123.222:24007 failed (Connection timed out)
[2016-04-28 14:21:29.109767] E [glusterfsd-mgmt.c:1819:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 192.168.123.222 (Transport endpoint is not connected)
```

also this PR is an alternate approach to: #24624
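A minimal sketch of the "read the last two lines of the log file" utility described in the merge message above; the function name and error wording are assumptions, not the actual kubernetes/kubernetes implementation:

```go
package glusterfs

import (
	"fmt"
	"os"
	"strings"
)

// readGlusterLogTail returns the last two lines of the glusterfs log file so
// they can be appended to the mount failure shown in `kubectl describe`.
func readGlusterLogTail(logFile string) (string, error) {
	data, err := os.ReadFile(logFile)
	if err != nil {
		return "", fmt.Errorf("error reading gluster log %q: %v", logFile, err)
	}
	lines := strings.Split(strings.TrimSpace(string(data)), "\n")
	if len(lines) > 2 {
		lines = lines[len(lines)-2:]
	}
	return strings.Join(lines, "\n"), nil
}
```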
Fixes #23982
If a user passes in a bad endpoint (invalid IP), the error exposed in the pod describe is vague. By exposing the underlying error, the user will not have to go searching the logs for the issue; in the example below it tells the user that the endpoints file is not valid:

Current Behavior:
New Behavior:
Now volumes.go, when trying to read the disks, will pick up on this error first, indicating that the volume cannot be read because it is invalid (no connection to the source volume). This gives the user a clue to check that particular volume's endpoints, make sure the IP is correct, and verify that the server/volume is actually reachable:
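A rough sketch of the behavior described above, assuming the check happens while the kubelet walks a pod's volume directories; the names are illustrative, not the actual volumes.go code. For a stale gluster mount, os.Lstat returns the underlying syscall error (e.g. "transport endpoint is not connected"), which is wrapped with the path so it can surface in describe events:

```go
package volumes

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkPodVolumes lstats each volume directory under the pod's volumes dir
// and returns the first underlying error instead of a generic mount failure.
func checkPodVolumes(podVolDir string, volumeNames []string) error {
	for _, name := range volumeNames {
		path := filepath.Join(podVolDir, name)
		if _, err := os.Lstat(path); err != nil {
			return fmt.Errorf("error reading volume directory %q: %v", path, err)
		}
	}
	return nil
}
```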