-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return more useful error information when a persistent volume fails to mount #23122
Conversation
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry. Otherwise, if this message is too spammy, please complain to ixdy. |
2 similar comments
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry. Otherwise, if this message is too spammy, please complain to ixdy. |
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry. Otherwise, if this message is too spammy, please complain to ixdy. |
Labelling this PR as size/XS |
@smarterclayton @pmorie @pweil- please review |
@@ -34,7 +35,7 @@ import ( | |||
"k8s.io/kubernetes/pkg/volume" | |||
) | |||
|
|||
var errUnsupportedVolumeType = fmt.Errorf("unsupported volume type") | |||
var errUnsupportedVolumeType = fmt.Errorf("failed volume mount, check previous pod describe events or logs for more details") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@screeley44 @pmorie - I took a look at the usages of this error. It looks like some of the usages (NewWrapperBuilder
, NewWrapperCleaner
) will actually hide the underlying error from the New*
call and some don't (mountExternalVolumes
). Should we go ahead and standardize that too?
I'll defer to folks who are in this code more than me.
There are a few different problems to untangle here:
I believe if we make these changes both the error message in the pod status and the message in the event will have the accurate root cause. How does that sound @screeley44 @pweil-? |
Yes, agree
I like this better. I noticed in the review (#23122 (comment)) that we have to go through extra gyrations because methods can return |
👍 to @pmorie's suggestions |
@screeley44 PR needs rebase |
+1 on @pmorie's comment: #23122 (comment) @screeley44 Are you going to continue with pr with @pmorie's suggestion? |
@dchen1107 @pmorie - Yes, I would like to continue, I've been off for spring break and Easter but back now and plan on picking back up on this |
Labelling this PR as size/M |
@pmorie @pweil- please review what I did, I think this will expose all root cause errors to the user in the describe event as I tried to thoroughly test all conditions, but maybe I missed something or could have done something in a better way.
I also removed the errUnsupportedVolumeType generic error and added a method, I tried to replicate error scenarios where that might get called, but I really couldn't find any, but maybe doesn't hurt to keep around just in case? For this part:
the if plugin == nil (not sure how this would ever happen, I mean, we would either get an error or we would get the plugin...correct?) so not sure if we should remove, leave as is or:
|
@dchen1107 - any thoughts on this so far? |
@pmorie @saad-ali @pweil- @smarterclayton where are we with this? Can we please merge this before ANOTHER rebase? |
@kubernetes/sig-storage |
@@ -34,8 +34,6 @@ import ( | |||
"k8s.io/kubernetes/pkg/volume" | |||
) | |||
|
|||
var errUnsupportedVolumeType = fmt.Errorf("unsupported volume type") | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. do we really not need this or can we move it somewhere where we are sure that the volume is not supported ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is removed and replaced with a more generic error createVolumeTypeError - which in theory should never be called from what I can tell, so I was asking if we can remove it all together (errUnsupportedVolumeType was original and I removed in this PR) but replaced with a function to eventually help generate a better error...but as I wondered in the PR I don't know if that error would ever get hit, since if the mounter fails in anyway that error is now surfaced where as before it wasn't...so wonder if we can remove altogether or keep around for some unforeseen scenario?? @swagiaal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I am convinced it is not needed after reading the rest of the patch :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not be needed, and there's no utility to keeping it around 'just in case'. YAGNI
// - an attacher if one exists | ||
// - an error if no plugin was found for the volume | ||
// or the attacher failed to instantiate | ||
// - nil if there is no appropriate attacher for this volume |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of the exception to the whole sentence rule -- this is fine as is re: the specification of return values
… to describe event
The goal of this PR was to surface/expose plugin errors that were previously being eaten (i.e. errors that come from nfs.go, glusterfs.go, etc...). And have them properly show in 'describe pod' so users could more easily see them without having to search the logs. This accomplishes that, and all other errors that traditionally already showed in the 'describe' are not really affected. After this I plan on continuing with additonal PRs and Issues to address other useability concerns in terms of error handling and other types of errors being eaten. Below are a few example outputs from these changes, a newly surfaced plugin generated error that is now surfaced with true root cause and an existing non plugin type error that has always been surfaced and has not changed: Missing endpoints from GlusterFS using PV (previously being eaten):
NFS Connection Timeout using PV (timeout error remains the same as this does not stem from plugins directly):
|
@screeley44 I would change the title of this PR (since it will become an entry in the release notes file) to be something like 'Return more useful error information when a persistent volume fails to mount'. @saad-ali Do you have bandwidth to give this a review and then one of us can tag? |
Will do. |
LGTM |
Thanks for the patch @screeley44. I think this is one of the big usability issues w/ PVs today. |
GCE e2e build/test passed for commit 36970de. |
The author of this PR is not in the whitelist for merge, can one of the admins add the 'ok-to-merge' label? |
+1 |
@k8s-bot test this Tests are more than 48 hours old. Re-running tests. |
GCE e2e build/test failed for commit 36970de. Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake. |
@pmorie - am I supposed to do something with the failed Jenkins? I have not found the cause of the failure yet |
1 similar comment
GCE e2e build/test passed for commit 36970de. |
Fixes #22992
Fixes #23048 - resolves following issues - UXP issues
volume plugin type errors are not being surfaced to pod describe events properly when a failure occurs with PV/PVC volume building - for example plugin builder fails if secrets or endpoints are missing (rbd or glusterfs). The user only sees a generic error that has incorrect meaning "unsupported volume type"
see issues #22992 and #23048 for more details