Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Events for operation_executor to show status of mounts, failed/successful to show in describe events #27778

Merged
merged 1 commit into from
Aug 19, 2016

Conversation

screeley44
Copy link
Contributor

@screeley44 screeley44 commented Jun 21, 2016

Fixes #27590
@saad-ali @pmorie @erinboyd

After talking with @pmorie last week about the above issue, I decided to poke around and see if I could remedy. The refactoring broke my previous UXP merged PR's that correctly showed failed mount errors in the describe events. However, Not sure I implemented correctly, but it tested out and seems to be working, let me know what I missed or if this is not the correct approach.

Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  2m        2m      1   {default-scheduler }            Normal      Scheduled   Successfully assigned nfs-bb-pod1 to 127.0.0.1
  44s       44s     1   {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "nfs-bb-pod1_default(a94f64f1-37c9-11e6-9aa5-52540073d346)": timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
  44s       44s     1   {kubelet 127.0.0.1}         Warning     FailedSync  Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
  38s       38s     1   {kubelet }              Warning     FailedMount Unable to mount volumes for pod "a94f64f1-37c9-11e6-9aa5-52540073d346": Mount failed: exit status 32
Mounting arguments: nfs1.rhs:/opt/data99 /var/lib/kubelet/pods/a94f64f1-37c9-11e6-9aa5-52540073d346/volumes/kubernetes.io~nfs/nfsvol nfs []
Output: mount.nfs: Connection timed out

Resolution hint: Check and make sure the NFS Server exists (ensure that correct IPAddress/Hostname was given) and is available/reachable.
Also make sure firewall ports are open on both client and NFS Server (2049 v4 and 2049, 20048 and 111 for v3).
Use commands telnet <nfs server> <port> and showmount <nfs server> to help test connectivity.

This change is Reviewable

@k8s-bot
Copy link

k8s-bot commented Jun 21, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

3 similar comments
@k8s-bot
Copy link

k8s-bot commented Jun 21, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

@k8s-bot
Copy link

k8s-bot commented Jun 21, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

@k8s-bot
Copy link

k8s-bot commented Jun 21, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 21, 2016
@saad-ali saad-ali assigned saad-ali and unassigned bprashanth Jun 21, 2016
@saad-ali
Copy link
Member

@k8s-bot ok to test

@saad-ali
Copy link
Member

@screeley44 Thanks for taking a crack at this. Would be really nice to get it in for 1.3.

@@ -100,11 +103,16 @@ type OperationExecutor interface {
func NewOperationExecutor(
kubeClient internalclientset.Interface,
volumePluginMgr *volume.VolumePluginMgr) OperationExecutor {

eventBroadcaster := record.NewBroadcaster()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dependencies should be passed in. In this case, have NewOperationExecutor() take a new parameter for the recorder. Plumb it all the way through from cmd/kube-controller-manager/app/controllermanager.go for attach/detach controller and from kubelet for the kubelet volume manager.

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 21, 2016
@screeley44
Copy link
Contributor Author

@saad-ali - also, I still need to update the existing tests with right # of arg params ... just a reminder

@saad-ali
Copy link
Member

Ack. Let me know when it's ready for review, and I'll do another pass.

@k8s-bot
Copy link

k8s-bot commented Jun 22, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 22, 2016
@screeley44
Copy link
Contributor Author

@saad-ali - can you begin review of this..I did not add an event for every single point where something is returned or logged, thought that might get too spammy... but I did get the major events (mountVolume, mountDevice, unmountVolume, unmountDevice, detachVolume). if you think I need to add additional let me know. Also not sure I fixed up the *_test.go files associated with this correctly, so check those out as well.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 22, 2016
@saad-ali saad-ali added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 17, 2016
@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Aug 17, 2016

GCE e2e build/test passed for commit 782d7d9.

@saad-ali
Copy link
Member

@k8s-bot unit test this issue: #30462

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Aug 18, 2016

GCE e2e build/test passed for commit 782d7d9.

@jsafrane
Copy link
Member

@k8s-bot unit test this issue: #30228

@jsafrane
Copy link
Member

@k8s-bot test this issue: #30462

@k8s-bot
Copy link

k8s-bot commented Aug 19, 2016

GCE e2e build/test passed for commit 782d7d9.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Aug 19, 2016

GCE e2e build/test passed for commit 782d7d9.

@jsafrane
Copy link
Member

@k8s-bot test this issue: #30607

@k8s-bot
Copy link

k8s-bot commented Aug 19, 2016

GCE e2e build/test passed for commit 782d7d9.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Aug 19, 2016

GCE e2e build/test passed for commit 782d7d9.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 6ce405c into kubernetes:master Aug 19, 2016
@jsafrane
Copy link
Member

Finally!! After several days fighting flakes :-(

@pmorie
Copy link
Member

pmorie commented Aug 22, 2016

Thanks for the patch @screeley44

@fabioy
Copy link
Contributor

fabioy commented Sep 8, 2016

@saad-ali Do you still think this is needed for the 1.3 branch? I'm trying to understand what issue this fixes. Thanks.

@screeley44
Copy link
Contributor Author

@fabioy - this PR fixes/adds the basic infrastructure to expose the true error surfaced from the failed mount so the user doesn't just see a generic meaningless error as shown in the referenced issue above #31768, and then have to go hunt through the logs to find the true root cause of the failure. As @saad-ali also mentioned in that issue this PR is a starting point and can be enhanced as we add more events for other volume functions (attach/detach, etc...). But I think it's pretty important from a UXP and error triage/resolution perspective.

@fabioy fabioy added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Sep 9, 2016
@fabioy
Copy link
Contributor

fabioy commented Sep 9, 2016

Automated cherrypick failed with conflicts. Please create a cherrypick PR for this. Thanks.

@saad-ali
Copy link
Member

saad-ali commented Sep 9, 2016

I would argue getting this cherry picked is not a high priority. It adds events which helps usability, but is not fixing a bug. With 1.4 around the corner, I'd be ok not getting this cherry picked to 1.3, esp if it requires a non-trivial amount of work. @screeley44 if you'd like to prepare a cherry-pick feel free. If not, let me know and we can remove the cherrypick labels.

@screeley44
Copy link
Contributor Author

@saad-ali - I'm ok with removing the cherrypick and putting this in 1.4

@saad-ali saad-ali removed cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cherrypick-candidate labels Sep 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.