-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore non socket files in the kubelet plugin watcher #70494
Conversation
16a65aa
to
790f1a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, have two comments as below.
if !fi.IsDir() { | ||
if !strings.HasSuffix(fi.Name(), ".sock") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of checking name suffix, we can check file mode to make sure it is socket file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think suffix checking is still needed, can we do both?
suffix checking is necessary for DELETE event, so that we can skip those DELETE events that come from non-socket file. (unfortunately, we can not check file mode since already been deleted when received the event), could you update the handleDeleteEvent
?
relative += "/" | ||
} | ||
|
||
if strings.Count(relative, "/") > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by reading the PR, i believe filtering socket file is good enough to reduce the actual registration attempt, we will still see logs saying: non-socket files been ignored, we could increase log verbose level if think too many logs is a issue, similarly i suggest to increase log verbose level at handleDeleteEvent()
limit directory depth where plugin can stay has two drawbacks IMO:
- it is hard to enforce, not sure everyone will read code/readme here.
- it provides less flexibility to plugins: if they want to organize many plugins with directory structure .
Thanks
@RenaudWasTaken there are couple of suggestions from the issue (#69015) that should help.
While the above will reject more DIRs and avoid scanning them, another suggestion would be to start scanning at a DIR dir one level below the Kubelet dir (i.e. Thoughts ? |
Here is my take on the three possible solution:
I believe solution 3 makes the most sense, but would like to have support from others :) |
@RenaudWasTaken what do you mean by Stop scanning?
|
By that I mean that any directory created under |
@RenaudWasTaken for 3 scanning would always happen at $KUBELET_DIR/ but skip $KUBELET_DIR/foo.io and use the mounter function to skip mounted inline plugin dirs ? |
@RenaudWasTaken Let's go with Option 3 along with the mounter suggestion. I think that would work. |
i don't understand why option 1 will cause more CPU load, |
@figo I think @RenaudWasTaken was referring to unnecessary processing required to trigger the file event. The trigger will happen before the mounted status can be determined. However, it will stop further walk down the directory if it is mounted. At this point, the mounted check is a good solution. Maybe it can be combined with option 3? |
Reading a bit more your comment, it seems I lack understanding of what the CSI plugins does. Are you suggesting that most directories in the plugin tree are mounted directories? Looking at the logs posted on the original issue, I expected that most of the metadata files (and dirs) weren't mount points:
Because of that I was referring to increased CPU load as, when you look at fsnotify's (and inotify's) implementation more files and directories watched means more work for the CPU / kernel to do (given the current behavior, I can clearly see this grow to hundreds of files or more), and more fds used. Do you mind detailing a bit where the mountpoint is? |
Unfortunately there is a mix of mounted points and metadata file nodes in $KUBELET_DIR/plugins. Here is another set of suggestions that I think could fix things ever more when scanning $KUBELET_DIR/plugins to keep backward compat with existing drivers:
This should avoid hitting all plugins (inline and csi) thus reducing scanning overhead. Thoughts. |
/milestone v1.13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: RenaudWasTaken, saad-ali, vishh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/priority critical-urgent |
/retest |
/retest
…On Fri, Nov 16, 2018, 17:58 k8s-ci-robot ***@***.***> wrote:
@RenaudWasTaken <https://github.com/RenaudWasTaken>: The following tests
*failed*, say /retest to rerun them all:
Test name Commit Details Rerun command
pull-kubernetes-local-e2e-containerized 420378b
<420378b>
link
<https://gubernator.k8s.io/build/kubernetes-jenkins/pr-logs/pull/70494/pull-kubernetes-local-e2e-containerized/3539/> /test
pull-kubernetes-local-e2e-containerized
pull-kubernetes-e2e-gce-100-performance 1a3fbf1
<1a3fbf1>
link
<https://gubernator.k8s.io/build/kubernetes-jenkins/pr-logs/pull/70494/pull-kubernetes-e2e-gce-100-performance/27559/> /test
pull-kubernetes-e2e-gce-100-performance
pull-kubernetes-kubemark-e2e-gce-big 1a3fbf1
<1a3fbf1>
link
<https://gubernator.k8s.io/build/kubernetes-jenkins/pr-logs/pull/70494/pull-kubernetes-kubemark-e2e-gce-big/29936/> /test
pull-kubernetes-kubemark-e2e-gce-big
pull-kubernetes-integration 1a3fbf1
<1a3fbf1>
link
<https://gubernator.k8s.io/build/kubernetes-jenkins/pr-logs/pull/70494/pull-kubernetes-integration/36118/> /test
pull-kubernetes-integration
Full PR test history <https://gubernator.k8s.io/pr/70494>. Your PR
dashboard <https://gubernator.k8s.io/pr/RenaudWasTaken>. Please help us
cut down on flakes by linking to
<https://git.k8s.io/community/contributors/devel/flaky-tests.md#filing-issues-for-flaky-tests>
an open issue
<https://github.com/kubernetes/kubernetes/issues?q=is:issue+is:open> when
you hit one in your PR.
Instructions for interacting with me using PR comments are available here
<https://git.k8s.io/community/contributors/guide/pull-requests.md>. If
you have questions or suggestions related to my behavior, please file an
issue against the kubernetes/test-infra
<https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:>
repository. I understand the commands that are listed here
<https://go.k8s.io/bot-commands>.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#70494 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACZyE9a20Irqp5PuB5db_etDsBJdu5-9ks5uv21FgaJpZM4YEhWr>
.
|
/hold Double checking GCE PD tests |
Ok GCE PD CSI tests look good. Removing hold /hold cancel /test pull-kubernetes-e2e-gce-100-performance |
The registration directory has changed from `{kubelet_root_dir}/plugins/` to `{kubelet_root_dir}/plugins_registry/`. see: kubernetes/kubernetes#70494
We recently upgraded our kubernetes cluster from
The node has plenty of free space, so we took our efforts of tracing the problem to watchers. We noticed that there was an excessive use of them:
and increasing the maximum number
) |
For 1.12, we should backport the change that blacklisted the kubernetes.io subdirectory. |
@msau42 unfortunately that change is breaking, backporting it would require all the plugins to backport their updates. I don't really think that's possible... |
A solution could be to limit the search depth for 1.12, what do you think @msau42 @vladimirvivien @saad-ali |
Limiting the search depth is technically breaking too, but unlikely. I don't know of any users of plugin registration other than csi, and at least our csi documentation does not suggest a kubernetes.io subdirectory |
I agree that we can't backport this PR, which changes the plugin dir, to 1.12. But we have to fix this for 1.12. So, let's back port the fix we made for this in #71314 -- i.e. blacklist the @RenaudWasTaken can you prepare a PR? |
I can do that, I should be able to take care of that today. |
Great, thanks!! |
I'm encountering the issue reported here by @ffilippopoulos, but on v1.13.2, and with the |
@vreon can you open a new issue and provide some logs? |
The registration directory has changed from `{kubelet_root_dir}/plugins/` to `{kubelet_root_dir}/plugins_registry/`. see: kubernetes/kubernetes#70494
What type of PR is this?
/kind bug
/sig node
What this PR does / why we need it: Makes sure the pluginwatcher ignores non socket files.
Which issue(s) this PR fixes : Fixes #69015
Special notes for your reviewer:
Will post a list of plugins using the pluginwatcher mechanism and where they place the socket.
cc @vladimirvivien @vikaschoudhary16 @figo
Does this PR introduce a user-facing change?: