-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
file: Add Shortcut metadata parsing on Windows #8143
file: Add Shortcut metadata parsing on Windows #8143
Conversation
9381378
to
0ebf467
Compare
- Add 6 new columns to the file table on Windows, to display Shortcut metadata (.lnk files), and specifically the shortcut_target_path, shortcut_target_type, shortcut_target_location, shortcut_start_in, shortcut_run, shortcut_comment columns. - Fix a small bug in the file integration test, where a comma was forgotten, and instead of creating and testing querying two files, the concatentation of both was tested - Added logic to the integration test to create shortcuts to the created files, and test their content. - Fix the expandConstraints function so that it can be const, since it's not supposed to modify the context.
0ebf467
to
a40a5a0
Compare
} | ||
|
||
SHFILEINFO file_info{}; | ||
auto res = SHGetFileInfoW(buffer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should call this function from a background thread. Failure to do so could cause the UI to stop responding.
It sounds like this is a blocking call. Does this pose any kind of problem for osquery?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I normally regard basically any API call as blocking unless they expose an option to be async.
Now the warning is there since probably it's more common to happen, but I would say that it highly depends on what exactly is the combination that causes hanging (and is it indefinite, or because it's slow?).
I mean I expect that, as someone mentions on the internet, if you're trying to access a file that's on a networked share, then the access could be slow, and I feel that's mentioned in the docs because it's normally used by UI applications which might not be aware of the fact that the call might take a long time to return.
A deadlock might still be possible in some cases (although I'm not 100% sure on Windows) with NFS for instance, because if you mount with the hard
option, and the server providing the mount goes away, if you try to access a file, the operation will get stuck indefinitely. Or at least until the server goes back up/ the mount is forcefully removed.
I'm not sure if there's any other hazard, but I expect the API to read the file on the filesystem and parse it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, most calls are technically blocking, what I meant is, this one appears to be known to sometimes cause a delay. You've already guessed the same reasons I would: network filesystems or shortcuts to content on unmounted volumes. The documentation isn't clear on whether the delay is finite or not, but I would guess there's a timeout. Even a short timeout could amount to a lot if it's on every call and the call happens many times.
My question is pretty open-ended, I just wanted to note the potential problem and let someone else decide whether to accept it or address it in some way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically I could delay and batch those requests on a thread that's then timed out from the main thread (and terminated if it doesn't collaborate on cleanly exiting), although I realized that right now we have a problem before that, reading from network shares in general. The readFile
function right now is unable to properly read from them.
So I think this would most often than not fail on those.
The issue is that on Windows, but also on Linux/macOS, we are currently reading in non blocking mode to prevent blocking if data isn't there yet, particularly if we are trying to read on a file that might never get data (like a FIFO or pipe).
But there's also the case where data isn't ready yet (due to network latency), but it's definitely there. In that case the read always immediately errors out asking to try again, which is though the same error as the previous case. So we are forced to conflate the two cases, which result in immediately failing the read, and so most of the time we are unable to read from shares.
This issue a bit orthogonal at this point but to fix this we would need to use a select/poll mechanism on Unix platforms, and on Windows we can use the APIs we are already using, but retrying once in a while and keeping into account how much time as passed.
Additionally, what makes it a little more complex is choosing the timeout value that works correctly for all the file sizes or read sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the case where data isn't ready yet (due to network latency), but it's definitely there. In that case the read always immediately errors out asking to try again, which is though the same error as the previous case. So we are forced to conflate the two cases, which result in immediately failing the read, and so most of the time we are unable to read from shares.
Why must we conflate these two cases? Are you saying that the returned error is the same generic error, and there are not two error codes like (for example) ERROR_NODATA and ERROR_NETWORK_NOT_READY?
I think probably the fail to read is better than the chance of stalling (even though there is the osquery watchdog to kick in, even if a stall did happen), so perhaps there is no real threat at the moment. The "proper" solution you're describing does sound complex and maybe it just becomes technical debt for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why must we conflate these two cases? Are you saying that the returned error is the same generic error, and there are not two error codes like (for example) ERROR_NODATA and ERROR_NETWORK_NOT_READY?
To be fair I have to check what the Windows overlapped read API can return, but on POSIX there would be no difference, the O_NONBLOCK mode of read would return EAGAIN, which causes the read logic to exit immediately.
What you would need to do is a select
or poll
with a variable timeout, again depending on the minimum speed you expect (and probably even a global timeout too).
That being said even if the readFile
could fail if the share has issues, "saving" us from locking later, the issues could be spotty, and the readFile
could succeed and we could have a slow read when the SHGetFileInfoW
API is called.
I think what I was trying to say is that issues like this (either access to the filesystem failing when it shouldn't or possibly hanging), are pretty diffused in osquery right now, and I would say even ignored.
I can change how the SHGetFileInfoW
is called to bring us forward in the process.
I think probably the fail to read is better than the chance of stalling (even though there is the osquery watchdog to kick in, even if a stall did happen), so perhaps there is no real threat at the moment. [...]
Unfortunately no, the watchdog doesn't know what osquery is doing, so stalling is not handled.
EDIT: I was forgetting that while the SHGetFileInfoW
documentation raises that warning, I expect the previous stats, the call to file->Load
, the handling of the path
constraint, to all have the ability to hang osquery for the same reasons (access to the filesystem).
Honestly, unless there's something else about SHGetFileInfoW
that I'm not aware of, I don't think this is making things worse.
The "solution" that actually fixes everything is complicated. Either all file accesses in the tables have to be done on a separate thread somehow, with a method though that doesn't kill performance (since the naive way of launching a thread for each access or even just for each table call could be too heavy, especially when the table is called in a JOIN), and/or the watchdog has to know what osquery is doing, but then it also should have a way to know that a table is making progress or not, because otherwise one can only do a global timeout, which might not be always desirable (I know users that are ok for queries taking minutes to complete).
But again it seems a bit out of scope here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to agree with Stefano that "unless there's something else about SHGetFileInfoW" this seems like an acceptable risk, and the more general problem of network shares seems out of scope for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments/concerns :)
} | ||
|
||
SHFILEINFO file_info{}; | ||
auto res = SHGetFileInfoW(buffer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to agree with Stefano that "unless there's something else about SHGetFileInfoW" this seems like an acceptable risk, and the more general problem of network shares seems out of scope for this PR.
IPersistFile* file; | ||
hres = shell_link->QueryInterface(IID_IPersistFile, | ||
reinterpret_cast<LPVOID*>(&file)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think file
needs to be freed with a file->Release()
? Shall we use a scope_guard
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 You're correct, thanks!
} | ||
|
||
boost::optional<LnkData> parseLnkData(const fs::path& link) { | ||
IShellLink* shell_link; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does shell_link
need to be freed as well? Judging by the example in https://learn.microsoft.com/en-us/windows/win32/shell/links#resolving-a-shortcut it seems like it may need to be.
osquery/tables/utility/file.cpp
Outdated
return boost::none; | ||
} | ||
|
||
std::uint32_t expected_size = 0x4C; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about this being named _size. Is this an expected header size, or an expected value (which is how I read the actual code)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the name is unfortunate, the value represents the expected header size which we expect to read as a value in the first bytes of the file header.
I'll add this as a global constant with a clearer name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll also change the memcmp
later to a memcpy
to a new variable + if comparison, just to be more transparent.
I was being too clever here.
0180c69
to
3ea6c29
Compare
3ea6c29
to
97de777
Compare
This should've been removed with the old Shortcuts table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now, thank you!
- Add 6 new columns to the file table on Windows, to display Shortcut metadata (.lnk files), and specifically the shortcut_target_path, shortcut_target_type, shortcut_target_location, shortcut_start_in, shortcut_run, shortcut_comment columns. - Fix a small bug in the file integration test, where a comma was forgotten, and instead of creating and testing querying two files, the concatentation of both was tested - Added logic to the integration test to create shortcuts to the created files, and test their content. - Fix the expandConstraints function so that it can be const, since it's not supposed to modify the context.
Add 6 new columns to the file table on Windows,
to display Shortcut metadata (.lnk files),
and specifically the shortcut_target_path,
shortcut_target_type, shortcut_target_location,
shortcut_start_in, shortcut_run, shortcut_comment
columns.
Fix a small bug in the file integration test, where a comma was forgotten,
and instead of creating and testing querying two files, the concatentation
of both was tested
Added logic to the integration test to create shortcuts to the created files,
and test their content.
Fix the expandConstraints function so that it can be const,
since it's not supposed to modify the context.
I removed the dependency on unicode: Improve Windows unicode conversion functions #8109, although it would be an improvement.