Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file: Add Shortcut metadata parsing on Windows #8143

Merged
merged 6 commits into from
Nov 28, 2023

Conversation

Smjert
Copy link
Member

@Smjert Smjert commented Sep 25, 2023

  • Add 6 new columns to the file table on Windows,
    to display Shortcut metadata (.lnk files),
    and specifically the shortcut_target_path,
    shortcut_target_type, shortcut_target_location,
    shortcut_start_in, shortcut_run, shortcut_comment
    columns.

  • Fix a small bug in the file integration test, where a comma was forgotten,
    and instead of creating and testing querying two files, the concatentation
    of both was tested

  • Added logic to the integration test to create shortcuts to the created files,
    and test their content.

  • Fix the expandConstraints function so that it can be const,
    since it's not supposed to modify the context.

    I removed the dependency on unicode: Improve Windows unicode conversion functions #8109, although it would be an improvement.

@Smjert Smjert force-pushed the stefano/feature/windows-parse-lnk branch 3 times, most recently from 9381378 to 0ebf467 Compare September 25, 2023 19:13
@Smjert Smjert added this to the 5.11.0 milestone Oct 4, 2023
- Add 6 new columns to the file table on Windows,
  to display Shortcut metadata (.lnk files),
  and specifically the shortcut_target_path,
  shortcut_target_type, shortcut_target_location,
  shortcut_start_in, shortcut_run, shortcut_comment
  columns.

- Fix a small bug in the file integration test, where a comma was forgotten,
  and instead of creating and testing querying two files, the concatentation
  of both was tested

- Added logic to the integration test to create shortcuts to the created files,
  and test their content.

- Fix the expandConstraints function so that it can be const,
  since it's not supposed to modify the context.
@Smjert Smjert force-pushed the stefano/feature/windows-parse-lnk branch from 0ebf467 to a40a5a0 Compare November 1, 2023 17:51
@Smjert Smjert marked this pull request as ready for review November 1, 2023 17:52
@Smjert Smjert requested review from a team as code owners November 1, 2023 17:52
}

SHFILEINFO file_info{};
auto res = SHGetFileInfoW(buffer,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ShellAPI docs mention:

You should call this function from a background thread. Failure to do so could cause the UI to stop responding.

It sounds like this is a blocking call. Does this pose any kind of problem for osquery?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I normally regard basically any API call as blocking unless they expose an option to be async.
Now the warning is there since probably it's more common to happen, but I would say that it highly depends on what exactly is the combination that causes hanging (and is it indefinite, or because it's slow?).

I mean I expect that, as someone mentions on the internet, if you're trying to access a file that's on a networked share, then the access could be slow, and I feel that's mentioned in the docs because it's normally used by UI applications which might not be aware of the fact that the call might take a long time to return.

A deadlock might still be possible in some cases (although I'm not 100% sure on Windows) with NFS for instance, because if you mount with the hard option, and the server providing the mount goes away, if you try to access a file, the operation will get stuck indefinitely. Or at least until the server goes back up/ the mount is forcefully removed.

I'm not sure if there's any other hazard, but I expect the API to read the file on the filesystem and parse it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, most calls are technically blocking, what I meant is, this one appears to be known to sometimes cause a delay. You've already guessed the same reasons I would: network filesystems or shortcuts to content on unmounted volumes. The documentation isn't clear on whether the delay is finite or not, but I would guess there's a timeout. Even a short timeout could amount to a lot if it's on every call and the call happens many times.

My question is pretty open-ended, I just wanted to note the potential problem and let someone else decide whether to accept it or address it in some way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically I could delay and batch those requests on a thread that's then timed out from the main thread (and terminated if it doesn't collaborate on cleanly exiting), although I realized that right now we have a problem before that, reading from network shares in general. The readFile function right now is unable to properly read from them.

So I think this would most often than not fail on those.
The issue is that on Windows, but also on Linux/macOS, we are currently reading in non blocking mode to prevent blocking if data isn't there yet, particularly if we are trying to read on a file that might never get data (like a FIFO or pipe).
But there's also the case where data isn't ready yet (due to network latency), but it's definitely there. In that case the read always immediately errors out asking to try again, which is though the same error as the previous case. So we are forced to conflate the two cases, which result in immediately failing the read, and so most of the time we are unable to read from shares.

This issue a bit orthogonal at this point but to fix this we would need to use a select/poll mechanism on Unix platforms, and on Windows we can use the APIs we are already using, but retrying once in a while and keeping into account how much time as passed.
Additionally, what makes it a little more complex is choosing the timeout value that works correctly for all the file sizes or read sizes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the case where data isn't ready yet (due to network latency), but it's definitely there. In that case the read always immediately errors out asking to try again, which is though the same error as the previous case. So we are forced to conflate the two cases, which result in immediately failing the read, and so most of the time we are unable to read from shares.

Why must we conflate these two cases? Are you saying that the returned error is the same generic error, and there are not two error codes like (for example) ERROR_NODATA and ERROR_NETWORK_NOT_READY?

I think probably the fail to read is better than the chance of stalling (even though there is the osquery watchdog to kick in, even if a stall did happen), so perhaps there is no real threat at the moment. The "proper" solution you're describing does sound complex and maybe it just becomes technical debt for later.

Copy link
Member Author

@Smjert Smjert Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why must we conflate these two cases? Are you saying that the returned error is the same generic error, and there are not two error codes like (for example) ERROR_NODATA and ERROR_NETWORK_NOT_READY?

To be fair I have to check what the Windows overlapped read API can return, but on POSIX there would be no difference, the O_NONBLOCK mode of read would return EAGAIN, which causes the read logic to exit immediately.
What you would need to do is a select or poll with a variable timeout, again depending on the minimum speed you expect (and probably even a global timeout too).

That being said even if the readFile could fail if the share has issues, "saving" us from locking later, the issues could be spotty, and the readFile could succeed and we could have a slow read when the SHGetFileInfoW API is called.

I think what I was trying to say is that issues like this (either access to the filesystem failing when it shouldn't or possibly hanging), are pretty diffused in osquery right now, and I would say even ignored.
I can change how the SHGetFileInfoW is called to bring us forward in the process.

I think probably the fail to read is better than the chance of stalling (even though there is the osquery watchdog to kick in, even if a stall did happen), so perhaps there is no real threat at the moment. [...]

Unfortunately no, the watchdog doesn't know what osquery is doing, so stalling is not handled.

EDIT: I was forgetting that while the SHGetFileInfoW documentation raises that warning, I expect the previous stats, the call to file->Load, the handling of the path constraint, to all have the ability to hang osquery for the same reasons (access to the filesystem).

Honestly, unless there's something else about SHGetFileInfoW that I'm not aware of, I don't think this is making things worse.
The "solution" that actually fixes everything is complicated. Either all file accesses in the tables have to be done on a separate thread somehow, with a method though that doesn't kill performance (since the naive way of launching a thread for each access or even just for each table call could be too heavy, especially when the table is called in a JOIN), and/or the watchdog has to know what osquery is doing, but then it also should have a way to know that a table is making progress or not, because otherwise one can only do a global timeout, which might not be always desirable (I know users that are ok for queries taking minutes to complete).

But again it seems a bit out of scope here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to agree with Stefano that "unless there's something else about SHGetFileInfoW" this seems like an acceptable risk, and the more general problem of network shares seems out of scope for this PR.

Copy link
Member

@zwass zwass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments/concerns :)

specs/utility/file.table Show resolved Hide resolved
}

SHFILEINFO file_info{};
auto res = SHGetFileInfoW(buffer,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to agree with Stefano that "unless there's something else about SHGetFileInfoW" this seems like an acceptable risk, and the more general problem of network shares seems out of scope for this PR.

Comment on lines +77 to +79
IPersistFile* file;
hres = shell_link->QueryInterface(IID_IPersistFile,
reinterpret_cast<LPVOID*>(&file));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think file needs to be freed with a file->Release()? Shall we use a scope_guard here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 You're correct, thanks!

}

boost::optional<LnkData> parseLnkData(const fs::path& link) {
IShellLink* shell_link;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does shell_link need to be freed as well? Judging by the example in https://learn.microsoft.com/en-us/windows/win32/shell/links#resolving-a-shortcut it seems like it may need to be.

return boost::none;
}

std::uint32_t expected_size = 0x4C;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about this being named _size. Is this an expected header size, or an expected value (which is how I read the actual code)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the name is unfortunate, the value represents the expected header size which we expect to read as a value in the first bytes of the file header.
I'll add this as a global constant with a clearer name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also change the memcmp later to a memcpy to a new variable + if comparison, just to be more transparent.
I was being too clever here.

@Smjert Smjert force-pushed the stefano/feature/windows-parse-lnk branch from 0180c69 to 3ea6c29 Compare November 22, 2023 20:54
@Smjert Smjert force-pushed the stefano/feature/windows-parse-lnk branch from 3ea6c29 to 97de777 Compare November 22, 2023 23:33
This should've been removed with the old Shortcuts table
Copy link
Member

@zwass zwass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, thank you!

@Smjert Smjert merged commit 1028f32 into osquery:master Nov 28, 2023
14 checks passed
@Smjert Smjert deleted the stefano/feature/windows-parse-lnk branch November 28, 2023 19:57
Smjert added a commit to Smjert/osquery that referenced this pull request Nov 29, 2023
- Add 6 new columns to the file table on Windows,
  to display Shortcut metadata (.lnk files),
  and specifically the shortcut_target_path,
  shortcut_target_type, shortcut_target_location,
  shortcut_start_in, shortcut_run, shortcut_comment
  columns.

- Fix a small bug in the file integration test, where a comma was forgotten,
  and instead of creating and testing querying two files, the concatentation
  of both was tested

- Added logic to the integration test to create shortcuts to the created files,
  and test their content.

- Fix the expandConstraints function so that it can be const,
  since it's not supposed to modify the context.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants