Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix readFile API doing blocking I/O with a non-blocking handle #6368

Merged
merged 1 commit into from
Apr 8, 2020

Conversation

Smjert
Copy link
Member

@Smjert Smjert commented Apr 6, 2020

When a block size is passed to the readFile function
or a file has no size, the read is forced to be blocking,
even if the handle is opened as non-blocking.
The opposite can happen too, a blocking handle is opened
but since a block size of 0 is passed, and the file size is not 0,
the file is read with non-blocking I/O.

This change bases the decision of doing blocking
or non-blocking I/O mainly on the "blocking" parameter
of the readFile function and the file being a special file or not.
If a handle is opened in non-blocking mode but the file is special,
the handle is reopened as blocking.

Also give a different name to the overload that provides
a way to do a read file check via readFile.

This should address the most concerning issues found in #5661

When a block size is passed to the readFile function
or a file has no size, the read is forced to be blocking,
even if the handle is opened as non-blocking.
The opposite can happen too, a blocking handle is opened
but since a block size of 0 is passed, and the file size is not 0,
the file is read with non-blocking I/O.

This change bases the decision of doing blocking
or non-blocking I/O mainly on the "blocking" parameter
of the readFile function and the file being a special file or not.
If a handle is opened in non-blocking mode but the file is special,
the handle is reopened as blocking.

Also give a different name to the overload that provides
a way to do a read file check via readFile.
@Smjert Smjert added the bug label Apr 6, 2020
@theopolis
Copy link
Member

I’d like to test this on files in proc before merging.

@Smjert
Copy link
Member Author

Smjert commented Apr 7, 2020

I’d like to test this on files in proc before merging.

I believe this situation is kind of tested already with the listening_ports table itself, which reads the various tcp, tcp6 etc files under proc.
Moreover the tests which use the test_http_server.py do use the listening_ports table to know if the server is listening. When I was developing the fix, I did an error and that table together with other immediately broke.

That been said, do you want me to put a specific test for this?

Copy link
Contributor

@farfella farfella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some feedback for review. Acceptable to merge without a requirement to change due to feedback.

// mode
mode &= ~PF_NONBLOCK;
blocking_io = true;
fd = std::make_unique<PlatformFile>(path, mode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some reason we open the file twice, vs performing fd = std::make_unique<PlatformFile>(path, mode); once after this if-block?

Copy link
Member Author

@Smjert Smjert Apr 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that isSpecialFile, which is used in the if, is a function of PlatformFile, which needs an instance of it to work (especially on Windows where there's actually a handle to open), so we need to open it in a mode and then change it later if needed.

}
}

PlatformTime times;
handle.fd->getFileTimes(times);

off_t total_bytes = 0;
if (file_size == 0 || block_size > 0) {
if (handle.blocking_io) {
// Reset block size to a sane minimum.
block_size = (block_size < 4096) ? 4096 : block_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect any block size less than 1MB will be inefficient from an IO perspective for reading files. This is from experiments I ran back in 2006. This link has some data that may explain when 4KB block sizes are preferred over 1MB block sizes and cases where 1MB is preferred: https://medium.com/@duhroach/the-impact-of-blocksize-on-persistent-disk-performance-7e50a85b2647

Copy link
Member Author

@Smjert Smjert Apr 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it highly depends on the files, the disk, the filesystem etc.
This is just a sane minimum, one is free to use a higher value when using the readFile API with files that are large enough where this makes sense.
Although I could get behind maybe changing it here:

return readFile(path,
size,
4096,

That been said, I feel like this is a different matter that needs its own issue and PR :P.

@theopolis theopolis merged commit 10e6938 into osquery:master Apr 8, 2020
@Smjert Smjert deleted the stefano/fix/readFile branch June 26, 2020 12:12
aikuchin pushed a commit to aikuchin/osquery that referenced this pull request Jul 11, 2023
… to master

* commit '8c13dd6bd206f2909a4baea5bcfbc91d5e3f502b': (159 commits)
  release: updating changelog for 4.3.0 release (osquery#6387)
  Build hvci_status table with CMake (osquery#6378)
  Change calls to debug log to verbose (osquery#6369)
  iokit: Fix race when accessing port_ (osquery#6380)
  Check extensions are registered with osquery core (osquery#6374)
  First steps to remove the Buck build system (osquery#6361)
  Return error detaching table, only use primary database (osquery#6373)
  Copy the parent environment when launching worker
  Change process table log errors to info and fix typo (osquery#6370)
  Ensure the extension uuid is never 0 (osquery#6377)
  Remove errors when converting empty numeric rows (osquery#6371)
  Do not force a specific path to install osquery on Windows (osquery#6379)
  Fix readFile API doing blocking I/O with a non-blocking handle (osquery#6368)
  magic: Check return from magic_file (osquery#6363)
  macos: Use -1 for missing ppid in process_events (osquery#6339)
  Update OpenSSL to version 1.1.1f and fix build (osquery#6359)
  Simplify how third party libraries formula work (osquery#6303)
  Add socket_events table for socket auditing in MacOS (osquery#6028)
  Extend the fields of curl_certificate table (osquery#6176)
  add status column to deb_packages table (osquery#6341)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants