Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add version collate functions #8168

Merged

Conversation

Micah-Kolide
Copy link
Contributor

Quality of life improvement. Version strings can be annoying to compare/evaluate since a simple operation can't properly handle numeric values. To work around that I've added a new collation for version comparison.

osquery> SELECT '101.1.10' > '101.1.2';
+------------------------+
| '101.1.10' > '101.1.2' |
+------------------------+
| 0                      |
+------------------------+
osquery> SELECT '101.1.10' > '101.1.2' COLLATE VERSION;
+----------------------------------------+
| '101.1.10' > '101.1.2' COLLATE VERSION |
+----------------------------------------+
| 1                                      |
+----------------------------------------+

This works by splitting strings on . into segments, then grouping consecutive numeric and alphabetic characters into pieces to compare against. Special characters are dropped, but they continue to split off the grouping. I considered merging those split off groups together, but I feel it would be best to continue splitting off to the next piece instead.

This should also be pretty nice on version columns in tables if we add the collation to them.

osquery> CREATE TEMP TABLE test(version TEXT COLLATE VERSION);
osquery> INSERT INTO test VALUES ('100.2.30');
osquery> SELECT * FROM test WHERE version > '100.2.4';
+----------+
| version  |
+----------+
| 100.2.30 |
+----------+
osquery> SELECT version FROM os_version WHERE version > '11.0';
+---------+
| version |
+---------+
| 14.0    |
+---------+
osquery> SELECT version FROM os_version WHERE version > '100.0';
+---------+
| version |
+---------+
| 14.0    |
+---------+
osquery> INSERT INTO test VALUES ('14.0');
osquery> SELECT * FROM test WHERE version > '11.0';
+----------+
| version  |
+----------+
| 100.2.30 |
| 14.0     |
+----------+
osquery> SELECT * FROM test WHERE version > '100.0';
+----------+
| version  |
+----------+
| 100.2.30 |
+----------+

@Micah-Kolide Micah-Kolide requested review from a team as code owners October 20, 2023 02:20
@Smjert
Copy link
Member

Smjert commented Oct 20, 2023

Closing and reopening to pick up a test fix from master.

@Smjert Smjert closed this Oct 20, 2023
@Smjert Smjert reopened this Oct 20, 2023
Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think test files will help. You can root around https://github.com/osquery/osquery/pull/6535/files but I don't like the implementation there. (Which is why I never merged it)

osquery/sql/sqlite_collation.cpp Outdated Show resolved Hide resolved
osquery/sql/sqlite_collation.cpp Outdated Show resolved Hide resolved
osquery/sql/sqlite_collation.cpp Outdated Show resolved Hide resolved
@Smjert
Copy link
Member

Smjert commented Oct 23, 2023

I think test files will help. You can root around https://github.com/osquery/osquery/pull/6535/files but I don't like the implementation there. (Which is why I never merged it)

By the way, we also have this: https://github.com/osquery/osquery/blob/master/osquery/utils/versioning/semantic.cpp, which is currently unused, so I think we should have one implementation at the end.

EDIT: To clarify, if there's no use of code like semantic.cpp (as it's happening now) which extracts the major, minor, patch values from a string, I would just remove it.

@Micah-Kolide Micah-Kolide force-pushed the micah/add_version_comparison branch from d2a39bb to 2b8b227 Compare November 3, 2023 15:12
Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty clean to me, and reasonably easy to follow. (Sweeeet!)

A couple of suggestions:

  1. I kinda understand all the boolean options to versionCompare, but I think some comments would help. Some words about why one might want epoch, and how it's used by dpkg. For example...
  2. If we wanted to expose these as sql functions, and not just collations, how would we? Does it require moving any of the functions around?
  3. Some docs should land in https://github.com/osquery/osquery/blob/master/docs/wiki/introduction/sql.md which will then appear in https://osquery.readthedocs.io/en/latest/introduction/sql/ Maybe a Collation section?
  4. Tests are hard to follow, but I'm not sure how to make them better.It is, inherently, a lot.

osquery/sql/tests/sql.cpp Show resolved Hide resolved
osquery/sql/sqlite_collation.cpp Outdated Show resolved Hide resolved
osquery/sql/sqlite_collation.cpp Outdated Show resolved Hide resolved
osquery/sql/sqlite_collation.cpp Outdated Show resolved Hide resolved
osquery/sql/sqlite_collation.cpp Outdated Show resolved Hide resolved
osquery/sql/sqlite_collation.cpp Outdated Show resolved Hide resolved
Smjert
Smjert previously requested changes Nov 7, 2023
Copy link
Member

@Smjert Smjert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only a quick stylistic review: can we use std::string_view instead of const char* + length?

With that, things like strcspn can be substituted with the string_view method find, which will return the position in the string where the character has been found.

Basically, lets not write C.

The variable names should be snake_case.

Finally, for the switch cases on the ASCII value, instead of the number, can we use the character itself? '~' etc, so that it's easier to understand which character that corresponds to.

@directionless
Copy link
Member

This is only a quick stylistic review: can we use std::string_view instead of const char* + length?

I think that part is driven by the collate api.

@Smjert
Copy link
Member

Smjert commented Nov 7, 2023

I think that part is driven by the collate api.

That's the interface, but versionCompare can be written using std::string_view (which can be constructed from a char * and length).
In general, I'm not saying to never use a pointer and a length separately, but since conceptually those are strings/part of strings, having a single entity that represents that is better for readability and to avoid errors, vs having to separately keep them correctly in sync.

Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking a bit about tests, I have two observations:

  1. Everything is testing true. I wonder if anything should test false
  2. I wonder if it would be more or less clear to define an array and then sort the whole thing and check the order. Doesn't handle the equality case, but would it be easier to see the order in?

osquery/sql/tests/sql.cpp Show resolved Hide resolved
@Micah-Kolide
Copy link
Contributor Author

Micah-Kolide commented Nov 9, 2023

This is only a quick stylistic review: can we use std::string_view instead of const char* + length?

I've switched to using string_views in versionCompare, and followed the other formatting suggestions.

I went ahead and made the version_compare function, since it looked pretty easy to add that around versionCompare.

Added information to the documents, and more comments around the versionCompare options.

  1. Everything is testing true. I wonder if anything should test false

I felt there wasn't a need to test false since we can iterate other operators, but I can add some false tests if it would make things more clear.

  1. I wonder if it would be more or less clear to define an array and then sort the whole thing and check the order. Doesn't handle the equality case, but would it be easier to see the order in?

I don't quite know what you mean by this. Could you provide an example?

@Smjert Smjert dismissed their stale review November 13, 2023 09:44

My inital suggestions have been addressed

@directionless directionless added this to the 5.11.0 milestone Nov 25, 2023
Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the sql function gets simplified, this is mergable ,

docs/wiki/introduction/sql.md Outdated Show resolved Hide resolved
docs/wiki/introduction/sql.md Show resolved Hide resolved
@Micah-Kolide Micah-Kolide force-pushed the micah/add_version_comparison branch from 9964283 to d771b28 Compare December 13, 2023 17:22
Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good. @Smjert anything else?

@directionless directionless changed the title Add version collate Add version collate functiona Dec 18, 2023
@directionless directionless changed the title Add version collate functiona Add version collate functions Dec 18, 2023
@directionless directionless merged commit a184da4 into osquery:master Dec 18, 2023
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants