
Persist query performance stats #8250

Merged
2 commits merged into osquery:master on Feb 21, 2024

Conversation

getvictor
Contributor

Fixes #7694

This implementation has the following limitation: if the user modifies pack_delimiter, all pack-related data is reset.

Specifically, the old data will still be in the DB and can be viewed with --database_dump, but osquery will now use the new pack_delimiter to save and retrieve the data. Fixing this would require a larger code change to track the query name without the pack delimiter. I believe it is reasonable to expect pack_delimiter not to change once set.

Also, persistence only applies to scheduled queries, not to distributed queries.
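To make the pack_delimiter limitation above concrete, here is a minimal sketch (not code from this PR; the composed-key layout is assumed for illustration) of why stats saved under one delimiter are never found under another:

#include <iostream>
#include <string>

// Hypothetical illustration only: osquery derives a pack query's stored name
// from pack_delimiter, so the same query maps to a different key once the
// delimiter changes, and the old stats row is simply never looked up again.
std::string packQueryKey(const std::string& delimiter,
                         const std::string& pack,
                         const std::string& query) {
  return "pack" + delimiter + pack + delimiter + query;
}

int main() {
  std::cout << packQueryKey("_", "perf", "cpu") << "\n"; // pack_perf_cpu
  std::cout << packQueryKey("/", "perf", "cpu") << "\n"; // pack/perf/cpu
  return 0;
}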

@getvictor marked this pull request as ready for review January 19, 2024 18:46
@getvictor requested review from a team as code owners January 19, 2024 18:46
zwass
Member

Overall this is looking good to me, but I'd like to clarify the future-modifications question before we go ahead with it.

Comment on lines +28 to +59
QueryPerformance::QueryPerformance(const std::string& csv) {
  std::vector<std::string> parts;
  boost::split(parts, csv, boost::is_any_of(","));
  // Future-proof the size check: use < rather than != so rows written by a
  // newer version with extra trailing fields can still be parsed.
  if (parts.size() < 12) {
    return;
  }

  executions = convert<std::size_t>(parts[0]);
  last_executed = convert<std::uint64_t>(parts[1]);
  wall_time = convert<std::uint64_t>(parts[2]);
  wall_time_ms = convert<std::uint64_t>(parts[3]);
  last_wall_time_ms = convert<std::uint64_t>(parts[4]);
  user_time = convert<std::uint64_t>(parts[5]);
  last_user_time = convert<std::uint64_t>(parts[6]);
  system_time = convert<std::uint64_t>(parts[7]);
  last_system_time = convert<std::uint64_t>(parts[8]);
  average_memory = convert<std::uint64_t>(parts[9]);
  last_memory = convert<std::uint64_t>(parts[10]);
  output_size = convert<std::uint64_t>(parts[11]);
}

std::string QueryPerformance::toCSV() const {
  return std::to_string(executions) + "," + std::to_string(last_executed) +
         "," + std::to_string(wall_time) + "," + std::to_string(wall_time_ms) +
         "," + std::to_string(last_wall_time_ms) + "," +
         std::to_string(user_time) + "," + std::to_string(last_user_time) +
         "," + std::to_string(system_time) + "," +
         std::to_string(last_system_time) + "," +
         std::to_string(average_memory) + "," + std::to_string(last_memory) +
         "," + std::to_string(output_size);
}
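As a usage note, the two functions above round-trip through the CSV string; a minimal reviewer-side sketch (assuming QueryPerformance also default-constructs with zeroed fields, which this diff does not show):

#include <cassert>
#include <string>

// Round-trip sketch using the constructor and toCSV() from the diff above.
void roundTripExample() {
  QueryPerformance original; // assumed default constructor, fields zeroed
  original.executions = 42;
  original.last_executed = 1708300800; // arbitrary epoch seconds
  original.output_size = 1024;

  const std::string csv = original.toCSV(); // "42,1708300800,0,...,0,1024"
  const QueryPerformance restored(csv);     // parsed back field by field

  assert(restored.executions == original.executions);
  assert(restored.last_executed == original.last_executed);
  assert(restored.output_size == original.output_size);
}
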
Member

IIUC we are storing the performance data in the RocksDB store serialized to a CSV. I see the comment on line 31 about future proofing but I don't quite understand. Should we add more about what a future engineer would need to do to add/remove "columns" from the performance stats?

getvictor
Contributor Author

The comment refers to the validation check: parts.size() < 12.

If the check were parts.size() != 12, it would not be future proof, because we could not parse data with 13 elements, which might happen in the future.
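To spell that out with a standalone example (illustrative only, not code from this PR), the two guards behave differently once a 13th column appears:

#include <boost/algorithm/string.hpp>
#include <iostream>
#include <string>
#include <vector>

// Counts CSV fields the same way the constructor above does.
std::size_t fieldCount(const std::string& csv) {
  std::vector<std::string> parts;
  boost::split(parts, csv, boost::is_any_of(","));
  return parts.size();
}

int main() {
  const std::string current(11, ',');       // 12 empty fields
  const std::string future = current + ","; // 13 fields: one extra column

  // The PR's guard (reject only when fields are missing) accepts both rows:
  std::cout << (fieldCount(current) < 12) << (fieldCount(future) < 12)
            << "\n"; // 00

  // A strict != 12 guard would wrongly reject the newer row:
  std::cout << (fieldCount(current) != 12) << (fieldCount(future) != 12)
            << "\n"; // 01
  return 0;
}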

  output_size = convert<std::uint64_t>(parts[11]);
}

std::string QueryPerformance::toCSV() const {

Member

My first thought is that CSV feels like a brittle format. But looking at the data, it's a dozen numbers, and CSV doesn't feel wrong for that case. JSON feels like a lot of overhead, and ASN.1 is, well, ASN.1.

I'm a little leery of the parsing, but since RocksDB stores everything as string values under the hood, I'm not sure there's a better option.
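For reference, storage along these lines would go through osquery's string key/value helpers; a sketch assuming the setDatabaseValue/getDatabaseValue API and the kQueries domain (the "perf." key prefix here is made up and may not match what the PR actually uses):

#include <osquery/database/database.h>

#include <string>

namespace osquery {

// Sketch: persist one query's stats as a CSV string value and read it back.
// Only the string-in/string-out shape of the RocksDB-backed store matters.
void saveStats(const std::string& name, const QueryPerformance& perf) {
  setDatabaseValue(kQueries, "perf." + name, perf.toCSV());
}

QueryPerformance loadStats(const std::string& name) {
  std::string csv;
  getDatabaseValue(kQueries, "perf." + name, csv);
  return QueryPerformance(csv);
}

} // namespace osquery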

@directionless added this to the 5.12.0 milestone Feb 15, 2024
@zwass merged commit 7f557d3 into osquery:master Feb 21, 2024
16 checks passed
Successfully merging this pull request may close these issues:

Persist osquery_schedule table data across osquery restarts (#7694)