
Persist query performance stats #8250

Merged
2 commits merged into osquery:master on Feb 21, 2024

Conversation

getvictor
Contributor

Fixes #7694

This implementation has the following limitation: if the user modifies pack_delimiter, all pack-related data is reset.

Specifically, the old data will still be in the DB and can be viewed with --database_dump, but osquery will now use the new pack_delimiter to save and retrieve the data. Fixing this would require a larger code change to track the query name without the pack delimiter. I believe it is reasonable to expect pack_delimiter not to change once set.

Also, persistence only applies to scheduled queries, not to distributed queries.
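To make the pack_delimiter limitation above concrete, here is a minimal sketch (not code from this PR; the composed-key layout is assumed for illustration) of why stats saved under one delimiter are never found under another:

#include <iostream>
#include <string>

// Hypothetical illustration only: osquery derives a pack query's stored name
// from pack_delimiter, so the same query maps to a different key once the
// delimiter changes, and the old stats row is simply never looked up again.
std::string packQueryKey(const std::string& delimiter,
                         const std::string& pack,
                         const std::string& query) {
  return "pack" + delimiter + pack + delimiter + query;
}

int main() {
  std::cout << packQueryKey("_", "perf", "cpu") << "\n"; // pack_perf_cpu
  std::cout << packQueryKey("/", "perf", "cpu") << "\n"; // pack/perf/cpu
  return 0;
}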

@getvictor marked this pull request as ready for review January 19, 2024 18:46
@getvictor requested review from a team as code owners January 19, 2024 18:46
zwass
Member

Overall this is looking good to me, but I'd like to clarify the future-modifications question before we go ahead with it.

Comment on lines +28 to +59
QueryPerformance::QueryPerformance(const std::string& csv) {
  std::vector<std::string> parts;
  boost::split(parts, csv, boost::is_any_of(","));
  // Future-proof the size check: use < rather than != so rows written by a
  // newer version with extra trailing fields can still be parsed.
  if (parts.size() < 12) {
    return;
  }

  executions = convert<std::size_t>(parts[0]);
  last_executed = convert<std::uint64_t>(parts[1]);
  wall_time = convert<std::uint64_t>(parts[2]);
  wall_time_ms = convert<std::uint64_t>(parts[3]);
  last_wall_time_ms = convert<std::uint64_t>(parts[4]);
  user_time = convert<std::uint64_t>(parts[5]);
  last_user_time = convert<std::uint64_t>(parts[6]);
  system_time = convert<std::uint64_t>(parts[7]);
  last_system_time = convert<std::uint64_t>(parts[8]);
  average_memory = convert<std::uint64_t>(parts[9]);
  last_memory = convert<std::uint64_t>(parts[10]);
  output_size = convert<std::uint64_t>(parts[11]);
}

std::string QueryPerformance::toCSV() const {
  return std::to_string(executions) + "," + std::to_string(last_executed) +
         "," + std::to_string(wall_time) + "," + std::to_string(wall_time_ms) +
         "," + std::to_string(last_wall_time_ms) + "," +
         std::to_string(user_time) + "," + std::to_string(last_user_time) +
         "," + std::to_string(system_time) + "," +
         std::to_string(last_system_time) + "," +
         std::to_string(average_memory) + "," + std::to_string(last_memory) +
         "," + std::to_string(output_size);
}
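As a usage note, the two functions above round-trip through the CSV string; a minimal reviewer-side sketch (assuming QueryPerformance also default-constructs with zeroed fields, which this diff does not show):

#include <cassert>
#include <string>

// Round-trip sketch using the constructor and toCSV() from the diff above.
void roundTripExample() {
  QueryPerformance original; // assumed default constructor, fields zeroed
  original.executions = 42;
  original.last_executed = 1708300800; // arbitrary epoch seconds
  original.output_size = 1024;

  const std::string csv = original.toCSV(); // "42,1708300800,0,...,0,1024"
  const QueryPerformance restored(csv);     // parsed back field by field

  assert(restored.executions == original.executions);
  assert(restored.last_executed == original.last_executed);
  assert(restored.output_size == original.output_size);
}
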
Member

IIUC we are storing the performance data in the RocksDB store serialized to a CSV. I see the comment on line 31 about future proofing but I don't quite understand. Should we add more about what a future engineer would need to do to add/remove "columns" from the performance stats?

getvictor
Contributor Author

The comment refers to the validation check: parts.size() < 12.

If the check were parts.size() != 12, it would not be future proof, because we could not parse data with 13 elements, which might happen in the future.
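To spell that out with a standalone example (illustrative only, not code from this PR), the two guards behave differently once a 13th column appears:

#include <boost/algorithm/string.hpp>
#include <iostream>
#include <string>
#include <vector>

// Counts CSV fields the same way the constructor above does.
std::size_t fieldCount(const std::string& csv) {
  std::vector<std::string> parts;
  boost::split(parts, csv, boost::is_any_of(","));
  return parts.size();
}

int main() {
  const std::string current(11, ',');       // 12 empty fields
  const std::string future = current + ","; // 13 fields: one extra column

  // The PR's guard (reject only when fields are missing) accepts both rows:
  std::cout << (fieldCount(current) < 12) << (fieldCount(future) < 12)
            << "\n"; // 00

  // A strict != 12 guard would wrongly reject the newer row:
  std::cout << (fieldCount(current) != 12) << (fieldCount(future) != 12)
            << "\n"; // 01
  return 0;
}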

  output_size = convert<std::uint64_t>(parts[11]);
}

std::string QueryPerformance::toCSV() const {

Member

My first thought is that CSV feels like a brittle format. But looking at the data, it's a dozen numbers, and CSV doesn't feel wrong for that case. JSON feels like a lot of overhead, and ASN.1 is, well, ASN.1.

I'm a little leery of the parsing, but since RocksDB stores everything as string values under the hood, I'm not sure there's a better option.
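For reference, storage along these lines would go through osquery's string key/value helpers; a sketch assuming the setDatabaseValue/getDatabaseValue API and the kQueries domain (the "perf." key prefix here is made up and may not match what the PR actually uses):

#include <osquery/database/database.h>

#include <string>

namespace osquery {

// Sketch: persist one query's stats as a CSV string value and read it back.
// Only the string-in/string-out shape of the RocksDB-backed store matters.
void saveStats(const std::string& name, const QueryPerformance& perf) {
  setDatabaseValue(kQueries, "perf." + name, perf.toCSV());
}

QueryPerformance loadStats(const std::string& name) {
  std::string csv;
  getDatabaseValue(kQueries, "perf." + name, csv);
  return QueryPerformance(csv);
}

} // namespace osquery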

@directionless added this to the 5.12.0 milestone Feb 15, 2024
@zwass merged commit 7f557d3 into osquery:master Feb 21, 2024
16 checks passed
Successfully merging this pull request may close these issues:

Persist osquery_schedule table data across osquery restarts (#7694)