-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Persist query performance stats #8250
Persist query performance stats #8250
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is looking good to me but I'd like to clarify the future modifications question before we go ahead with it.
QueryPerformance::QueryPerformance(const std::string& csv) { | ||
std::vector<std::string> parts; | ||
boost::split(parts, csv, boost::is_any_of(",")); | ||
// future proofing the size, in case additional fields are added | ||
if (parts.size() < 12) { | ||
return; | ||
} | ||
|
||
executions = convert<std::size_t>(parts[0]); | ||
last_executed = convert<std::uint64_t>(parts[1]); | ||
wall_time = convert<std::uint64_t>(parts[2]); | ||
wall_time_ms = convert<std::uint64_t>(parts[3]); | ||
last_wall_time_ms = convert<std::uint64_t>(parts[4]); | ||
user_time = convert<std::uint64_t>(parts[5]); | ||
last_user_time = convert<std::uint64_t>(parts[6]); | ||
system_time = convert<std::uint64_t>(parts[7]); | ||
last_system_time = convert<std::uint64_t>(parts[8]); | ||
average_memory = convert<std::uint64_t>(parts[9]); | ||
last_memory = convert<std::uint64_t>(parts[10]); | ||
output_size = convert<std::uint64_t>(parts[11]); | ||
} | ||
|
||
std::string QueryPerformance::toCSV() const { | ||
return std::to_string(executions) + "," + std::to_string(last_executed) + | ||
"," + std::to_string(wall_time) + "," + std::to_string(wall_time_ms) + | ||
"," + std::to_string(last_wall_time_ms) + "," + | ||
std::to_string(user_time) + "," + std::to_string(last_user_time) + | ||
"," + std::to_string(system_time) + "," + | ||
std::to_string(last_system_time) + "," + | ||
std::to_string(average_memory) + "," + std::to_string(last_memory) + | ||
"," + std::to_string(output_size); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC we are storing the performance data in the RocksDB store serialized to a CSV. I see the comment on line 31 about future proofing but I don't quite understand. Should we add more about what a future engineer would need to do to add/remove "columns" from the performance stats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment refers to the validation check: parts.size() < 12
If the check was: parts.size() != 12
, then it would be future proof, because we could not parse data with 13 elements, which might happen in the future.
output_size = convert<std::uint64_t>(parts[11]); | ||
} | ||
|
||
std::string QueryPerformance::toCSV() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first thought, is that CSV feels like a brittle format. But looking at the data, it's a dozen numbers. And CSV doesn't feel wrong for that case. JSON feels like a lot of overhead, and ASN.1 is, well, ASN.1.
Though I'm a little leery of the parsing, but since I think rocksdb is all string values under the hood, I'm not sure there's better
Fixes #7694
This implementation has the following limitation: if user modifies pack_delimiter, then all pack-related data is reset.
Specifically, the old data will still be in DB, and can be viewed with --database_dump, but osquery will now use the new pack_delimiter to save/retrieve the data. To fix this, a larger chunk of code needs to change to track the name of the query without the pack delimiter. I believe it is reasonable not to change pack_delimiter once set.
Also, persistence will only apply to scheduled queries, and not distributed queries.