combine storage latency and error metrics #98332
Conversation
/lgtm
I'm not quite sure whether it will become awkward in the future if more errors are added beyond failUnknown, but what we have here seems as good as anything.
 		StabilityLevel: metrics.ALPHA,
 	},
-	[]string{"volume_plugin", "operation_name"},
+	[]string{"volume_plugin", "operation_name", "status"},
It seems that storageOperationEndToEndLatencyMetric suffers from the same problem, where errors don't have an e2e latency (see RecordMetrics).
But that seems to be a different sort of beast, so rolling errors into that metric may not be desirable?
Yeah, I think we need to revisit operationendtoend in general (not in this PR). The implementation is complex, and it may be better suited to being tracked by an outside observer instead.
No, the end-to-end metric is much more complicated and I think it needs a bit of a redesign. I think we should just leave it as it is for now.
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: JornShen, msau42. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/triage accepted
Can you also add "ACTION REQUIRED" at the beginning of the release note, since consumers of this metric may need to change?
Depending on whether this PR or #98089 merges first, do unit tests need to be updated in one of these PRs to expect only one metric?
Yes, I would say merge this one first and then update the unit tests.
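As a rough illustration of what such a test update could look like, here is a sketch using prometheus/client_golang's testutil against the hypothetical recordOperation helper from the earlier sketch in this thread; none of these names come from the actual kubelet tests.

```go
package storagemetrics

import (
	"errors"
	"testing"
	"time"

	"github.com/prometheus/client_golang/prometheus/testutil"
)

func TestCombinedMetricCoversSuccessAndError(t *testing.T) {
	start := time.Now()
	recordOperation("kubernetes.io/fake-plugin", "volume_mount", start, nil)
	recordOperation("kubernetes.io/fake-plugin", "volume_mount", start, errors.New("mount failed"))

	// With one combined histogram there is only a single metric to assert on:
	// the success and the failure appear as two series that differ only in
	// their status label.
	if got := testutil.CollectAndCount(storageOperationDuration); got != 2 {
		t.Fatalf("expected 2 series in the combined metric, got %d", got)
	}
}
```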
/retest
Review the full test history for this PR. Silence the bot with an /lgtm cancel or /hold comment for consistent failures.
Sorry for reviewing this late, but we removed one metric and left the other one in place?
#98392 removes that metric. Somehow this change has gotten spread over three PRs :-/
What type of PR is this?
/kind feature
What this PR does / why we need it:
Combine both latency and error status into one metric, because right now the latency metric only tracks latency for successful operations and not for failed operations.
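For contrast, here is a rough sketch of the assumed "before" shape, again in plain prometheus/client_golang with illustrative names rather than the real kubelet code: with a separate histogram and error counter, the duration of failed operations is never observed, which is what folding a status label into the histogram fixes.

```go
package storagemetricsbefore

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Assumed "before" shape: a histogram that is observed only on success and a
// separate counter that only counts failures, so the latency of failed
// operations is lost.
var (
	operationDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name: "storage_operation_duration_seconds",
			Help: "Duration of successful storage operations.",
		},
		[]string{"volume_plugin", "operation_name"},
	)
	operationErrors = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "storage_operation_errors_total",
			Help: "Number of failed storage operations.",
		},
		[]string{"volume_plugin", "operation_name"},
	)
)

func recordOperationOld(plugin, op string, start time.Time, err error) {
	if err != nil {
		// Failures only increment the counter; their duration is never
		// observed anywhere.
		operationErrors.WithLabelValues(plugin, op).Inc()
		return
	}
	operationDuration.WithLabelValues(plugin, op).Observe(time.Since(start).Seconds())
}
```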
Which issue(s) this PR fixes:
ref: #98089 (comment)
Special notes for your reviewer:
Does this PR introduce a user-facing change?: