-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add dynamic config metrics #57527
add dynamic config metrics #57527
Conversation
e4fec46
to
1c6b5e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall approach SGTM
pkg/kubelet/metrics/metrics.go
Outdated
|
||
func SetAssignedNodeConfigSource(val string) { | ||
// TODO(mtaufen): will this eliminate previous metric, or generate a parallel one? | ||
AssignedNodeConfigSource.WithLabelValues(val).Set(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach, implemented like this, is going to result in memory leak, because old values are not GC'd, as you noted in the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/kubelet/metrics/metrics.go
Outdated
AssignedNodeConfigSourceKey = "assigned_node_config_source" | ||
// AssignedNodeConfigSourceOkayKey's metric indicates whether the currently assigned config source and the config it contains is valid | ||
AssignedNodeConfigSourceOkayKey = "assigned_node_config_source_okay" | ||
AssignedNodeConfigSourceStatusTrue = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use string values, performance considerations in this case do not outweigh the decreased readability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in, just report a string value in another label key-value pair?
Since prometheus doesn't support string valued metrics:
https://prometheus.io/docs/concepts/metric_types/
prometheus/prometheus#2227
pkg/kubelet/metrics/metrics.go
Outdated
// AssignedNodeConfigSourceKey's metric reports the config source the Kubelet believes is currently assigned to it | ||
AssignedNodeConfigSourceKey = "assigned_node_config_source" | ||
// AssignedNodeConfigSourceOkayKey's metric indicates whether the currently assigned config source and the config it contains is valid | ||
AssignedNodeConfigSourceOkayKey = "assigned_node_config_source_okay" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: IMHO ok
is more widely used than okay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough
utillog.Infof("starting controller") | ||
|
||
// set metrics when all is said and done | ||
var using string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using
is a bad name, is says nothing and spans a huge scope
Also, if you change the logic below later, it may be used uninitialized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand that global defer at all, why not leave setting metrics local?
pkg/kubelet/metrics/metrics.go
Outdated
@@ -101,7 +122,7 @@ var ( | |||
Help: "Interval in microseconds between relisting in PLEG.", | |||
}, | |||
) | |||
// Metrics of remote runtime operations. | |||
// Metrics for remote runtime operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly I see a slow convergence towards comments being complete sentences, with full stop at the end and so on.
pkg/kubelet/metrics/metrics.go
Outdated
DevicePluginRegistrationCountKey = "device_plugin_registration_count" | ||
DevicePluginAllocationLatencyKey = "device_plugin_alloc_latency_microseconds" | ||
// Metric keys for node config sources | ||
// AssignedNodeConfigSourceKey's metric reports the config source the Kubelet believes is currently assigned to it | ||
AssignedNodeConfigSourceKey = "assigned_node_config_source" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the assigned_node
prefix? It's little big confusing, why not just config_source
or dynamic_config_source
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are three config references you'd want the Kubelet to report:
- What it believes its currently assigned source is.
- What it believes its last-known-good source is.
- Which of the above two it's actually using.
NodeConfigSource
is the type of the ConfigSource
field on the Node
object and is used to point the Kubelet at a config source, hence node_config_source
.
Could do:
node_config_source_assigned
node_config_source_last_known_good
node_config_source_using
(or some name other thanusing
)
pkg/kubelet/metrics/metrics.go
Outdated
AssignedNodeConfigSourceStatusUnknown = 0 | ||
AssignedNodeConfigSourceStatusFalse = -1 | ||
// LastKnownGoodNodeConfigSourceKey's metric indicates what the Kubelet is treating as it's last-known-good config source | ||
LastKnownGoodNodeConfigSourceKey = "last_known_good_node_config_source" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to make these hierarchical, that helps, e.g. dynamic_config_last_known_source
Thanks @crassirostris for taking a look at this |
/retest |
pkg/kubelet/metrics/metrics.go
Outdated
ActiveConfigKey = "node_config_active" | ||
LastKnownGoodConfigKey = "node_config_last_known_good" | ||
ConfigErrorKey = "node_config_error" | ||
ConfigNameLabelKey = "node_config_name" // this is a fully resolved name, e.g. an API path or a URL, "Name" is short but a little confusing name for this key... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dashpole any naming suggestions for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SelfLink is the URL representing an object in ObjectMeta. Maybe "node_config_self_link"? This should be equivalent to the self link for the configmap, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dashpole and I settled on node_config_source, as it accurately identifies what the corresponding value describes while being reasonably generic ("local", or a SelfLink, or maybe some URL in the future).
/retest |
/status approved-for-milestone |
/remove-lifecycle stale |
[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process @crassirostris @dashpole @dchen1107 @mtaufen Pull Request Labels
|
fb81c29
to
e951978
Compare
/retest |
This PR exports config-releated metrics from the Kubelet. The Guages for active, assigned, and last-known-good config can be used to identify config versions and produce aggregate counts across several nodes. The error-reporting Gauge can be used to determine whether a node is experiencing a config-related error, and to prodouce an aggregate count of nodes in an error state.
/lgtm |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dashpole, dchen1107, mtaufen The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Automatic merge from submit-queue (batch tested with PRs 64013, 63896, 64139, 57527, 62102). If you want to cherry-pick this change to another branch, please follow the instructions here. |
This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.
kubernetes/enhancements#281