-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add reflector metrics #48224
add reflector metrics #48224
Conversation
c136353
to
6b17f51
Compare
@smarterclayton my metrics |
@@ -344,6 +347,12 @@ func (r *Reflector) watchHandler(w watch.Interface, resourceVersion *string, err | |||
// Stopping the watcher should be idempotent and if we return from this function there's no way | |||
// we're coming back in with the same watch interface. | |||
defer w.Stop() | |||
// update metrics | |||
defer func() { | |||
r.metrics.numberOfItemsInWatch.Observe(float64(eventCount)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same metric twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same metric twice?
just to be extra sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you say it's the same metric twice? I don't see other reference to numberOfItemsInWatch.
@DirectXMan12 can you @ other sig-instrumentation folks who review consistency on prom metrics? |
6b17f51
to
4ee204c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments inline. Needs some naming changes, and you're silently swallowing errors.
Name: "lists", | ||
Help: "Total number of API lists done by the reflector: " + name, | ||
}) | ||
prometheus.Register(lists) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be MustRegister
, otherwise you're silently discarding the error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be MustRegister, otherwise you're silently discarding the error
doing
} | ||
|
||
func (_ prometheusMetricsProvider) NewListDurationMetric(name string) cache.SummaryMetric { | ||
listDuration := prometheus.NewSummary(prometheus.SummaryOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a note on why you chose summary, just in case we decide to revisit the decision later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a note on why you chose summary, just can case we decide to revisit the decision later.
I want the averages and percentile analysis.
func (_ prometheusMetricsProvider) NewWatchesMetric(name string) cache.CounterMetric { | ||
watches := prometheus.NewCounter(prometheus.CounterOpts{ | ||
Subsystem: name, | ||
Name: "watches", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is actually a counter, the name should have _total
on the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto for the other counters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is actually a counter, the name should have _total on the end.
doing
} | ||
|
||
func (_ prometheusMetricsProvider) NewLastResourceVersionMetric(name string) cache.GaugeMetric { | ||
rv := prometheus.NewGauge(prometheus.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this... is a bit of a strange case. Shouldn't it be a counter? Can the last resource version go down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the last resource version go down?
Maybe. You can imagine cases like etcd backup restoration or a list against a latent etcd.
|
||
func (_ prometheusMetricsProvider) NewListsMetric(name string) cache.CounterMetric { | ||
lists := prometheus.NewCounter(prometheus.CounterOpts{ | ||
Subsystem: name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to consider making this a label, so that we can do aggregations or bulk queries in the dashboard (e.g. see lists done by all reflectors) more easily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to consider making this a label, so that we can do aggregations or bulk queries in the dashboard (e.g. see lists done by all reflectors) more easily
Maybe later. I'm not yet convinced that all reflectors are the same (go into the same larger aggregation bucket"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, I would agree that these are almost always labels - each of these is a series.
4ee204c
to
f3fb169
Compare
@DirectXMan12 comments addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits, otherwise lgtm.
"github.com/prometheus/client_golang/prometheus" | ||
) | ||
|
||
// Package prometheus sets the cache DefaultMetricsFactory to produce |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to package declaration to be a package doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to package declaration to be a package doc.
done
@@ -344,6 +347,12 @@ func (r *Reflector) watchHandler(w watch.Interface, resourceVersion *string, err | |||
// Stopping the watcher should be idempotent and if we return from this function there's no way | |||
// we're coming back in with the same watch interface. | |||
defer w.Stop() | |||
// update metrics | |||
defer func() { | |||
r.metrics.numberOfItemsInWatch.Observe(float64(eventCount)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you say it's the same metric twice? I don't see other reference to numberOfItemsInWatch.
"time" | ||
) | ||
|
||
// This file provides abstractions for setting the provider (e.g., prometheus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move it to above the package declaration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move it to above the package declaration?
done
4bbf0d3
to
53efd42
Compare
comments addressed. |
/lgtm |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: caesarxuchao, deads2k Assign the PR to them by writing No associated issue. Update pull-request body to add a reference to an issue, or get approval with The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Could you add a release note? |
done |
external packages are just loading the prometheus package to enable |
func (_ prometheusMetricsProvider) NewWatchDurationMetric(name string) cache.SummaryMetric { | ||
watchDuration := prometheus.NewSummary(prometheus.SummaryOpts{ | ||
Subsystem: name, | ||
Name: "watch_duration", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What unit is this duration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What unit is this duration
microseconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prometheus convention is to do seconds, and to include that in the name i.e. watch_duration_seconds
. Same for list duration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
53efd42
to
40174c1
Compare
fed2ba3
to
32644ab
Compare
@DirectXMan12 had a bad rebase. See if its good now. |
/retest |
looks good 👍 |
/lint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deads2k: 18 warnings.
In response to this:
/lint
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
} | ||
|
||
// use summary to get averages and percentiles | ||
func (_ prometheusMetricsProvider) NewWatchDurationMetric(name string) cache.SummaryMetric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint naming: receiver name should not be an underscore. More info.
|
||
// Package prometheus sets the cache DefaultMetricsFactory to produce | ||
// prometheus metrics. To use this package, you just have to import it. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint comments: package comment is detached; there should be no blank lines between it and the package statement. More info.
} | ||
|
||
// use summary to get averages and percentiles | ||
func (_ prometheusMetricsProvider) NewItemsInWatchMetric(name string) cache.SummaryMetric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint naming: receiver name should not be an underscore. More info.
return itemsPerWatch.WithLabelValues(name) | ||
} | ||
|
||
func (_ prometheusMetricsProvider) NewLastResourceVersionMetric(name string) cache.GaugeMetric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint naming: receiver name should not be an underscore. More info.
|
||
type prometheusMetricsProvider struct{} | ||
|
||
func (_ prometheusMetricsProvider) NewListsMetric(name string) cache.CounterMetric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint naming: receiver name should not be an underscore. More info.
|
||
type noopMetricsProvider struct{} | ||
|
||
func (_ noopMetricsProvider) NewListsMetric(name string) CounterMetric { return noopMetric{} } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint naming: receiver name should not be an underscore. More info.
|
||
func (_ noopMetricsProvider) NewListsMetric(name string) CounterMetric { return noopMetric{} } | ||
func (_ noopMetricsProvider) NewListDurationMetric(name string) SummaryMetric { return noopMetric{} } | ||
func (_ noopMetricsProvider) NewItemsInListMetric(name string) SummaryMetric { return noopMetric{} } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint naming: receiver name should not be an underscore. More info.
// This file provides abstractions for setting the provider (e.g., prometheus) | ||
// of metrics. | ||
|
||
package cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint comments: should have a package comment, unless it's in another file for this package. More info.
type noopMetricsProvider struct{} | ||
|
||
func (_ noopMetricsProvider) NewListsMetric(name string) CounterMetric { return noopMetric{} } | ||
func (_ noopMetricsProvider) NewListDurationMetric(name string) SummaryMetric { return noopMetric{} } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint naming: receiver name should not be an underscore. More info.
func (_ noopMetricsProvider) NewItemsInListMetric(name string) SummaryMetric { return noopMetric{} } | ||
func (_ noopMetricsProvider) NewWatchesMetric(name string) CounterMetric { return noopMetric{} } | ||
func (_ noopMetricsProvider) NewShortWatchesMetric(name string) CounterMetric { return noopMetric{} } | ||
func (_ noopMetricsProvider) NewWatchDurationMetric(name string) SummaryMetric { return noopMetric{} } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint naming: receiver name should not be an underscore. More info.
32644ab
to
b506a0f
Compare
updated for linting, applied tags based on previous comments. |
/test pull-kubernetes-kubemark-e2e-gce |
b506a0f
to
151d396
Compare
Automatic merge from submit-queue (batch tested with PRs 48224, 45431, 45946, 48775, 49396) |
This adds metrics (optionally prometheus) to reflectors so that you can see when one reflector is behaving poorly and just how poorly its doing.
@eparis