Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add reflector metrics #48224

Merged

Conversation

deads2k
Copy link
Contributor

@deads2k deads2k commented Jun 28, 2017

This adds metrics (optionally prometheus) to reflectors so that you can see when one reflector is behaving poorly and just how poorly its doing.

@eparis

Adds metrics for checking reflector health.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 28, 2017
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Jun 28, 2017
@deads2k deads2k force-pushed the controller-11-reflectormetrics branch from c136353 to 6b17f51 Compare June 29, 2017 14:16
@deads2k
Copy link
Contributor Author

deads2k commented Jun 29, 2017

@smarterclayton my metrics

@david-mcmahon david-mcmahon removed their assignment Jun 29, 2017
@@ -344,6 +347,12 @@ func (r *Reflector) watchHandler(w watch.Interface, resourceVersion *string, err
// Stopping the watcher should be idempotent and if we return from this function there's no way
// we're coming back in with the same watch interface.
defer w.Stop()
// update metrics
defer func() {
r.metrics.numberOfItemsInWatch.Observe(float64(eventCount))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same metric twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same metric twice?

just to be extra sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you say it's the same metric twice? I don't see other reference to numberOfItemsInWatch.

@smarterclayton smarterclayton added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Jun 29, 2017
@smarterclayton
Copy link
Contributor

@DirectXMan12 can you @ other sig-instrumentation folks who review consistency on prom metrics?

@deads2k deads2k force-pushed the controller-11-reflectormetrics branch from 6b17f51 to 4ee204c Compare June 30, 2017 11:52
Copy link
Contributor

@DirectXMan12 DirectXMan12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments inline. Needs some naming changes, and you're silently swallowing errors.

Name: "lists",
Help: "Total number of API lists done by the reflector: " + name,
})
prometheus.Register(lists)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be MustRegister, otherwise you're silently discarding the error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be MustRegister, otherwise you're silently discarding the error

doing

}

func (_ prometheusMetricsProvider) NewListDurationMetric(name string) cache.SummaryMetric {
listDuration := prometheus.NewSummary(prometheus.SummaryOpts{
Copy link
Contributor

@DirectXMan12 DirectXMan12 Jun 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a note on why you chose summary, just in case we decide to revisit the decision later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a note on why you chose summary, just can case we decide to revisit the decision later.

I want the averages and percentile analysis.

func (_ prometheusMetricsProvider) NewWatchesMetric(name string) cache.CounterMetric {
watches := prometheus.NewCounter(prometheus.CounterOpts{
Subsystem: name,
Name: "watches",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is actually a counter, the name should have _total on the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto for the other counters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is actually a counter, the name should have _total on the end.

doing

}

func (_ prometheusMetricsProvider) NewLastResourceVersionMetric(name string) cache.GaugeMetric {
rv := prometheus.NewGauge(prometheus.GaugeOpts{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this... is a bit of a strange case. Shouldn't it be a counter? Can the last resource version go down?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the last resource version go down?

Maybe. You can imagine cases like etcd backup restoration or a list against a latent etcd.


func (_ prometheusMetricsProvider) NewListsMetric(name string) cache.CounterMetric {
lists := prometheus.NewCounter(prometheus.CounterOpts{
Subsystem: name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to consider making this a label, so that we can do aggregations or bulk queries in the dashboard (e.g. see lists done by all reflectors) more easily

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to consider making this a label, so that we can do aggregations or bulk queries in the dashboard (e.g. see lists done by all reflectors) more easily

Maybe later. I'm not yet convinced that all reflectors are the same (go into the same larger aggregation bucket"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, I would agree that these are almost always labels - each of these is a series.

@deads2k deads2k force-pushed the controller-11-reflectormetrics branch from 4ee204c to f3fb169 Compare June 30, 2017 19:55
@deads2k
Copy link
Contributor Author

deads2k commented Jun 30, 2017

@DirectXMan12 comments addressed

Copy link
Member

@caesarxuchao caesarxuchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits, otherwise lgtm.

"github.com/prometheus/client_golang/prometheus"
)

// Package prometheus sets the cache DefaultMetricsFactory to produce
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to package declaration to be a package doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to package declaration to be a package doc.

done

@@ -344,6 +347,12 @@ func (r *Reflector) watchHandler(w watch.Interface, resourceVersion *string, err
// Stopping the watcher should be idempotent and if we return from this function there's no way
// we're coming back in with the same watch interface.
defer w.Stop()
// update metrics
defer func() {
r.metrics.numberOfItemsInWatch.Observe(float64(eventCount))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you say it's the same metric twice? I don't see other reference to numberOfItemsInWatch.

"time"
)

// This file provides abstractions for setting the provider (e.g., prometheus)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move it to above the package declaration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move it to above the package declaration?

done

@deads2k deads2k force-pushed the controller-11-reflectormetrics branch 2 times, most recently from 4bbf0d3 to 53efd42 Compare July 3, 2017 14:15
@deads2k
Copy link
Contributor Author

deads2k commented Jul 3, 2017

comments addressed.

@caesarxuchao
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 5, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: caesarxuchao, deads2k
We suggest the following additional approver: lavalamp

Assign the PR to them by writing /assign @lavalamp in a comment when ready.

No associated issue. Update pull-request body to add a reference to an issue, or get approval with /approve no-issue

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Jul 5, 2017
@caesarxuchao
Copy link
Member

Could you add a release note?

@k8s-github-robot k8s-github-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Jul 5, 2017
@deads2k
Copy link
Contributor Author

deads2k commented Jul 5, 2017

Could you add a release note?

done

@deads2k
Copy link
Contributor Author

deads2k commented Jul 5, 2017

external packages are just loading the prometheus package to enable

func (_ prometheusMetricsProvider) NewWatchDurationMetric(name string) cache.SummaryMetric {
watchDuration := prometheus.NewSummary(prometheus.SummaryOpts{
Subsystem: name,
Name: "watch_duration",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What unit is this duration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What unit is this duration

microseconds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prometheus convention is to do seconds, and to include that in the name i.e. watch_duration_seconds. Same for list duration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 8, 2017
@deads2k deads2k force-pushed the controller-11-reflectormetrics branch from 53efd42 to 40174c1 Compare July 24, 2017 19:34
@k8s-github-robot k8s-github-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 24, 2017
@deads2k deads2k force-pushed the controller-11-reflectormetrics branch 3 times, most recently from fed2ba3 to 32644ab Compare July 24, 2017 20:37
@deads2k
Copy link
Contributor Author

deads2k commented Jul 24, 2017

@DirectXMan12 had a bad rebase. See if its good now.

@deads2k
Copy link
Contributor Author

deads2k commented Jul 24, 2017

/retest

@DirectXMan12
Copy link
Contributor

looks good 👍

@deads2k
Copy link
Contributor Author

deads2k commented Jul 24, 2017

/lint

Copy link
Contributor

@k8s-ci-robot k8s-ci-robot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k: 18 warnings.

In response to this:

/lint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

}

// use summary to get averages and percentiles
func (_ prometheusMetricsProvider) NewWatchDurationMetric(name string) cache.SummaryMetric {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golint naming: receiver name should not be an underscore. More info.


// Package prometheus sets the cache DefaultMetricsFactory to produce
// prometheus metrics. To use this package, you just have to import it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golint comments: package comment is detached; there should be no blank lines between it and the package statement. More info.

}

// use summary to get averages and percentiles
func (_ prometheusMetricsProvider) NewItemsInWatchMetric(name string) cache.SummaryMetric {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golint naming: receiver name should not be an underscore. More info.

return itemsPerWatch.WithLabelValues(name)
}

func (_ prometheusMetricsProvider) NewLastResourceVersionMetric(name string) cache.GaugeMetric {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golint naming: receiver name should not be an underscore. More info.


type prometheusMetricsProvider struct{}

func (_ prometheusMetricsProvider) NewListsMetric(name string) cache.CounterMetric {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golint naming: receiver name should not be an underscore. More info.


type noopMetricsProvider struct{}

func (_ noopMetricsProvider) NewListsMetric(name string) CounterMetric { return noopMetric{} }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golint naming: receiver name should not be an underscore. More info.


func (_ noopMetricsProvider) NewListsMetric(name string) CounterMetric { return noopMetric{} }
func (_ noopMetricsProvider) NewListDurationMetric(name string) SummaryMetric { return noopMetric{} }
func (_ noopMetricsProvider) NewItemsInListMetric(name string) SummaryMetric { return noopMetric{} }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golint naming: receiver name should not be an underscore. More info.

// This file provides abstractions for setting the provider (e.g., prometheus)
// of metrics.

package cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golint comments: should have a package comment, unless it's in another file for this package. More info.

type noopMetricsProvider struct{}

func (_ noopMetricsProvider) NewListsMetric(name string) CounterMetric { return noopMetric{} }
func (_ noopMetricsProvider) NewListDurationMetric(name string) SummaryMetric { return noopMetric{} }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golint naming: receiver name should not be an underscore. More info.

func (_ noopMetricsProvider) NewItemsInListMetric(name string) SummaryMetric { return noopMetric{} }
func (_ noopMetricsProvider) NewWatchesMetric(name string) CounterMetric { return noopMetric{} }
func (_ noopMetricsProvider) NewShortWatchesMetric(name string) CounterMetric { return noopMetric{} }
func (_ noopMetricsProvider) NewWatchDurationMetric(name string) SummaryMetric { return noopMetric{} }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golint naming: receiver name should not be an underscore. More info.

@deads2k deads2k force-pushed the controller-11-reflectormetrics branch from 32644ab to b506a0f Compare July 25, 2017 12:06
@deads2k deads2k added approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. labels Jul 25, 2017
@deads2k
Copy link
Contributor Author

deads2k commented Jul 25, 2017

updated for linting, applied tags based on previous comments.

@deads2k
Copy link
Contributor Author

deads2k commented Jul 25, 2017

/test pull-kubernetes-kubemark-e2e-gce

@deads2k deads2k force-pushed the controller-11-reflectormetrics branch from b506a0f to 151d396 Compare July 25, 2017 13:01
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2017
@deads2k deads2k added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 48224, 45431, 45946, 48775, 49396)

@k8s-github-robot k8s-github-robot merged commit 9c3d0e8 into kubernetes:master Jul 25, 2017
@deads2k deads2k deleted the controller-11-reflectormetrics branch August 3, 2017 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants