Basic initial instrumentation of the apiserver #4272
Conversation
The prometheus guys conveniently just merged a fix for the build issue tonight. I'm pulling it into our Godeps in #4275. After that gets merged, I can revert the golang.sh change and this should be good to go from a build health perspective.
Although I'd still like to know if it's really intended behavior that our e2e/integration tests pull in most of our server code and are expected to run on all client platforms :)
Merged, feel free to rebase :)
func init() {
	// All Prometheus metrics have to be explicitly registered to appear on the metrics endpoint.
	prometheus.MustRegister(apiserverLatencies)
	prometheus.MustRegister(redirectCounter)
}
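For context, `redirectCounter` is registered here but not declared in this excerpt. A minimal sketch of what such a declaration might look like with client_golang; the metric name, help text, and label set are assumptions, not the PR's actual code:

```go
import "github.com/prometheus/client_golang/prometheus"

// Hypothetical declaration; name, help text, and labels are assumptions.
var redirectCounter = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "apiserver_redirect_count",
		Help: "Counter of redirect requests served by the apiserver, broken down by HTTP response code.",
	},
	[]string{"code"},
)
```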
nit: Init each of these in the file they're declared in. Or declare them here? We don't use apiserverLatencies here anyways.
Doh, for some reason I thought there could only be one init function per package, not per file. I was keeping the latencies one here because it's going to be shared across most of the files in the package and this seemed like a central place to put it.
Looks good, minor nits. If we do the defer-monitor a lot we may want to make a pattern out of it.
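As a rough illustration of the defer-monitor idea, the instrumentation could be factored into a small helper like the one below. The helper name, the metric variables, and their label sets are hypothetical, not the PR's code:

```go
import "time"

// monitor is a hypothetical helper: called via defer at the top of a
// handler, it bumps a request counter and records the handler's latency
// in microseconds. requestCounter and apiserverLatencies are assumed to
// be a CounterVec and a SummaryVec labeled by handler and verb.
func monitor(handler, verb string, start time.Time) {
	requestCounter.WithLabelValues(handler, verb).Inc()
	apiserverLatencies.WithLabelValues(handler, verb).Observe(
		float64(time.Since(start) / time.Microsecond))
}

// Usage:
//	func (r *RedirectHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
//		defer monitor("redirect", req.Method, time.Now())
//		// ... handle the request ...
//	}
```

Because defer evaluates its arguments immediately, time.Now() captures the start of the handler while monitor itself only runs when the handler returns.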
Thanks for the feedback, PTAL. I'm not sold that the breakdown of metrics (one common one for latency then separate ones with more detail) or labels (do we really care about latency per response code but not per resource?) makes the most sense, so I'd be happy if there's feedback there as well. Perhaps @nikhiljindal has thoughts on what's most useful to monitor? Otherwise I'll just run forward with this to make progress.
LGTM. Is it possible to have unit tests for Prometheus metrics?
Name: "apiserver_request_latencies", | ||
Help: "Response latency summary in microseconds for each request handler, verb, and HTTP response code.", | ||
}, | ||
[]string{"handler", "verb", "code"}, |
What do you envision being the handlers? Today I see redirect and rest. I'm not sure what information you're interested in gathering; would breaking down by resource be more useful?
There are also watch, proxy, and validate handlers, which I was going to add in a follow-up PR (although validate isn't too important). I can add them to this one if it would make things easier.
But yeah, I think including resource would be reasonable as well. I'll add it in. I'm just worried about growing the cardinality of time series too far, since a Summary metric actually gets converted into 5 exported metrics (0.5, 0.9, and 0.99 quantiles, plus a sum and a count).
What about breaking down the latency summary just by handler and verb, and then having a separate CounterVec metric with handler, verb, resource, and code?
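Concretely, that split might look something like the sketch below; the metric names, help strings, and exact label sets are assumptions drawn from this thread rather than the final code:

```go
var (
	// Low-cardinality latency summary: each label combination costs five
	// exported series (three quantiles plus a sum and a count), so it is
	// kept to handler and verb only.
	requestLatencies = prometheus.NewSummaryVec(
		prometheus.SummaryOpts{
			Name: "apiserver_request_latencies",
			Help: "Response latency summary in microseconds, broken down by handler and verb.",
		},
		[]string{"handler", "verb"},
	)

	// Finer-grained request counter: a counter is a single series per label
	// combination, so it can afford resource and response code as well.
	requestCounter = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "apiserver_request_count",
			Help: "Counter of apiserver requests, broken down by handler, verb, resource, and HTTP response code.",
		},
		[]string{"handler", "verb", "resource", "code"},
	)
)
```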
That SGTM. Don't worry about adding the other handlers in this PR.
@vishh It doesn't look like it judging by the lack of any public accessor methods, but I'm asking on their email list - https://groups.google.com/forum/#!topic/prometheus-developers/fOBc22XnNrY
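For what it's worth, later releases of client_golang added a testutil package that makes this kind of assertion possible. A minimal sketch, assuming that package and the hypothetical redirectCounter from the earlier sketch:

```go
import (
	"testing"

	"github.com/prometheus/client_golang/prometheus/testutil"
)

// TestRedirectCounter checks that serving a redirect bumps the counter.
// It assumes the hypothetical redirectCounter CounterVec labeled by code.
func TestRedirectCounter(t *testing.T) {
	before := testutil.ToFloat64(redirectCounter.WithLabelValues("302"))

	// Exercise the code under test; here we just increment directly as a stand-in.
	redirectCounter.WithLabelValues("302").Inc()

	if got := testutil.ToFloat64(redirectCounter.WithLabelValues("302")) - before; got != 1 {
		t.Errorf("redirect counter for code 302 changed by %v, want 1", got)
	}
}
```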
Should be looking better now. Thanks for the feedback!
LGTM. Build is failing on Shippable, but clicking on the link takes me to their main page, so no idea what's failing :-\
So are 3 of the 4 most recently filed PRs. Shippable hasn't been very dependable so far...
@a-robinson How about adding a TODO to add unit tests once that's possible?
the "code" label from the latencies SummaryVec.
Done.
Minus the CIs, I think this is ready to merge. Or do you think it is still a WIP, @a-robinson?
It's ready to merge.
Basic initial instrumentation of the apiserver
This links in the prometheus library for monitoring, which exports some basic resource usage metrics by default, like number of goroutines, open file descriptors, and resident and virtual memory. I've also started adding in request counters and latency histograms, but have only added them to two of our HTTP handlers. If this looks reasonable, I'll add them to the rest of the handlers in a second PR.
Unfortunately, the prometheus client library seems to break the build for two of the client platforms (linux/386 and linux/arm). This shouldn't be merged until I can get that fixed (and remove the workaround from the PR), but I'm sending this out now so that hopefully someone can explain why in the world we're requiring our apiserver package to be compilable on all client platforms. It seems really weird that we pull it into tests that we require to run on all of them, when the server binary itself only has to run on linux/x86.
Issue #1625
cc @vishh @vmarmol
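For completeness, exposing the registered metrics (including the default process metrics mentioned above) only requires mounting the library's HTTP handler. A minimal sketch; the function and mux below are assumptions about how the apiserver wires up its handlers, and newer client_golang releases serve this handler from the promhttp package instead:

```go
import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
)

// installMetricsEndpoint is a hypothetical helper showing where the
// Prometheus handler might be mounted. prometheus.Handler() serves every
// registered metric plus the default process metrics (goroutines, open
// file descriptors, memory usage) on /metrics.
func installMetricsEndpoint(mux *http.ServeMux) {
	mux.Handle("/metrics", prometheus.Handler())
}
```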