
Basic initial instrumentation of the apiserver #4272

Merged
merged 2 commits into from
Feb 10, 2015

Conversation

a-robinson
Contributor

This links in the prometheus library for monitoring, which exports some basic resource usage metrics by default, like number of goroutines, open file descriptors, resident and virtual memory, etc. I've also started adding in request counters and latency histograms, but have only done it to two of our HTTP handlers. If this looks reasonable, I'll add them to the rest of the handlers in a second PR.

Unfortunately, the prometheus client library seems to break the build for two of the client platforms (linux/386 and linux/arm). This shouldn't be merged until I can get that fixed (and remove it from the PR), but I'm sending this out now so that hopefully someone can explain why in the world we're requiring our apiserver package to be compilable on all client platforms. It seems really weird that we pull it into tests that we require to run on all of them, when the server binary itself only has to run on linux/x86.

Issue #1625
cc @vishh @vmarmol

@bgrant0607
Member

cc @nikhiljindal

@roberthbailey roberthbailey changed the title Basic initial instrumentation of the apiserver [WIP] Basic initial instrumentation of the apiserver Feb 10, 2015
@a-robinson
Contributor Author

The prometheus guys conveniently just merged a fix for the build issue tonight. I'm pulling it into our Godeps in #4275. After that gets merged, I can revert the golang.sh change and this should be good to go from a build health perspective.

@a-robinson
Contributor Author

Although I'd still like to know if it's really intended behavior that our e2e/integration tests pull in most of our server code and are expected to run on all client platforms :)

@vmarmol
Contributor

vmarmol commented Feb 10, 2015

Merged, feel free to rebase :)

func init() {
	// All Prometheus metrics have to be explicitly registered to appear on the metrics endpoint.
	prometheus.MustRegister(apiserverLatencies)
	prometheus.MustRegister(redirectCounter)
}
Contributor

nit: Init each of these in the file they're declared in. Or declare them here? We don't use apiserverLatencies here anyways.

Contributor Author

Doh, for some reason I thought there could only be one init function per package, not per file. I was keeping the latencies one here because it's going to be shared across most of the files in the package and this seemed like a central place to put it.
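As the comment above notes, Go allows multiple init functions per package, and in fact even multiple within a single file; they run in source order after package-level variables are initialized. A tiny sketch (the `order` variable is just an illustrative stand-in for metric registration):

```go
package main

import "fmt"

var order []string

// Go runs every init function in a file, in source order, so each file
// in a package can register its own metrics in its own init function.
func init() { order = append(order, "first") }

func init() { order = append(order, "second") }

func main() {
	fmt.Println(order) // both init functions ran, in order
}
```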

@vmarmol
Contributor

vmarmol commented Feb 10, 2015

Looks good, minor nits. If we do the defer-monitor a lot we may want to make a pattern out of it.

@a-robinson
Contributor Author

Thanks for the feedback, PTAL. I'm not sold that the breakdown of metrics (one common one for latency then separate ones with more detail) or labels (do we really care about latency per response code but not per resource?) makes the most sense, so I'd be happy if there's feedback there as well. Perhaps @nikhiljindal has thoughts on what's most useful to monitor? Otherwise I'll just run forward with this to make progress.

@a-robinson a-robinson force-pushed the metrics branch 2 times, most recently from 9a10ef1 to 0bb3fe2 Compare February 10, 2015 20:12
@vishh
Contributor

vishh commented Feb 10, 2015

LGTM. Is it possible to have unit tests for Prometheus metrics?

Name: "apiserver_request_latencies",
Help: "Response latency summary in microseconds for each request handler, verb, and HTTP response code.",
},
[]string{"handler", "verb", "code"},
Contributor

What do you envision being the handlers? Today I see redirect and rest. I'm not sure what information you're interested in gathering, would breaking down by resource be more useful?

Contributor Author

There's also watch, proxy, and validate handlers, which I was going to add in a followup PR (although validate isn't too important). I can add them to this one if it would make things easier.

But yeah, I think including resource would be reasonable as well. I'll add it in. I'm just worried about growing the cardinality of time series too far, since a Summary metric actually gets converted into 5 exported metrics (0.5, 0.9, and 0.99 quantiles, plus a sum and a count).

What about breaking down the latency summary just by handler and verb, and then having a separate CounterVec metric with handler, verb, resource, and code?
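The cardinality concern above is easy to make concrete. With hypothetical label counts (the numbers below are illustrative only, not taken from the actual apiserver), a full-label Summary multiplies every combination by 5 exported series, while the proposed split keeps the 5x factor on the small handler+verb set and pays only one counter series per full combination:

```go
package main

import "fmt"

// Hypothetical label cardinalities, purely to illustrate the arithmetic.
const handlers, verbs, resources, codes = 5, 7, 20, 10

// A Summary carrying the full label set: each label combination expands
// to 5 exported series (0.5, 0.9, and 0.99 quantiles, plus sum and count).
var seriesFullSummary = handlers * verbs * resources * codes * 5

// The proposed split: a latency Summary over handler+verb only, plus a
// plain CounterVec with one series per full label combination.
var seriesSplit = handlers*verbs*5 + handlers*verbs*resources*codes

func main() {
	fmt.Println(seriesFullSummary, seriesSplit)
}
```

Under these made-up counts the split cuts the exported series from 35000 to 7175, i.e. roughly a 5x reduction, since the expensive quantile expansion no longer multiplies across resource and code.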

Contributor

That SGTM. Don't worry about adding the other handlers in this PR.

@a-robinson
Contributor Author

@vishh It doesn't look like it, judging by the lack of any public accessor methods, but I'm asking on their mailing list - https://groups.google.com/forum/#!topic/prometheus-developers/fOBc22XnNrY

@a-robinson
Contributor Author

Should be looking better now. Thanks for the feedback!

@vmarmol
Contributor

vmarmol commented Feb 10, 2015

LGTM

Build is failing on Shippable, but clicking on the link takes me to their main page so no idea what's failing :-\

@a-robinson
Contributor Author

So are 3 of the 4 most recently filed PRs. Shippable hasn't been very dependable so far...

@vishh
Contributor

vishh commented Feb 10, 2015

@a-robinson How about adding a TODO to add unit tests once that's possible?
LGTM otherwise. Shippable failure looks like a flake. I have restarted the tests.

@a-robinson
Contributor Author

Done.

@vmarmol vmarmol changed the title [WIP] Basic initial instrumentation of the apiserver Basic initial instrumentation of the apiserver Feb 10, 2015
@vmarmol
Contributor

vmarmol commented Feb 10, 2015

Minus the CIs, I think this is ready to merge. Or do you think it is still a WIP @a-robinson?

@a-robinson
Contributor Author

It's ready to merge.

vmarmol added a commit that referenced this pull request Feb 10, 2015
Basic initial instrumentation of the apiserver
@vmarmol vmarmol merged commit 535502e into kubernetes:master Feb 10, 2015