
Add a metric exposing number of objects per type #59757

Merged
merged 2 commits into from
Feb 24, 2018

Conversation

@gmarek (Contributor) commented Feb 12, 2018

Fix #51953

Adds a goroutine that periodically checks the count of objects in etcd and publishes a metric with this data.

APIserver backed by etcdv3 exports metric showing number of resources per kind

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 12, 2018
@@ -284,6 +289,20 @@ func (s *DefaultStorageFactory) NewConfig(groupResource schema.GroupResource) (*
}
glog.V(3).Infof("storing %v in %v, reading as %v from %#v", groupResource, codecConfig.StorageVersion, codecConfig.MemoryVersion, codecConfig.Config)

if storageConfig.Type == storagebackend.StorageTypeETCD3 {
Member:

really not a fan of leaking the storage implementation here

Contributor Author:

I can just always fill those details in here. It's generic enough that it should be possible to implement in an arbitrary backend.

storageConfig.ResourcesToMonitor = make(map[string]string)
}
resourcePrefix := s.ResourcePrefix(groupResource)
path := []string{"/registry"}
@liggitt (Member), Feb 12, 2018:

this sets up a duplicate prefix construction path that doesn't honor the configured etcd prefix (--etcd-prefix), and assumes nesting under the API group name, which isn't a valid assumption

Contributor Author:

Prefix is easily fixable. I'm not sure about the API group name - why isn't it a valid assumption (except for core/v1 ones)?

Member:

none of the kube-apiserver types place themselves under group subpaths. see test/integration/etcd/etcd_storage_path_test.go for all the various non-standard key paths used.

I don't think we should set up another place like this where we try to pull together all the etcd prefix + resource prefix config values... if we want to run metrics on storage, why not do it inside generic store and storage, where this information already lives?

Contributor Author:

I was looking into that yesterday, but got pulled into different things before I was able to get to the bottom of it. The short answer is that I have no idea how the storage code is structured.

Contributor Author:

The way I understand it is that Storage (the layer that talks to etcd and the place where monitoring needs to live) is created before the generic.Store(s) and gets injected into the Stores by something (a Decorator?).

So I can either modify the injection logic to also pass some data down to Storage (which has an unpleasant smell to it), or add a simple "Count" function to Storage. I'd rather do the latter, but I don't really want to make it a full-blown API-like request.

Contributor Author:

The issue with the latter is that I have to either provide EXTREMELY inefficient implementations for etcdv2 and Cacher, or just always return -1/unimplemented error. Neither is particularly nice...

Member:

or add a simple "Count" function to Storage. I'd rather do the latter, but I don't really want to make it full blown API-like request

That is the most accurate representation of what this is attempting to do. If we want to pull info out of etcd in a new way, the Storage interface is the correct place to represent that.

I have to either provide EXTREMELY inefficient implementations for etcdv2 and Cacher

Etcd2 returning an unimplemented error doesn't particularly bother me. Cacher could delegate to the underlying store (as it does for most things), or add a KeyCount() function to the cache.Store interface.
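
For illustration, the shape being discussed might look roughly like this - a sketch only, assuming the storage interface gains a Count method and that the Cacher keeps a reference to the store it wraps (field and method names are illustrative, not the final API):

package storage

// Sketch: storage.Interface gains a Count method; the Cacher delegates to the
// underlying store rather than counting cached objects itself.
type Interface interface {
	// ... existing methods (Get, Create, Delete, List, Watch, ...) elided ...

	// Count returns the number of objects stored under the given key prefix.
	Count(key string) (int64, error)
}

type Cacher struct {
	storage Interface // the underlying, etcd-backed store
	// ... other fields elided ...
}

// Count delegates straight through, as the Cacher does for most non-cached reads.
func (c *Cacher) Count(pathPrefix string) (int64, error) {
	return c.storage.Count(pathPrefix)
}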

Contributor Author:

This is what I'll try to send out today, assuming I'll stop getting interrupted...

@@ -155,6 +156,8 @@ func (s *EtcdOptions) AddFlags(fs *pflag.FlagSet) {
fs.DurationVar(&s.StorageConfig.CompactionInterval, "etcd-compaction-interval", s.StorageConfig.CompactionInterval,
"The interval of compaction requests. If 0, the compaction request from apiserver is disabled.")

fs.DurationVar(&s.StorageConfig.CountMetricPollPeriod, "etcd-count-metric-poll-period", 5*time.Second, ""+
Member:

I don't know how expensive a call this is, but counting every resource type every 5 seconds is more frequent than I expected. who would be a good resource to ask about this?

Contributor Author:

I can put anything you prefer here. As for who to ask, I guess someone from the etcd team - maybe @xiang90 still works on that?

Contributor Author:

I increased it to a one-minute interval. It should be good enough, but we'll see how the scale tests behave. @shyamjvs @kubernetes/sig-scalability-misc

objectCounts = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "apiserver_object_counts",
Help: "Current number of objects stored. Counts only types that have watch-cache enabled.",
Member:

I don't think the comment is accurate... looks like it would work without the watch cache, right?

Contributor Author:

Right.
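
For context, registering and updating such a gauge would look roughly like this; a sketch only - the "resource" label and the UpdateObjectCount helper are assumptions for illustration, and the gauge is later renamed to etcd_object_counts:

package metrics

import "github.com/prometheus/client_golang/prometheus"

var objectCounts = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "apiserver_object_counts",
		Help: "Number of stored objects at the time of last check, split by kind.",
	},
	[]string{"resource"}, // label name is illustrative
)

func init() {
	prometheus.MustRegister(objectCounts)
}

// UpdateObjectCount records the latest observed count for a resource. A negative
// value (e.g. -1 for "no data") is stored as-is so consumers can distinguish it.
func UpdateObjectCount(resourcePrefix string, count int64) {
	objectCounts.WithLabelValues(resourcePrefix).Set(float64(count))
}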

@@ -1365,9 +1367,26 @@ func (e *Store) CompleteWithOptions(options *generic.StoreOptions) error {
)
}

if opts.CountMetricPollPeriod > 0 {
e.startObservingCount(e.DefaultQualifiedResource.String(), opts.StorageConfig.Prefix+prefix, opts.CountMetricPollPeriod)
Member:

don't run forever, make sure we have a stop channel and non-nil Destroy() func that will stop this

Member:

for example:

stopMetricsCh := make(chan struct{})
destroy := e.DestroyFunc
e.DestroyFunc = func(){
  close(stopMetricsCh)
  if destroy != nil {
    destroy()
  }
}

e.startObservingCount(..., stopMetricsCh)

Member:

don't join opts.StorageConfig.Prefix+prefix here... the storage is responsible for dealing with prepending opts.StorageConfig.Prefix

also, don't assume prefix is the key... call e.KeyRootFunc() to get the key

Contributor Author:

I started to return a destroy function from 'start...', which is functionally equivalent.
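
Roughly, that "return a stop function" variant could look like the sketch below, assuming the key comes from e.KeyRootFunc and the metric is updated via a helper like the one sketched earlier (names and the jitter factor are illustrative, not the merged code):

// startObservingCount polls the storage count on a jittered period until the
// returned stop function is called (which the Store's DestroyFunc can invoke).
func (e *Store) startObservingCount(period time.Duration) func() {
	prefix := e.KeyRootFunc(genericapirequest.NewContext())
	resourceName := e.DefaultQualifiedResource.String()
	glog.V(2).Infof("Monitoring %v count at %v", resourceName, prefix)
	stopCh := make(chan struct{})
	go wait.JitterUntil(func() {
		count, err := e.Storage.Count(prefix)
		if err != nil {
			glog.V(5).Infof("Failed to update storage count metric: %v", err)
			count = -1 // -1 means "no data"
		}
		metrics.UpdateObjectCount(resourceName, count)
	}, period, 1.0 /* jitter factor, illustrative */, true, stopCh)
	return func() { close(stopCh) }
}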

return nil
}

func (e *Store) startObservingCount(resourceName, pathPrefix string, period time.Duration) {
glog.V(2).Infof("Monitoring %v bound to %v", resourceName, pathPrefix)
go func() {
Member:

simplify to go wait.Forever(...?

Contributor Author:

Done.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 14, 2018
@@ -85,6 +85,14 @@ type objState struct {
stale bool
}

func (s *store) Count(pathPrefix string) (int64, error) {
Member:

I'd expect the key parameter to match that passed to GetToList(), etc, and get joined with the pathPrefix already held by store

func (s *store) Count(key string) (int64, error) {
  key = path.Join(s.pathPrefix, key)
  getResp, err := s.client.KV.Get(context.Background(), key, ...
  ...
}

Contributor Author:

Done.

@liggitt (Member) commented Feb 14, 2018

cc @kubernetes/sig-api-machinery-pr-reviews

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. labels Feb 14, 2018
@gmarek (Contributor Author) commented Feb 15, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@gmarek gmarek force-pushed the object-count branch 2 times, most recently from 4eb8d26 to f014072 Compare February 15, 2018 12:33
@gmarek (Contributor Author) commented Feb 15, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@gmarek gmarek changed the title WIP: add a metric exposing number of objects per type Add a metric exposing number of objects per type Feb 15, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 15, 2018
return nil
}

// startObservingCount starts monitoring given prefix and periodically updating metrics. It return a function to stop collection.
func (e *Store) startObservingCount(period time.Duration) func() {
Contributor:

What happens for co-habitation? Will we get two monitors?

Contributor Author:

I don't know the storage code well enough to answer that. It depends on how co-habitation is implemented and whether two Stores for co-habitating resources share the same Storage object, or whether control will enter the if Storage == nil branch twice.

Having two monitors is not ideal, but it's also not a disaster IMO.

Contributor:

Agree. We can live with that.

return nil
}

// startObservingCount starts monitoring given prefix and periodically updating metrics. It return a function to stop collection.
Contributor:

nit: returns.

Contributor Author:

Done.

@@ -586,6 +586,10 @@ func (h *etcdHelper) GuaranteedUpdate(
}
}

func (*etcdHelper) Count(pathPerfix string) (int64, error) {
return 0, fmt.Errorf("Count is unimplemented for etcd2!")
Contributor:

Is this an error? If we want to distinguish between an error and "not implemented", we should at least have a special pre-defined error type.

Contributor Author:

Agreed, but I guess we'd need to add it to apimachinery/pkg/api/errors to be used consistently. I can add it in a separate PR, as I expect a lot of discussion around that.
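
A predefined error for this could look roughly like the sketch below (illustrative only; the merged change keeps the plain fmt.Errorf, and the variable name is an assumption):

// ErrCountNotSupported is a shared sentinel error that callers could compare
// against to distinguish "not supported" from a real failure.
// (Assumes the standard "errors" package is imported.)
var ErrCountNotSupported = errors.New("count is not supported by this storage backend")

func (*etcdHelper) Count(pathPrefix string) (int64, error) {
	return 0, ErrCountNotSupported
}

// A caller could then skip the metric instead of logging a failure:
//   if err == ErrCountNotSupported { return }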

Contributor:

Alternative: convention that -1 means "not implemented"

Contributor Author:

-1 currently means "no data", as per @liggitt's request. Also, we're not writing C, to do such things ;)

@gmarek gmarek force-pushed the object-count branch 2 times, most recently from a489e9c to ae1fe00 Compare February 19, 2018 13:11
@gmarek (Contributor Author) commented Feb 19, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@gmarek (Contributor Author) commented Feb 20, 2018

@sttts @liggitt - please let me know what's necessary to get this PR in before the freeze. The kubemark-big test passed, so it's not impacting performance too much.

@wojtek-t (Member) left a comment:

/approve no-issue

objectCounts = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "etcd_object_counts",
Help: "Current number of stored objects split by kind.",
Member:

It's not "current". It's the number of objects at some point in time within the last minute (or whatever period the user sets). May be worth being explicit...

Contributor Author:

I will change it, but frankly no metric is ever "current" in the strict sense :)

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 21, 2018
@gmarek (Contributor Author) commented Feb 21, 2018

Ping @sttts @liggitt - I'd like to get this in before the freeze.

@liggitt (Member) commented Feb 21, 2018

would like an ack from @lavalamp or @deads2k on the Storage interface addition, since it could impact api-machinery storage plans (though it seems a pretty reasonable thing to support to me)

go wait.JitterUntil(func() {
count, err := e.Storage.Count(prefix)
if err != nil {
glog.V(10).Infof("Failed to update storage count metric: %v", err)
Contributor:

This log level would include client calls. You probably want 5, just before the client calls

Contributor Author:

Done.

@@ -155,6 +156,8 @@ func (s *EtcdOptions) AddFlags(fs *pflag.FlagSet) {
fs.DurationVar(&s.StorageConfig.CompactionInterval, "etcd-compaction-interval", s.StorageConfig.CompactionInterval,
"The interval of compaction requests. If 0, the compaction request from apiserver is disabled.")

fs.DurationVar(&s.StorageConfig.CountMetricPollPeriod, "etcd-count-metric-poll-period", time.Minute, ""+
Contributor:

Instead of time.Minute, using s.StorageConfig.CountMetricPollPeriod here allows overriding the default, like the other options above.

Contributor:

And you can still set a default in NewEtcdOptions above

Contributor Author:

It's not perfect, but done.
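
The suggested pattern, shown as a small self-contained sketch (the Options type and flag text below are illustrative stand-ins for the real EtcdOptions):

package main

import (
	"fmt"
	"time"

	"github.com/spf13/pflag"
)

// Options is a stand-in for EtcdOptions.
type Options struct {
	CountMetricPollPeriod time.Duration
}

// NewOptions sets the default once, mirroring a default set in NewEtcdOptions.
func NewOptions() *Options {
	return &Options{CountMetricPollPeriod: time.Minute}
}

func (o *Options) AddFlags(fs *pflag.FlagSet) {
	// Use the already-defaulted field as the flag default instead of a literal,
	// so a caller that constructs Options with a different value keeps its override.
	fs.DurationVar(&o.CountMetricPollPeriod, "etcd-count-metric-poll-period",
		o.CountMetricPollPeriod,
		"Frequency of polling etcd for number of resources per type. 0 disables the metric collection.")
}

func main() {
	o := NewOptions()
	fs := pflag.NewFlagSet("example", pflag.ExitOnError)
	o.AddFlags(fs)
	_ = fs.Parse([]string{"--etcd-count-metric-poll-period=30s"})
	fmt.Println(o.CountMetricPollPeriod) // 30s: the flag overrides the constructor default
}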

@deads2k (Contributor) commented Feb 21, 2018

A few comments. It looks good to me otherwise. The etcd2 thing is unfortunate, but I don't think it will be common, and the retry shouldn't hot-loop.

@@ -412,6 +412,15 @@ func (s *store) GetToList(ctx context.Context, key string, resourceVersion strin
return s.versioner.UpdateList(listObj, uint64(getResp.Header.Revision), "")
}

func (s *store) Count(key string) (int64, error) {
key = path.Join(s.pathPrefix, key)
getResp, err := s.client.KV.Get(context.Background(), key, clientv3.WithRange(clientv3.GetPrefixRangeEnd(key)), clientv3.WithCountOnly())
Contributor:

We should not issue a large range request to etcd. Just to count the keys, we should use the keys-only option with a limit, and maybe also pagination.

Member:

is WithCountOnly not more efficient than WithKeysOnly?

Contributor:

Oh, I did not see that count-only is already there. But still, if we know the count is going to be huge, we probably need pagination as well. etcd does not maintain an index for count(*), but scans everything internally.

Contributor:

Doing a micro-benchmark could be helpful. If counting 1M keys takes <1s, it might be OK.

Member:

If we had 1M keys, we would have other problems than this metric...

@xiang90 (Contributor), Feb 22, 2018:

Counting 1M keys is just to make the test result more reliable and safer; you can divide that by 10 to get 100k for an upper bound, etc. Periodically blocking etcd for more than 50ms is not good in general (for node kinds there can be 5k, for pods there can be 50k). I have not really looked into this PR, so these are just some random suggestions and thoughts.

@gmarek (Contributor Author), Feb 22, 2018:

@xiang90

I can run benchmarks after the freeze, as there's not enough time now. It shouldn't be an issue, as it's a one-word change to disable this functionality.

As for pagination, I'm not sure I follow. I believe that the etcd-side 'WithLimit' restricts the response size (i.e. the number of items returned), which shouldn't apply to "count only" requests, as they always return a single value. Or is 'countOnly' implemented on the client side, so that "limit" would effectively cap the returned value to, well, the limit?

Or maybe you meant doing some kind of manual pagination by dividing the range into multiple subranges, querying them separately and combining the results?
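
For completeness, the manual-pagination interpretation might look roughly like the sketch below; this is illustrative only and is not what the PR merged (the merged code issues a single CountOnly request):

package counter

import (
	"context"

	"github.com/coreos/etcd/clientv3"
)

// countPaginated walks the prefix range in keys-only pages and sums the page
// sizes, bounding how much etcd has to return per request. The function name
// and page-size handling are illustrative.
func countPaginated(ctx context.Context, kv clientv3.KV, prefix string, pageSize int64) (int64, error) {
	var total int64
	key := prefix
	end := clientv3.GetPrefixRangeEnd(prefix)
	for {
		resp, err := kv.Get(ctx, key,
			clientv3.WithRange(end),
			clientv3.WithKeysOnly(),
			clientv3.WithLimit(pageSize))
		if err != nil {
			return 0, err
		}
		total += int64(len(resp.Kvs))
		if !resp.More || len(resp.Kvs) == 0 {
			return total, nil
		}
		// Continue just past the last key returned in this page.
		key = string(resp.Kvs[len(resp.Kvs)-1].Key) + "\x00"
	}
}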

@gmarek gmarek force-pushed the object-count branch 2 times, most recently from ee2f49b to 161c025 Compare February 23, 2018 09:02
@gmarek (Contributor Author) commented Feb 23, 2018

@liggitt @deads2k - can we get this in and iterate on performance if needed afterwards?

@liggitt (Member) commented Feb 23, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 23, 2018
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gmarek, liggitt, wojtek-t


@k8s-github-robot:

Automatic merge from submit-queue (batch tested with PRs 57672, 60299, 59757, 60283, 60265). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit e3e954a into kubernetes:master Feb 24, 2018
@smarterclayton (Contributor) commented Feb 27, 2018

Note that this will not work correctly if we start separating resources into multiple keys, nor if there are other keys interspersed into a range that can't be accessed by name. I would have really preferred something that just remembered the last full list or checked the watch cache periodically.

@liggitt (Member) commented Feb 27, 2018

I would have really preferred something that just remembered the last full list or checked the watch cache periodically.

If you feel strongly we can revert

@wojtek-t (Member):

I would have really preferred something that just remembered the last full list or checked the watch cache periodically.

The solution with the watch cache was also my first thought, but it requires having the watch cache enabled. And we were already discussing in the past enabling it only for some resources (e.g. for events it's not even enabled now).
The problem with remembering the last list result is that full lists may not happen frequently (example: secrets - I think nothing is really listing all secrets).

That said, your argument is also a very good one. Especially since I may try to attack the problem of "making heartbeats cheaper (on the etcd side)" by separating the "heartbeat" part of the object into a separate thing in etcd (leaving the API as is).

I don't have any better suggestion now, though...
