
Add a metric exposing number of objects per type #59757

Merged
merged 2 commits into from
Feb 24, 2018

Conversation

@gmarek (Contributor) commented Feb 12, 2018

Fix #51953

Adds a goroutine that periodically checks the count of objects in etcd and publishes a metric with this data.

APIserver backed by etcdv3 exports metric showing number of resources per kind

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 12, 2018
@@ -284,6 +289,20 @@ func (s *DefaultStorageFactory) NewConfig(groupResource schema.GroupResource) (*
}
glog.V(3).Infof("storing %v in %v, reading as %v from %#v", groupResource, codecConfig.StorageVersion, codecConfig.MemoryVersion, codecConfig.Config)

if storageConfig.Type == storagebackend.StorageTypeETCD3 {
Member:

really not a fan of leaking the storage implementation here

Contributor Author:

I can just always fill those details in here. It's generic enough that it should be possible to implement in an arbitrary backend.

storageConfig.ResourcesToMonitor = make(map[string]string)
}
resourcePrefix := s.ResourcePrefix(groupResource)
path := []string{"/registry"}
@liggitt (Member), Feb 12, 2018:

this sets up a duplicate prefix construction path that doesn't honor the configured etcd prefix (--etcd-prefix), and assumes nesting under the API group name, which isn't a valid assumption

Contributor Author:

Prefix is easily fixable. I'm not sure about the API group name - why isn't it a valid assumption (except for core/v1 ones)?

Member:

none of the kube-apiserver types place themselves under group subpaths. see test/integration/etcd/etcd_storage_path_test.go for all the various non-standard key paths used.

I don't think we should set up another place like this where we try to pull together all the etcd prefix + resource prefix config values... if we want to run metrics on storage, why not do it inside generic store and storage, where this information already lives?

Contributor Author:

I was looking into that yesterday, but got pulled into different things before I was able to get to the bottom of it. The short answer is that I have no idea how the storage code is structured.

Contributor Author:

The way I understand it is that Storage (the layer that talks to etcd and the place where monitoring needs to live) is created before the generic.Store(s) and gets injected into the Stores by something (a Decorator?).

So I can either modify the injection logic to also pass some data down to Storage (which has an unpleasant smell to it), or add a simple "Count" function to Storage. I'd rather do the latter, but I don't really want to make it a full-blown API-like request.

Contributor Author:

The issue with the latter is that I have to either provide EXTREMELY inefficient implementations for etcdv2 and Cacher, or just always return -1/unimplemented error. Neither is particularly nice...

Member:

or add a simple "Count" function to Storage. I'd rather do the latter, but I don't really want to make it full blown API-like request

That is the most accurate representation of what this is attempting to do. If we want to pull info out of etcd in a new way, the Storage interface is the correct place to represent that.

I have to either provide EXTREMELY inefficient implementations for etcdv2 and Cacher

Etcd2 returning an unimplemented error doesn't particularly bother me. Cacher could delegate to the underlying store (as it does for most things), or add a KeyCount() function to the cache.Store interface.
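
For illustration, the shape being discussed might look roughly like this - a sketch only, assuming the storage interface gains a Count method and that the Cacher keeps a reference to the store it wraps (field and method names are illustrative, not the final API):

package storage

// Sketch: storage.Interface gains a Count method; the Cacher delegates to the
// underlying store rather than counting cached objects itself.
type Interface interface {
	// ... existing methods (Get, Create, Delete, List, Watch, ...) elided ...

	// Count returns the number of objects stored under the given key prefix.
	Count(key string) (int64, error)
}

type Cacher struct {
	storage Interface // the underlying, etcd-backed store
	// ... other fields elided ...
}

// Count delegates straight through, as the Cacher does for most non-cached reads.
func (c *Cacher) Count(pathPrefix string) (int64, error) {
	return c.storage.Count(pathPrefix)
}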

Contributor Author:

This is what I'll try to send out today, assuming I'll stop getting interrupted...

@@ -155,6 +156,8 @@ func (s *EtcdOptions) AddFlags(fs *pflag.FlagSet) {
fs.DurationVar(&s.StorageConfig.CompactionInterval, "etcd-compaction-interval", s.StorageConfig.CompactionInterval,
"The interval of compaction requests. If 0, the compaction request from apiserver is disabled.")

fs.DurationVar(&s.StorageConfig.CountMetricPollPeriod, "etcd-count-metric-poll-period", 5*time.Second, ""+
Member:

I don't know how expensive a call this is, but counting every resource type every 5 seconds is more frequent than I expected. who would be a good resource to ask about this?

Contributor Author:

I can put anything you prefer here. As for who to ask, I guess someone from the etcd team - maybe @xiang90 still works on that?

Contributor Author:

I increased it to a one-minute interval. It should be good enough, but we'll see how the scale tests behave. @shyamjvs @kubernetes/sig-scalability-misc

objectCounts = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "apiserver_object_counts",
Help: "Current number of objects stored. Counts only types that have watch-cache enabled.",
Member:

I don't think the comment is accurate... looks like it would work without the watch cache, right?

Contributor Author:

Right.
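
For context, registering and updating such a gauge would look roughly like this; a sketch only - the "resource" label and the UpdateObjectCount helper are assumptions for illustration, and the gauge is later renamed to etcd_object_counts:

package metrics

import "github.com/prometheus/client_golang/prometheus"

var objectCounts = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "apiserver_object_counts",
		Help: "Number of stored objects at the time of last check, split by kind.",
	},
	[]string{"resource"}, // label name is illustrative
)

func init() {
	prometheus.MustRegister(objectCounts)
}

// UpdateObjectCount records the latest observed count for a resource. A negative
// value (e.g. -1 for "no data") is stored as-is so consumers can distinguish it.
func UpdateObjectCount(resourcePrefix string, count int64) {
	objectCounts.WithLabelValues(resourcePrefix).Set(float64(count))
}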

@@ -1365,9 +1367,26 @@ func (e *Store) CompleteWithOptions(options *generic.StoreOptions) error {
)
}

if opts.CountMetricPollPeriod > 0 {
e.startObservingCount(e.DefaultQualifiedResource.String(), opts.StorageConfig.Prefix+prefix, opts.CountMetricPollPeriod)
Member:

don't run forever, make sure we have a stop channel and non-nil Destroy() func that will stop this

Member:

for example:

stopMetricsCh := make(chan struct{})
destroy := e.DestroyFunc
e.DestroyFunc = func(){
  close(stopMetricsCh)
  if destroy != nil {
    destroy()
  }
}

e.startObservingCount(..., stopMetricsCh)

Member:

don't join opts.StorageConfig.Prefix+prefix here... the storage is responsible for dealing with prepending opts.StorageConfig.Prefix

also, don't assume prefix is the key... call e.KeyRootFunc() to get the key

Contributor Author:

I started to return a destroy function from 'start...', which is functionally equivalent.
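
Roughly, that "return a stop function" variant could look like the sketch below, assuming the key comes from e.KeyRootFunc and the metric is updated via a helper like the one sketched earlier (names and the jitter factor are illustrative, not the merged code):

// startObservingCount polls the storage count on a jittered period until the
// returned stop function is called (which the Store's DestroyFunc can invoke).
func (e *Store) startObservingCount(period time.Duration) func() {
	prefix := e.KeyRootFunc(genericapirequest.NewContext())
	resourceName := e.DefaultQualifiedResource.String()
	glog.V(2).Infof("Monitoring %v count at %v", resourceName, prefix)
	stopCh := make(chan struct{})
	go wait.JitterUntil(func() {
		count, err := e.Storage.Count(prefix)
		if err != nil {
			glog.V(5).Infof("Failed to update storage count metric: %v", err)
			count = -1 // -1 means "no data"
		}
		metrics.UpdateObjectCount(resourceName, count)
	}, period, 1.0 /* jitter factor, illustrative */, true, stopCh)
	return func() { close(stopCh) }
}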

return nil
}

func (e *Store) startObservingCount(resourceName, pathPrefix string, period time.Duration) {
glog.V(2).Infof("Monitoring %v bound to %v", resourceName, pathPrefix)
go func() {
Member:

simplify to go wait.Forever(...?

Contributor Author:

Done.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 14, 2018
@@ -85,6 +85,14 @@ type objState struct {
stale bool
}

func (s *store) Count(pathPrefix string) (int64, error) {
Member:

I'd expect the key parameter to match that passed to GetToList(), etc, and get joined with the pathPrefix already held by store

func (s *store) Count(key string) (int64, error) {
  key = path.Join(s.pathPrefix, key)
  getResp, err := s.client.KV.Get(context.Background(), key, ...
  ...
}

Contributor Author:

Done.

@liggitt (Member) commented Feb 14, 2018

cc @kubernetes/sig-api-machinery-pr-reviews

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. labels Feb 14, 2018
@gmarek (Contributor Author) commented Feb 15, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@gmarek gmarek force-pushed the object-count branch 2 times, most recently from 4eb8d26 to f014072 Compare February 15, 2018 12:33
@gmarek (Contributor Author) commented Feb 15, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@gmarek gmarek changed the title WIP: add a metric exposing number of objects per type Add a metric exposing number of objects per type Feb 15, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 15, 2018
return nil
}

// startObservingCount starts monitoring given prefix and periodically updating metrics. It return a function to stop collection.
func (e *Store) startObservingCount(period time.Duration) func() {
Contributor:

What happens for co-habitation? Will we get two monitors?

Contributor Author:

I don't know the storage code well enough to answer that. It depends on how co-habitation is implemented and whether two Stores for co-habitating resources share the same Storage object, or whether control will enter the if Storage == nil branch twice.

Having two monitors is not ideal, but it's also not a disaster IMO.

Contributor:

Agree. We can live with that.

return nil
}

// startObservingCount starts monitoring given prefix and periodically updating metrics. It return a function to stop collection.
Contributor:

nit: returns.

Contributor Author:

Done.

@@ -586,6 +586,10 @@ func (h *etcdHelper) GuaranteedUpdate(
}
}

func (*etcdHelper) Count(pathPerfix string) (int64, error) {
return 0, fmt.Errorf("Count is unimplemented for etcd2!")
Contributor:

Is this an error? If we want to distinguish between an error and "not implemented", we should at least have a special pre-defined error type.

Contributor Author:

Agreed, but I guess we'd need to add it to apimachinery/pkg/api/errors to be used consistently. I can add it in a separate PR, as I expect a lot of discussion around that.
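
A predefined error for this could look roughly like the sketch below (illustrative only; the merged change keeps the plain fmt.Errorf, and the variable name is an assumption):

// ErrCountNotSupported is a shared sentinel error that callers could compare
// against to distinguish "not supported" from a real failure.
// (Assumes the standard "errors" package is imported.)
var ErrCountNotSupported = errors.New("count is not supported by this storage backend")

func (*etcdHelper) Count(pathPrefix string) (int64, error) {
	return 0, ErrCountNotSupported
}

// A caller could then skip the metric instead of logging a failure:
//   if err == ErrCountNotSupported { return }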

Contributor:

Alternative: convention that -1 means "not implemented"

Contributor Author:

-1 currently means "no data", as per @liggitt's request. Also, we're not writing C, to do such things ;)

@gmarek gmarek force-pushed the object-count branch 2 times, most recently from a489e9c to ae1fe00 Compare February 19, 2018 13:11
@gmarek (Contributor Author) commented Feb 19, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@gmarek (Contributor Author) commented Feb 20, 2018

@sttts @liggitt - please let me know what's necessary to get this PR in before the freeze. The kubemark-big test passed, so it's not impacting performance too much.

@wojtek-t (Member) left a comment:

/approve no-issue

objectCounts = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "etcd_object_counts",
Help: "Current number of stored objects split by kind.",
Member:

It's not "current". It's the number of objects at some point in time within the last minute (or whatever period the user sets). May be worth being explicit...

Contributor Author:

I will change it, but frankly no metric is ever "current" in the strict sense :)

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 21, 2018
@gmarek (Contributor Author) commented Feb 21, 2018

Ping @sttts @liggitt - I'd like to get this in before the freeze.

@liggitt (Member) commented Feb 21, 2018

would like an ack from @lavalamp or @deads2k on the Storage interface addition, since it could impact api-machinery storage plans (though it seems a pretty reasonable thing to support to me)

go wait.JitterUntil(func() {
count, err := e.Storage.Count(prefix)
if err != nil {
glog.V(10).Infof("Failed to update storage count metric: %v", err)
Contributor:

This log level would include client calls. You probably want 5, just before the client calls

Contributor Author:

Done.

@@ -155,6 +156,8 @@ func (s *EtcdOptions) AddFlags(fs *pflag.FlagSet) {
fs.DurationVar(&s.StorageConfig.CompactionInterval, "etcd-compaction-interval", s.StorageConfig.CompactionInterval,
"The interval of compaction requests. If 0, the compaction request from apiserver is disabled.")

fs.DurationVar(&s.StorageConfig.CountMetricPollPeriod, "etcd-count-metric-poll-period", time.Minute, ""+
Contributor:

Instead of time.Minute, using s.StorageConfig.CountMetricPollPeriod here allows overriding the default, like the other options above.

Contributor:

And you can still set a default in NewEtcdOptions above

Contributor Author:

It's not perfect, but done.
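
The suggested pattern, shown as a small self-contained sketch (the Options type and flag text below are illustrative stand-ins for the real EtcdOptions):

package main

import (
	"fmt"
	"time"

	"github.com/spf13/pflag"
)

// Options is a stand-in for EtcdOptions.
type Options struct {
	CountMetricPollPeriod time.Duration
}

// NewOptions sets the default once, mirroring a default set in NewEtcdOptions.
func NewOptions() *Options {
	return &Options{CountMetricPollPeriod: time.Minute}
}

func (o *Options) AddFlags(fs *pflag.FlagSet) {
	// Use the already-defaulted field as the flag default instead of a literal,
	// so a caller that constructs Options with a different value keeps its override.
	fs.DurationVar(&o.CountMetricPollPeriod, "etcd-count-metric-poll-period",
		o.CountMetricPollPeriod,
		"Frequency of polling etcd for number of resources per type. 0 disables the metric collection.")
}

func main() {
	o := NewOptions()
	fs := pflag.NewFlagSet("example", pflag.ExitOnError)
	o.AddFlags(fs)
	_ = fs.Parse([]string{"--etcd-count-metric-poll-period=30s"})
	fmt.Println(o.CountMetricPollPeriod) // 30s: the flag overrides the constructor default
}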

@deads2k (Contributor) commented Feb 21, 2018

A few comments. It looks good to me otherwise. The etcd2 thing is unfortunate, but I don't think it will be common, and the retry shouldn't hot-loop.

@@ -412,6 +412,15 @@ func (s *store) GetToList(ctx context.Context, key string, resourceVersion strin
return s.versioner.UpdateList(listObj, uint64(getResp.Header.Revision), "")
}

func (s *store) Count(key string) (int64, error) {
key = path.Join(s.pathPrefix, key)
getResp, err := s.client.KV.Get(context.Background(), key, clientv3.WithRange(clientv3.GetPrefixRangeEnd(key)), clientv3.WithCountOnly())
Contributor:

We should not issue a large range request to etcd. Just to count the keys, we should use the keys-only option with a limit, and maybe also pagination.

Member:

is WithCountOnly not more efficient than WithKeysOnly?

Contributor:

Oh, I did not see that count-only is already there. But still, if we know the count is going to be huge, we probably need pagination as well. etcd does not maintain an index for count(*), but scans everything internally.

Contributor:

Doing a micro-benchmark could be helpful. If counting 1M keys takes <1s, it might be OK.

Member:

If we had 1M keys, we would have other problems than this metric...

@xiang90 (Contributor), Feb 22, 2018:

Counting 1M keys is just to make the test result more reliable and safer; you can divide that by 10 to get 100k for an upper bound, etc. Periodically blocking etcd for more than 50ms is not good in general (for node kinds there can be 5k, for pods there can be 50k). I have not really looked into this PR, so these are just some random suggestions and thoughts.

@gmarek (Contributor Author), Feb 22, 2018:

@xiang90

I can run benchmarks after the freeze, as there's not enough time now. It shouldn't be an issue, as it's a one-word change to disable this functionality.

As for pagination, I'm not sure I follow. I believe that the etcd-side 'WithLimit' restricts the response size (i.e. the number of items returned), which shouldn't apply to "count only" requests, as they always return a single value. Or is 'countOnly' implemented on the client side, so that "limit" would effectively cap the returned value to, well, the limit?

Or maybe you meant doing some kind of manual pagination by dividing the range into multiple subranges, querying them separately and combining the results?
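
For completeness, the manual-pagination interpretation might look roughly like the sketch below; this is illustrative only and is not what the PR merged (the merged code issues a single CountOnly request):

package counter

import (
	"context"

	"github.com/coreos/etcd/clientv3"
)

// countPaginated walks the prefix range in keys-only pages and sums the page
// sizes, bounding how much etcd has to return per request. The function name
// and page-size handling are illustrative.
func countPaginated(ctx context.Context, kv clientv3.KV, prefix string, pageSize int64) (int64, error) {
	var total int64
	key := prefix
	end := clientv3.GetPrefixRangeEnd(prefix)
	for {
		resp, err := kv.Get(ctx, key,
			clientv3.WithRange(end),
			clientv3.WithKeysOnly(),
			clientv3.WithLimit(pageSize))
		if err != nil {
			return 0, err
		}
		total += int64(len(resp.Kvs))
		if !resp.More || len(resp.Kvs) == 0 {
			return total, nil
		}
		// Continue just past the last key returned in this page.
		key = string(resp.Kvs[len(resp.Kvs)-1].Key) + "\x00"
	}
}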

@gmarek gmarek force-pushed the object-count branch 2 times, most recently from ee2f49b to 161c025 Compare February 23, 2018 09:02
@gmarek (Contributor Author) commented Feb 23, 2018

@liggitt @deads2k - can we get this in and iterate on performance if needed afterwards?

@liggitt (Member) commented Feb 23, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 23, 2018
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gmarek, liggitt, wojtek-t


@k8s-github-robot:

Automatic merge from submit-queue (batch tested with PRs 57672, 60299, 59757, 60283, 60265). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit e3e954a into kubernetes:master Feb 24, 2018
@smarterclayton (Contributor) commented Feb 27, 2018

Note that this will not work correctly if we start separating resources into multiple keys, nor if there are other keys interspersed into a range that can't be accessed by name. I would have really preferred something that just remembered the last full list or checked the watch cache periodically.

@liggitt (Member) commented Feb 27, 2018

I would have really preferred something that just remembered the last full list or checked the watch cache periodically.

If you feel strongly we can revert

@wojtek-t (Member):

I would have really preferred something that just remembered the last full list or checked the watch cache periodically.

The solution with the watch cache was also my first thought, but it requires having the watch cache enabled. And we were already discussing in the past enabling it only for some resources (e.g. for events it's not even enabled now).
The problem with remembering the last list result is that full lists may not happen frequently (example: secrets - I think nothing is really listing all secrets).

That said, your argument is also a very good one. Especially since I may try to attack the problem of "making heartbeats cheaper (on the etcd side)" by separating the "heartbeat" part of the object into a separate thing in etcd (leaving the API as is).

I don't have any better suggestion now, though...
