Add a metric exposing number of objects per type #59757
Conversation
@@ -284,6 +289,20 @@ func (s *DefaultStorageFactory) NewConfig(groupResource schema.GroupResource) (*
	}
	glog.V(3).Infof("storing %v in %v, reading as %v from %#v", groupResource, codecConfig.StorageVersion, codecConfig.MemoryVersion, codecConfig.Config)

	if storageConfig.Type == storagebackend.StorageTypeETCD3 {
really not a fan of leaking the storage implementation here
I can just always fill those details here. It's generic enough that it should be possible to implement in an arbitrary backend.
		storageConfig.ResourcesToMonitor = make(map[string]string)
	}
	resourcePrefix := s.ResourcePrefix(groupResource)
	path := []string{"/registry"}
this sets up a duplicate prefix construction path that doesn't honor the configured etcd prefix (--etcd-prefix), and assumes nesting under the API group name, which isn't a valid assumption
Prefix is easily fixable. I'm not sure about the API group name, though: why isn't it a valid assumption (except for the core/v1 ones)?
none of the kube-apiserver types place themselves under group subpaths. see test/integration/etcd/etcd_storage_path_test.go for all the various non-standard key paths used.
I don't think we should set up another place like this where we try to pull together all the etcd prefix + resource prefix config values... if we want to run metrics on storage, why not do it inside generic store and storage, where this information already lives?
I was looking into that yesterday, but got pulled into different things before I was able to get to the bottom of this. Short answer is that I have no idea how the storage code is structured.
The way I understand it, Storage (the layer that talks to etcd and the place where the monitoring needs to live) is created before the generic.Store(s) and gets injected into Stores by something (the Decorator?).
So I can either modify the injection logic to also pass some data down to Storage (which has an unpleasant smell to it), or add a simple "Count" function to Storage. I'd rather do the latter, but I don't really want to make it a full-blown API-like request.
The issue with the latter is that I have to either provide EXTREMELY inefficient implementations for etcdv2 and Cacher, or just always return -1/unimplemented error. Neither is particularly nice...
or add a simple "Count" function to Storage. I'd rather do the latter, but I don't really want to make it full blown API-like request
That is the most accurate representation of what this is attempting to do. If we want to pull info out of etcd in a new way, the Storage interface is the correct place to represent that.
I have to either provide EXTREMELY inefficient implementations for etcdv2 and Cacher
Etcd2 returning an unimplemented error doesn't particularly bother me. Cacher could delegate to the underlying store (as it does for most things), or add a KeyCount() function to the cache.Store interface.
This is what I'll try to send out today, assuming I'll stop getting interrupted...
@@ -155,6 +156,8 @@ func (s *EtcdOptions) AddFlags(fs *pflag.FlagSet) {
	fs.DurationVar(&s.StorageConfig.CompactionInterval, "etcd-compaction-interval", s.StorageConfig.CompactionInterval,
		"The interval of compaction requests. If 0, the compaction request from apiserver is disabled.")

	fs.DurationVar(&s.StorageConfig.CountMetricPollPeriod, "etcd-count-metric-poll-period", 5*time.Second, ""+
I don't know how expensive a call this is, but counting every resource type every 5 seconds is more frequent than I expected. Who would be a good resource to ask about this?
I can put anything you prefer here. As for whom to ask, I guess someone from the etcd team - maybe @xiang90 still works on that?
I increased it to a one-minute interval. It should be good enough, but we'll see how the scale tests behave. @shyamjvs @kubernetes/sig-scalability-misc
	objectCounts = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "apiserver_object_counts",
			Help: "Current number of objects stored. Counts only types that have watch-cache enabled.",
I don't think the comment is accurate... looks like it would work without the watch cache, right?
Right.
@@ -1365,9 +1367,26 @@ func (e *Store) CompleteWithOptions(options *generic.StoreOptions) error {
	)
}

	if opts.CountMetricPollPeriod > 0 {
		e.startObservingCount(e.DefaultQualifiedResource.String(), opts.StorageConfig.Prefix+prefix, opts.CountMetricPollPeriod)
don't run forever, make sure we have a stop channel and non-nil Destroy() func that will stop this
for example:

stopMetricsCh := make(chan struct{})
destroy := e.DestroyFunc
e.DestroyFunc = func() {
	close(stopMetricsCh)
	if destroy != nil {
		destroy()
	}
}
e.startObservingCount(..., stopMetricsCh)
don't join opts.StorageConfig.Prefix+prefix here... the storage is responsible for dealing with prepending opts.StorageConfig.Prefix.
Also, don't assume prefix is the key... call e.KeyRootFunc() to get the key.
I started returning a destroy function from startObservingCount, which is functionally equivalent.
	return nil
}

func (e *Store) startObservingCount(resourceName, pathPrefix string, period time.Duration) {
	glog.V(2).Infof("Monitoring %v bound to %v", resourceName, pathPrefix)
	go func() {
simplify to go wait.Forever(...)?
Done.
@@ -85,6 +85,14 @@ type objState struct {
	stale bool
}

func (s *store) Count(pathPrefix string) (int64, error) {
I'd expect the key parameter to match that passed to GetToList(), etc, and get joined with the pathPrefix already held by store
func (s *store) Count(key string) (int64, error) {
key = path.Join(s.pathPrefix, key)
getResp, err := s.client.KV.Get(context.Background(), key, ...
...
}
Done.
cc @kubernetes/sig-api-machinery-pr-reviews
/test pull-kubernetes-kubemark-e2e-gce-big
(force-pushed from 4eb8d26 to f014072)
/test pull-kubernetes-kubemark-e2e-gce-big
	return nil
}

// startObservingCount starts monitoring given prefix and periodically updating metrics. It return a function to stop collection.
func (e *Store) startObservingCount(period time.Duration) func() {
What happens for co-habitation? Will we get two monitors?
I don't know the storage code well enough to answer that. It depends on how co-habitation is implemented, and whether two Stores for co-habitating resources share the same Storage object or control will enter the if Storage == nil branch twice.
Having two monitors is not ideal, but it's also not a disaster IMO.
Agree. We can live with that.
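If the duplicate monitors for co-habitating resources ever did become a problem, a small registry could deduplicate them by storage prefix. This is a hypothetical sketch of one way to do it, not something the PR adds:

```go
package main

import (
	"fmt"
	"sync"
)

// monitorRegistry ensures at most one count monitor per storage prefix,
// even if two co-habitating Stores try to register the same one.
type monitorRegistry struct {
	mu      sync.Mutex
	started map[string]bool
}

// startOnce runs start only the first time a prefix is seen; it reports
// whether a new monitor was actually started.
func (r *monitorRegistry) startOnce(prefix string, start func()) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.started[prefix] {
		return false // already monitored; skip the duplicate
	}
	r.started[prefix] = true
	start()
	return true
}

func main() {
	reg := &monitorRegistry{started: map[string]bool{}}
	count := 0
	reg.startOnce("/registry/deployments", func() { count++ })
	reg.startOnce("/registry/deployments", func() { count++ }) // duplicate ignored
	fmt.Println(count) // 1
}
```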
	return nil
}

// startObservingCount starts monitoring given prefix and periodically updating metrics. It return a function to stop collection.
nit: returns.
Done.
@@ -586,6 +586,10 @@ func (h *etcdHelper) GuaranteedUpdate(
	}
}

func (*etcdHelper) Count(pathPerfix string) (int64, error) {
	return 0, fmt.Errorf("Count is unimplemented for etcd2!")
is this an error? If we want to distinguish between an error and "not implemented", we should have at least a special pre-defined error type.
Agreed, but I guess we'd need to add it to apimachinery/pkg/api/errors to be consistently used. I can add it in a separate PR, as I expect a lot of discussion around that.
Alternative: convention that -1 means "not implemented"
-1 currently means "no data", per @liggitt's request. Also, we're not writing in C to do such things ;)
(force-pushed from a489e9c to ae1fe00)
/test pull-kubernetes-kubemark-e2e-gce-big
/approve no-issue
	objectCounts = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "etcd_object_counts",
			Help: "Current number of stored objects split by kind.",
It's not "Current". It's the number of objects from some point within the last minute (or whatever period the user configures). May be worth being explicit...
I will change it, but frankly no metric is ever "current" in the strict sense :)
	go wait.JitterUntil(func() {
		count, err := e.Storage.Count(prefix)
		if err != nil {
			glog.V(10).Infof("Failed to update storage count metric: %v", err)
This log level would include client calls. You probably want 5, just before the client calls
Done.
@@ -155,6 +156,8 @@ func (s *EtcdOptions) AddFlags(fs *pflag.FlagSet) {
	fs.DurationVar(&s.StorageConfig.CompactionInterval, "etcd-compaction-interval", s.StorageConfig.CompactionInterval,
		"The interval of compaction requests. If 0, the compaction request from apiserver is disabled.")

	fs.DurationVar(&s.StorageConfig.CountMetricPollPeriod, "etcd-count-metric-poll-period", time.Minute, ""+
instead of time.Minute, using s.StorageConfig.CountMetricPollPeriod allows overriding defaults like the other options above
And you can still set a default in NewEtcdOptions above.
It's not perfect, but done.
A few comments. It looks good to me otherwise. The etcd2 thing is unfortunate, but I don't think it will be common, and the retry shouldn't hot-loop.
@@ -412,6 +412,15 @@ func (s *store) GetToList(ctx context.Context, key string, resourceVersion strin
	return s.versioner.UpdateList(listObj, uint64(getResp.Header.Revision), "")
}

func (s *store) Count(key string) (int64, error) {
	key = path.Join(s.pathPrefix, key)
	getResp, err := s.client.KV.Get(context.Background(), key, clientv3.WithRange(clientv3.GetPrefixRangeEnd(key)), clientv3.WithCountOnly())
we should not issue a large range request to etcd. Just to count the keys, we should use the keys-only option with a limit, and maybe also pagination.
is WithCountOnly not more efficient than WithKeysOnly?
oh, I did not see that count-only is already there. But still, if we know the count is going to be huge, we probably need pagination as well. etcd does not maintain an index for count(*), but scans everything internally.
doing a micro-benchmark could be helpful. If counting 1M keys takes <1s, it might be ok.
If we had 1M keys, we would have other problems than this metric...
counting 1M keys is just to make the test result more reliable and safer; you can divide that by 10 to get 100k for an upper bound, etc. Periodically blocking etcd for more than 50ms is not good in general (for node kinds there can be 5k, for pods there can be 50k). I have not really looked into this PR, so these are just some random suggestions and thoughts.
I can run benchmarks after the freeze, as there's not enough time now. It shouldn't be an issue, as it's a one-word change to disable this functionality.
As for pagination, I'm not sure I follow. I believe that the etcd-side WithLimit restricts the response size (i.e. the number of items returned), which shouldn't apply to count-only requests, as they always return a single value. Or is countOnly implemented on the client side, so the limit would effectively cap the returned value to, well, the limit?
Or maybe you meant doing some kind of manual pagination by dividing the range into multiple subranges, querying them separately and combining the results?
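For context on the WithRange(clientv3.GetPrefixRangeEnd(key)) call in the diff: a count-only range over [key, rangeEnd) covers exactly the keys sharing that prefix. A simplified version of how the range end is computed (the real clientv3 helper also handles the empty key, and this sketch mirrors its skipping of trailing 0xff bytes):

```go
package main

import "fmt"

// prefixRangeEnd returns the smallest key greater than every key that has
// the given prefix, by incrementing the last incrementable byte.
// Simplified stand-in for clientv3.GetPrefixRangeEnd.
func prefixRangeEnd(prefix string) string {
	end := []byte(prefix)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] < 0xff {
			end[i]++
			return string(end[:i+1])
		}
	}
	// all bytes are 0xff: range extends to the end of the keyspace
	// (clientv3 represents this with "\x00")
	return "\x00"
}

func main() {
	// '/' (0x2f) increments to '0' (0x30)
	fmt.Println(prefixRangeEnd("/registry/pods/")) // /registry/pods0
}
```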
(force-pushed from ee2f49b to 161c025)
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: gmarek, liggitt, wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Automatic merge from submit-queue (batch tested with PRs 57672, 60299, 59757, 60283, 60265). If you want to cherry-pick this change to another branch, please follow the instructions here.
Note that this will not work correctly if we start separating resources into multiple keys, nor if there are other keys interspersed into a range that can't be accessed by name. I would have really preferred something that just remembered the last full list or checked the watch cache periodically.
If you feel strongly, we can revert.
The solution with the watch cache was also my first thought, but it requires having the watch cache enabled. And we were already discussing in the past enabling it only for some resources (e.g. for events it's not even enabled now). That said, your argument is also a very good one, especially since I may try to attack the problem of "making heartbeats cheaper (on the etcd side)" by separating the "heartbeat" part of the object into a separate thing in etcd (leaving the API as is). I don't have any better suggestion now though...
Fix #51953
Adds a goroutine that periodically checks the count of objects in etcd and publishes a metric with this data.