-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add call to estimate number of samples in a chunk to the API #2168
Conversation
@beorn7 mind taking a look at this please? |
Sorry, I thought this was meant for @juliusv . It's on my review queue now. |
Thanks! |
return 1 | ||
} | ||
|
||
return float64(int(offset)-varbitFirstValueDeltaOffset) / float64(varbitWorstCaseBitsPerSample[c.valueEncoding()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That more a lower limit of chunks than an estimate. I expect this to be wildly off most of the time as the timestamps compress most reliably, and they are always counted with 27 bits here.
I'm wondering about the "must run in constant time" thing. A chunk being written to could just keep counting the number of samples. That would be no noticeable overhead compared to the usual encoding efforts. A chunk that has just been loaded needed to iterate through all samples on the first call of Would that fulfill the requirements of your use case? I don't think the Len reported by varbit chunks is very useful with the current implementation, see comment there. |
Problem would be all the still open chunks coming from loading the checkpoint. They would all need to iterate through to find out about their length. |
It would. We could lazily initialise the length field to get round the For cortex (nee Frankenstein) we monitor samples/chunk as it's the key
|
It's annoying that we're having to think about code interactions that will not happen in practice, but should still be taken care of to keep the code "correct": Prometheus itself is not using What are other options?
|
In general, I like the feature (as long as it doesn't inflict any noticeable cost during normal operation). I was playing with some analysis options in If Cortex really doesn't use varbit chunks, I'd tend towards implementing |
Well, thats about to change :-) A motivation for this PR is doing tuning on cortex to manage cost, and we're hoping switching to varbit will help in this regard. So its kinda related. The constant runtime thing is actually not important - we only sample this value when flushing chunks. So perhaps lets drop that? |
Yeah, let's have a naive implementation for now. If it doesn't serve your purpose, we can reconsider. |
@beorn7 simplest thing, with a test. Let me know what you think. |
for _, c := range chunks { | ||
for i := 0; i <= 10; i++ { | ||
if c.Len() != i { | ||
t.Errorf("empty chunk type %s should have %d samples, had %d", c.Encoding(), i, c.Len()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty? I guess that words needs to go.
t.Errorf("empty chunk type %s should have %d samples, had %d", c.Encoding(), i, c.Len()) | ||
} | ||
|
||
cs, _ := c.Add(model.SamplePair{model.Time(i), model.SampleValue(i)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity in case it ever happens, I'd error out if an err!=nil is returned, and likewise if more than one chunk is returned.
@@ -276,6 +276,7 @@ type Chunk interface { | |||
UnmarshalFromBuf([]byte) error | |||
Encoding() Encoding | |||
Utilization() float64 | |||
Len() int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it deserves documentation that it returns the number of chunks (and not the length in byte or something), and that the implementation might be expensize.
(And while you are on it, the "add" above in the doc comment needs to be capitalized.)
@@ -296,7 +296,7 @@ func (c deltaEncodedChunk) sampleSize() int { | |||
return int(c.timeBytes() + c.valueBytes()) | |||
} | |||
|
|||
func (c deltaEncodedChunk) len() int { | |||
func (c deltaEncodedChunk) Len() int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc comment that it implements chunk and that it is constant runtime.
@@ -336,7 +336,7 @@ func (c doubleDeltaEncodedChunk) sampleSize() int { | |||
return int(c.timeBytes() + c.valueBytes()) | |||
} | |||
|
|||
func (c doubleDeltaEncodedChunk) len() int { | |||
func (c doubleDeltaEncodedChunk) Len() int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc comment that it implements chunk and that it is constant runtime.
@@ -328,6 +328,15 @@ func (c varbitChunk) Utilization() float64 { | |||
return math.Min(float64(c.nextSampleOffset()/8+15)/float64(cap(c)), 1) | |||
} | |||
|
|||
// Len implements chunk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps note that this has O(n) runtime.
// Len implements chunk. | ||
func (c varbitChunk) Len() int { | ||
it := c.NewIterator() | ||
var i = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually write this as
i := 0
Looks good. Just nits left. |
Done & squashed. Thanks for feedback @beorn7! |
Alright alright, then we do it this way! |
Want this for metrics in Cortex (nee Frankenstein)
For (double) delta this is exact, but for varbit its an under-estimate IIUC.