
Add call to estimate number of samples in a chunk to the API #2168

Merged: 1 commit, Nov 18, 2016

Conversation

@tomwilkie (Member) commented Nov 6, 2016

Want this for metrics in Cortex (née Frankenstein).

For (double) delta this is exact, but for varbit it's an underestimate, IIUC.

@tomwilkie (Member, Author)

@beorn7 mind taking a look at this please?

@beorn7 (Member) commented Nov 16, 2016

Sorry, I thought this was meant for @juliusv . It's on my review queue now.

@tomwilkie (Member, Author)

Thanks!

return 1
}

return float64(int(offset)-varbitFirstValueDeltaOffset) / float64(varbitWorstCaseBitsPerSample[c.valueEncoding()])
Review comment (Member):

That's more a lower limit than an estimate. I expect this to be wildly off most of the time, as the timestamps compress most reliably, yet they are always counted with 27 bits here.

@beorn7 (Member) commented Nov 16, 2016

I'm wondering about the "must run in constant time" thing.

A chunk being written to could just keep counting the number of samples. That would be no noticeable overhead compared to the usual encoding efforts. A chunk that has just been loaded would need to iterate through all samples on the first call of Len, but could then remember and update the number for as long as it stays loaded.

Would that fulfill the requirements of your use case?

I don't think the Len reported by varbit chunks is very useful with the current implementation, see comment there.

@beorn7 (Member) commented Nov 16, 2016

Problem would be all the still open chunks coming from loading the checkpoint. They would all need to iterate through to find out about their length.

@tomwilkie (Member, Author)

It would. We could lazily initialise the length field to get around the problem? WDYT?

For Cortex (née Frankenstein) we monitor samples/chunk, as it's the key metric in optimizing and managing costs. So the estimate as-is would not be good enough, no.

@juliusv (Member) commented Nov 17, 2016

It's annoying that we're having to think about code interactions that will not happen in practice, but should still be taken care of to keep the code "correct": Prometheus itself is not using Len() (thus doesn't care about those methods reporting correct values in any case). Cortex on the other hand is not loading existing chunks from disk (thus wouldn't care about repopulating chunk lengths when loading from disk). Further, Cortex is not even using varbit chunks... so now we're going to be adding significant code that neither system will use.

What are other options?

  • panic("not implemented") in the varbit implementation?
  • track the chunk lengths ourselves somehow in Cortex?

@beorn7 (Member) commented Nov 17, 2016

In general, I like the feature (as long as it doesn't inflict any noticeable cost during normal operation). I was playing with some analysis options in storagetool to find out more about a cold storage…
But such a feature wouldn't need constant runtime, so it could just iterate through a chunk and count.

If Cortex really doesn't use varbit chunks, I'd tend towards implementing Len without guaranteed constant runtime, and then document the runtime for each implementation. (I'd read a panic("not implemented") as a TODO. Better have a working but slow implementation.)

@tomwilkie (Member, Author)

Cortex is not even using varbit chunks...

Well, that's about to change :-) A motivation for this PR is doing tuning on Cortex to manage cost, and we're hoping switching to varbit will help in this regard. So it's kinda related.

The constant-runtime thing is actually not important; we only sample this value when flushing chunks. So perhaps let's drop that?

@beorn7 (Member) commented Nov 17, 2016

Yeah, let's have a naive implementation for now. If it doesn't serve your purpose, we can reconsider.

@tomwilkie (Member, Author)

@beorn7 simplest thing, with a test. Let me know what you think.

for _, c := range chunks {
	for i := 0; i <= 10; i++ {
		if c.Len() != i {
			t.Errorf("empty chunk type %s should have %d samples, had %d", c.Encoding(), i, c.Len())
Review comment (Member):

Empty? I guess that word needs to go.

t.Errorf("empty chunk type %s should have %d samples, had %d", c.Encoding(), i, c.Len())
}

cs, _ := c.Add(model.SamplePair{model.Time(i), model.SampleValue(i)})
Review comment (Member):

For clarity, in case it ever happens, I'd error out if err != nil is returned, and likewise if more than one chunk is returned.

@@ -276,6 +276,7 @@ type Chunk interface {
UnmarshalFromBuf([]byte) error
Encoding() Encoding
Utilization() float64
Len() int
Review comment (Member):

I guess it deserves documentation that it returns the number of samples (and not the length in bytes or something), and that the implementation might be expensive.

(And while you are at it, the "add" above in the doc comment needs to be capitalized.)

@@ -296,7 +296,7 @@ func (c deltaEncodedChunk) sampleSize() int {
return int(c.timeBytes() + c.valueBytes())
}

func (c deltaEncodedChunk) len() int {
func (c deltaEncodedChunk) Len() int {
Review comment (Member):

Add a doc comment noting that it implements chunk and that it has constant runtime.

@@ -336,7 +336,7 @@ func (c doubleDeltaEncodedChunk) sampleSize() int {
return int(c.timeBytes() + c.valueBytes())
}

func (c doubleDeltaEncodedChunk) len() int {
func (c doubleDeltaEncodedChunk) Len() int {
Review comment (Member):

Add a doc comment noting that it implements chunk and that it has constant runtime.

@@ -328,6 +328,15 @@ func (c varbitChunk) Utilization() float64 {
return math.Min(float64(c.nextSampleOffset()/8+15)/float64(cap(c)), 1)
}

// Len implements chunk.
Review comment (Member):

Perhaps note that this has O(n) runtime.

// Len implements chunk.
func (c varbitChunk) Len() int {
it := c.NewIterator()
var i = 0
Review comment (Member):

We usually write this as
i := 0

@beorn7 (Member) commented Nov 17, 2016

Looks good. Just nits left.

@tomwilkie (Member, Author)

Done & squashed. Thanks for feedback @beorn7!

@juliusv (Member) commented Nov 18, 2016

Alright alright, then we do it this way!

@juliusv juliusv merged commit 127332c into prometheus:master Nov 18, 2016
@juliusv juliusv deleted the chunk-len branch November 18, 2016 07:13