-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement histogram statistics decoder #14097
Conversation
8ad58a7
to
4a461a5
Compare
This commit is a POC of how we can speed up histogram_count and histogram_sum functions on native histograms. The idea is to have separate decoders which can be used by the engine to only read count/sum values from histogram objects. This should help with reducing allocations when decoding histograms, as well as with speeding up aggregations like sum since they will be done on floats and not on histogram objects. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
4a461a5
to
d8061d5
Compare
I've been tinkering with this approach a bit and I wonder if it could interfere with various mixed types checks, as well as with histogram counter resets. I will investigate another solution which returns a partially filled histogram object so that we can still go through the same |
8df13fa
to
0a72cc5
Compare
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
0a72cc5
to
e6f9cd1
Compare
Looking at profiles, I see |
Generally, this looks very nice. However, there is a nasty edge case: In rare cases, a counter reset could not be seen in the count but only in individual buckets. We do detect those cases on ingestion and set the counter reset hint accordingly, so generally, this should make it through the special decoder here via the counter reset hint. However (edge case of an edge case of an edge case…), the following scenario could happen:
Maybe we can solve that cleanly by doing the counter reset detection directly in the iterator (and set the counter reset accordingly) in cases where the "unknown" hint is coming from TSDB. Or we can just wing it and claim that this case is so outrageously unlikely that we don't need to handle it. |
tsdb/chunkenc/histogram.go
Outdated
ZeroCount: f.hReader.ZeroCount, | ||
Sum: f.hReader.Sum, | ||
ZeroThreshold: f.hReader.ZeroThreshold, | ||
Schema: f.hReader.Schema, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the schema is only for buckets, I think we can set this to any number without doing harm. If we set it consistently to the same number (e.g. 0), we avoid the CopyToSchema step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed both the zero threshold and schema fields.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Let me take a look at how we can do the detection transparently in an efficient manner. In the worst case we will just store the last decoded histogram as another field and use it for detection when the hint is unknown. |
We currently don't have a function to detect resets for integer histograms. Since PromQL always works with float histograms, I might need to add the same functionality just operating on integer buckets and spans. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comment nits.
promql/engine.go
Outdated
// detectHistogramStatsDecoding modifies the expression by setting the | ||
// SkipHistogramBuckets field in those vector selectors for which it is safe to | ||
// return only histogram statistics (sum and count), excluding histogram spans | ||
// and buckets. The function can be treated as an optimization and does is not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"does" appears spurious. Maybe "thus"? Or just delete it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this was spurious.
promql/parser/ast.go
Outdated
LabelMatchers []*labels.Matcher | ||
Offset time.Duration | ||
Timestamp *int64 | ||
SkipHistogramBuckets bool // Set when do not need to decode native histogram buckets for evaluating a query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a bit easier to read:
SkipHistogramBuckets bool // Set when do not need to decode native histogram buckets for evaluating a query. | |
SkipHistogramBuckets bool // Set when decoding native histogram buckets isn't needed for query evaluation. |
Any ideas so far? It would actually be quite cool if we could detect counter resets in the iterator already. Then we could remove it entirely in the PromQL layer (for native histograms, at least). If we can somehow safely assess if a chunk got deleted before the current chunk, we could avoid the manual counter reset detection even more. A heuristic would be if there is a gap larger than the typical scrape interval, but that's not strictly correct. I have a vague memory that we never delete chunks from the middle of a block, so that might narrow down the problem. Just brainstorming here… I'm almost ready to believe that the above mentioned edge case of an edge case of an edge case is rare enough that we can ignore it (but document it, just in case it becomes relevant after all). |
Another thought that crossed my mind: What about overlapping blocks and (soon) out-of-order handling? |
I've been busy moving apartments this week and did not have time to work in this further. I'll have my time back next week and will continue looking into the reset. My opinion is that we should handle the reset in the iterator by keeping track of the last decoded histogram. This way the code can also remain future proof in case we introduce more edge cases down the line.
I remember looking into this code as well, and we seem to set |
Good luck with the move. Looking forward to further explorations into this issue. |
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@beorn7 I've addressed all comments and we now detect the reset in the iterator itself. I also moved iterator to the promql package since this is the only place where it is used. It also relies on public methods so we don't need access to encoding internals from the |
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Thank you very much. This is now very close to the top of my review queue, but I won't get to it this week. Sorry for that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very neat, thank you very much. I believe (and hope) this works as we expect. :o)
One additional idea for testing: It should be quite easy to write tests within the PromqL testing framework (in https://github.com/prometheus/prometheus/blob/main/promql/promqltest/testdata/native_histograms.test ) that use PromQL functions that use the new stats iterator under the hood. We could even construct counter resets there that are only detectable in buckets, not in the count (but I think it will be too hard to provoke a new chunk being cut at the same time so I don't think we'll exercise the new counter reset code path, because the testing framework probably ingests normally and will store the counter reset in the chunk header already).
promql/histogram_stats_iterator.go
Outdated
type histogramStatsIterator struct { | ||
chunkenc.Iterator | ||
|
||
hReader *histogram.Histogram |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Reader" somehow feels wrong as a name (in both hReader
and fhReader
). Maybe currentH
and currentFH
? Or just use h
and fh
(not saying anything specific will avoid assumptions…)?
How does the benchmark look with the added storing of the last seen histogram? |
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Thanks for the review, I addressed all comments. Benchmarks still look good on the latest commit:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much, let's ship it!
I hope it's OK if I squash the commits into one… |
This PR speeds up
histogram_count
andhistogram_sum
functions on native histograms. The idea is to use a separate decoder which can be used by the engine to skip histogram buckets when decoding histogram objects. This should help with reducing allocations and make the sum aggregation faster it will only be done on primitive values excluding buckets and spans.Current benchmark results