Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement histogram statistics decoder #14097

Merged
merged 8 commits into from
Jun 6, 2024

Conversation

fpetkovski
Copy link
Contributor

@fpetkovski fpetkovski commented May 14, 2024

This PR speeds up histogram_count and histogram_sum functions on native histograms. The idea is to use a separate decoder which can be used by the engine to skip histogram buckets when decoding histogram objects. This should help with reducing allocations and make the sum aggregation faster it will only be done on primitive values excluding buckets and spans.

Current benchmark results

prometheus $ benchstat main.out new.out                                               
name                                                          old time/op    new time/op    delta
NativeHistograms/sum-11                                          404ms ± 6%     391ms ± 1%     ~     (p=0.222 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            500ms ± 8%     492ms ± 8%     ~     (p=0.548 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             803ms ± 1%     815ms ± 3%     ~     (p=0.095 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     477ms ± 4%     318ms ± 5%  -33.29%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      818ms ± 4%     582ms ± 1%  -28.77%  (p=0.008 n=5+5)

name                                                          old alloc/op   new alloc/op   delta
NativeHistograms/sum-11                                          413MB ± 0%     414MB ± 0%     ~     (p=0.841 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            213MB ± 0%     213MB ± 0%     ~     (p=0.095 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             314MB ± 0%     314MB ± 0%     ~     (p=0.841 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     213MB ± 0%     143MB ± 0%  -32.94%  (p=0.016 n=5+4)
NativeHistograms/histogram_count_with_long_rate_interval-11      321MB ± 2%     202MB ± 6%  -37.15%  (p=0.008 n=5+5)

name                                                          old allocs/op  new allocs/op  delta
NativeHistograms/sum-11                                          5.16M ± 0%     5.16M ± 0%     ~     (p=1.000 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            4.61M ± 0%     4.61M ± 0%     ~     (p=0.381 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             6.12M ± 0%     6.12M ± 0%     ~     (p=1.000 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     4.61M ± 0%     1.66M ± 0%  -63.99%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      6.14M ± 0%     2.01M ± 3%  -67.36%  (p=0.008 n=5+5)

@fpetkovski fpetkovski force-pushed the histogram-stats-decoder branch from 8ad58a7 to 4a461a5 Compare May 14, 2024 07:25
This commit is a POC of how we can speed up histogram_count and histogram_sum
functions on native histograms. The idea is to have separate decoders which can be
used by the engine to only read count/sum values from histogram objects. This should help
with reducing allocations when decoding histograms, as well as with speeding up aggregations
like sum since they will be done on floats and not on histogram objects.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@fpetkovski fpetkovski force-pushed the histogram-stats-decoder branch from 4a461a5 to d8061d5 Compare May 14, 2024 07:25
@fpetkovski
Copy link
Contributor Author

I've been tinkering with this approach a bit and I wonder if it could interfere with various mixed types checks, as well as with histogram counter resets. I will investigate another solution which returns a partially filled histogram object so that we can still go through the same histogramRate path.

@fpetkovski fpetkovski marked this pull request as draft May 15, 2024 05:38
@fpetkovski fpetkovski force-pushed the histogram-stats-decoder branch 3 times, most recently from 8df13fa to 0a72cc5 Compare May 15, 2024 07:43
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@fpetkovski fpetkovski force-pushed the histogram-stats-decoder branch from 0a72cc5 to e6f9cd1 Compare May 15, 2024 07:44
@fpetkovski
Copy link
Contributor Author

Looking at profiles, I see CopyToSchema taking a lot of time. Do we need to decode/fill the schema field if we're only operating on Count/Sum values @krajorama ?
image

@fpetkovski fpetkovski marked this pull request as ready for review May 15, 2024 07:58
@fpetkovski fpetkovski requested a review from roidelapluie as a code owner May 15, 2024 07:58
@beorn7 beorn7 requested review from beorn7 and removed request for aknuds1, roidelapluie and jesusvazquez May 16, 2024 13:26
@beorn7
Copy link
Member

beorn7 commented May 16, 2024

Generally, this looks very nice.

However, there is a nasty edge case: In rare cases, a counter reset could not be seen in the count but only in individual buckets. We do detect those cases on ingestion and set the counter reset hint accordingly, so generally, this should make it through the special decoder here via the counter reset hint. However (edge case of an edge case of an edge case…), the following scenario could happen:

  • A chunk gets cut just because the previous chunk was full (and not because of a counter reset). In that case, we mark the counter reset hint as "unknown" because we cannot know if the previous chunk gets deleted. The idea is that we do manual counter reset detection in that case, just to play safe.
  • If the previous chunk gets really deleted, and if there is really a counter reset between the pre-previous chunk and the current chunk, and if that counter reset isn't detectable via the count alone, but only via (some of) the buckets, then this PR creates a problem (a different result than the one we get with the full decoding).

Maybe we can solve that cleanly by doing the counter reset detection directly in the iterator (and set the counter reset accordingly) in cases where the "unknown" hint is coming from TSDB.

Or we can just wing it and claim that this case is so outrageously unlikely that we don't need to handle it.

ZeroCount: f.hReader.ZeroCount,
Sum: f.hReader.Sum,
ZeroThreshold: f.hReader.ZeroThreshold,
Schema: f.hReader.Schema,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the schema is only for buckets, I think we can set this to any number without doing harm. If we set it consistently to the same number (e.g. 0), we avoid the CopyToSchema step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed both the zero threshold and schema fields.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@fpetkovski
Copy link
Contributor Author

Let me take a look at how we can do the detection transparently in an efficient manner. In the worst case we will just store the last decoded histogram as another field and use it for detection when the hint is unknown.

@fpetkovski
Copy link
Contributor Author

We currently don't have a function to detect resets for integer histograms. Since PromQL always works with float histograms, I might need to add the same functionality just operating on integer buckets and spans.

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comment nits.

promql/engine.go Outdated
// detectHistogramStatsDecoding modifies the expression by setting the
// SkipHistogramBuckets field in those vector selectors for which it is safe to
// return only histogram statistics (sum and count), excluding histogram spans
// and buckets. The function can be treated as an optimization and does is not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"does" appears spurious. Maybe "thus"? Or just delete it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this was spurious.

LabelMatchers []*labels.Matcher
Offset time.Duration
Timestamp *int64
SkipHistogramBuckets bool // Set when do not need to decode native histogram buckets for evaluating a query.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a bit easier to read:

Suggested change
SkipHistogramBuckets bool // Set when do not need to decode native histogram buckets for evaluating a query.
SkipHistogramBuckets bool // Set when decoding native histogram buckets isn't needed for query evaluation.

@beorn7
Copy link
Member

beorn7 commented May 22, 2024

Let me take a look at how we can do the detection transparently in an efficient manner. In the worst case we will just store the last decoded histogram as another field and use it for detection when the hint is unknown.

Any ideas so far?

It would actually be quite cool if we could detect counter resets in the iterator already. Then we could remove it entirely in the PromQL layer (for native histograms, at least).

If we can somehow safely assess if a chunk got deleted before the current chunk, we could avoid the manual counter reset detection even more. A heuristic would be if there is a gap larger than the typical scrape interval, but that's not strictly correct. I have a vague memory that we never delete chunks from the middle of a block, so that might narrow down the problem.

Just brainstorming here…

I'm almost ready to believe that the above mentioned edge case of an edge case of an edge case is rare enough that we can ignore it (but document it, just in case it becomes relevant after all).

@beorn7
Copy link
Member

beorn7 commented May 22, 2024

Another thought that crossed my mind: What about overlapping blocks and (soon) out-of-order handling?
This might create a lot more cases where we set the counter reset hint to "unknown". Maybe we should simply skip the optimization in those cases. (Again, I have to apologize that I just type out thoughts crossing my mind, based on vague memories from the past, without having done a proper research into the matter…)

@fpetkovski
Copy link
Contributor Author

fpetkovski commented May 22, 2024

I've been busy moving apartments this week and did not have time to work in this further. I'll have my time back next week and will continue looking into the reset.

My opinion is that we should handle the reset in the iterator by keeping track of the last decoded histogram. This way the code can also remain future proof in case we introduce more edge cases down the line.

What about overlapping blocks and (soon) out-of-order handling?

I remember looking into this code as well, and we seem to set unknown in the chain iterator whenever we change chunk. So I think the logic would break there unless we do explicit detection.

@beorn7
Copy link
Member

beorn7 commented May 23, 2024

Good luck with the move. Looking forward to further explorations into this issue.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@fpetkovski
Copy link
Contributor Author

@beorn7 I've addressed all comments and we now detect the reset in the iterator itself. I also moved iterator to the promql package since this is the only place where it is used. It also relies on public methods so we don't need access to encoding internals from the chunkenc package.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@beorn7
Copy link
Member

beorn7 commented May 30, 2024

Thank you very much. This is now very close to the top of my review queue, but I won't get to it this week. Sorry for that.

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very neat, thank you very much. I believe (and hope) this works as we expect. :o)

One additional idea for testing: It should be quite easy to write tests within the PromqL testing framework (in https://github.com/prometheus/prometheus/blob/main/promql/promqltest/testdata/native_histograms.test ) that use PromQL functions that use the new stats iterator under the hood. We could even construct counter resets there that are only detectable in buckets, not in the count (but I think it will be too hard to provoke a new chunk being cut at the same time so I don't think we'll exercise the new counter reset code path, because the testing framework probably ingests normally and will store the counter reset in the chunk header already).

promql/histogram_stats_iterator.go Show resolved Hide resolved
type histogramStatsIterator struct {
chunkenc.Iterator

hReader *histogram.Histogram
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Reader" somehow feels wrong as a name (in both hReader and fhReader). Maybe currentH and currentFH? Or just use h and fh (not saying anything specific will avoid assumptions…)?

promql/histogram_stats_iterator.go Show resolved Hide resolved
@beorn7
Copy link
Member

beorn7 commented Jun 5, 2024

How does the benchmark look with the added storing of the last seen histogram?

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@fpetkovski
Copy link
Contributor Author

Thanks for the review, I addressed all comments. Benchmarks still look good on the latest commit:

name                                                          old time/op    new time/op    delta
NativeHistograms/sum-11                                          396ms ± 2%     400ms ± 1%     ~     (p=0.310 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            480ms ± 2%     481ms ± 2%     ~     (p=0.841 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             800ms ± 1%     809ms ± 1%     ~     (p=0.095 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     473ms ± 0%     302ms ± 0%  -36.14%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      813ms ± 5%     513ms ± 1%  -36.89%  (p=0.008 n=5+5)

name                                                          old alloc/op   new alloc/op   delta
NativeHistograms/sum-11                                          414MB ± 0%     413MB ± 0%     ~     (p=0.222 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            213MB ± 0%     213MB ± 0%     ~     (p=0.222 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             314MB ± 0%     314MB ± 0%     ~     (p=0.690 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     213MB ± 0%     137MB ± 0%  -35.80%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      330MB ± 0%     140MB ± 8%  -57.48%  (p=0.016 n=4+5)

name                                                          old allocs/op  new allocs/op  delta
NativeHistograms/sum-11                                          5.16M ± 0%     5.16M ± 0%     ~     (p=0.095 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            4.61M ± 0%     4.61M ± 0%     ~     (p=0.690 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             6.12M ± 0%     6.12M ± 0%     ~     (p=0.603 n=4+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     4.61M ± 0%     1.61M ± 0%  -65.04%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      6.16M ± 1%     1.52M ± 3%  -75.23%  (p=0.008 n=5+5)

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much, let's ship it!

@beorn7
Copy link
Member

beorn7 commented Jun 6, 2024

I hope it's OK if I squash the commits into one…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants