Implement histogram statistics decoder #14097

fpetkovski · 2024-05-14T07:24:05Z

This PR speeds up histogram_count and histogram_sum functions on native histograms. The idea is to use a separate decoder which can be used by the engine to skip histogram buckets when decoding histogram objects. This should help with reducing allocations and make the sum aggregation faster it will only be done on primitive values excluding buckets and spans.

Current benchmark results

prometheus $ benchstat main.out new.out                                               
name                                                          old time/op    new time/op    delta
NativeHistograms/sum-11                                          404ms ± 6%     391ms ± 1%     ~     (p=0.222 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            500ms ± 8%     492ms ± 8%     ~     (p=0.548 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             803ms ± 1%     815ms ± 3%     ~     (p=0.095 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     477ms ± 4%     318ms ± 5%  -33.29%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      818ms ± 4%     582ms ± 1%  -28.77%  (p=0.008 n=5+5)

name                                                          old alloc/op   new alloc/op   delta
NativeHistograms/sum-11                                          413MB ± 0%     414MB ± 0%     ~     (p=0.841 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            213MB ± 0%     213MB ± 0%     ~     (p=0.095 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             314MB ± 0%     314MB ± 0%     ~     (p=0.841 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     213MB ± 0%     143MB ± 0%  -32.94%  (p=0.016 n=5+4)
NativeHistograms/histogram_count_with_long_rate_interval-11      321MB ± 2%     202MB ± 6%  -37.15%  (p=0.008 n=5+5)

name                                                          old allocs/op  new allocs/op  delta
NativeHistograms/sum-11                                          5.16M ± 0%     5.16M ± 0%     ~     (p=1.000 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            4.61M ± 0%     4.61M ± 0%     ~     (p=0.381 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             6.12M ± 0%     6.12M ± 0%     ~     (p=1.000 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     4.61M ± 0%     1.66M ± 0%  -63.99%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      6.14M ± 0%     2.01M ± 3%  -67.36%  (p=0.008 n=5+5)

This commit is a POC of how we can speed up histogram_count and histogram_sum functions on native histograms. The idea is to have separate decoders which can be used by the engine to only read count/sum values from histogram objects. This should help with reducing allocations when decoding histograms, as well as with speeding up aggregations like sum since they will be done on floats and not on histogram objects. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski · 2024-05-15T05:37:54Z

I've been tinkering with this approach a bit and I wonder if it could interfere with various mixed types checks, as well as with histogram counter resets. I will investigate another solution which returns a partially filled histogram object so that we can still go through the same histogramRate path.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski · 2024-05-15T07:58:40Z

Looking at profiles, I see CopyToSchema taking a lot of time. Do we need to decode/fill the schema field if we're only operating on Count/Sum values @krajorama ?

beorn7 · 2024-05-16T13:34:38Z

Generally, this looks very nice.

However, there is a nasty edge case: In rare cases, a counter reset could not be seen in the count but only in individual buckets. We do detect those cases on ingestion and set the counter reset hint accordingly, so generally, this should make it through the special decoder here via the counter reset hint. However (edge case of an edge case of an edge case…), the following scenario could happen:

A chunk gets cut just because the previous chunk was full (and not because of a counter reset). In that case, we mark the counter reset hint as "unknown" because we cannot know if the previous chunk gets deleted. The idea is that we do manual counter reset detection in that case, just to play safe.
If the previous chunk gets really deleted, and if there is really a counter reset between the pre-previous chunk and the current chunk, and if that counter reset isn't detectable via the count alone, but only via (some of) the buckets, then this PR creates a problem (a different result than the one we get with the full decoding).

Maybe we can solve that cleanly by doing the counter reset detection directly in the iterator (and set the counter reset accordingly) in cases where the "unknown" hint is coming from TSDB.

Or we can just wing it and claim that this case is so outrageously unlikely that we don't need to handle it.

beorn7 · 2024-05-16T13:37:33Z

tsdb/chunkenc/histogram.go

+			ZeroCount:        f.hReader.ZeroCount,
+			Sum:              f.hReader.Sum,
+			ZeroThreshold:    f.hReader.ZeroThreshold,
+			Schema:           f.hReader.Schema,


Since the schema is only for buckets, I think we can set this to any number without doing harm. If we set it consistently to the same number (e.g. 0), we avoid the CopyToSchema step.

I removed both the zero threshold and schema fields.

promql/engine.go

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski · 2024-05-17T13:15:08Z

Let me take a look at how we can do the detection transparently in an efficient manner. In the worst case we will just store the last decoded histogram as another field and use it for detection when the hint is unknown.

fpetkovski · 2024-05-17T14:05:43Z

We currently don't have a function to detect resets for integer histograms. Since PromQL always works with float histograms, I might need to add the same functionality just operating on integer buckets and spans.

beorn7

Just some comment nits.

beorn7 · 2024-05-22T15:55:24Z

promql/engine.go

+// detectHistogramStatsDecoding modifies the expression by setting the
+// SkipHistogramBuckets field in those vector selectors for which it is safe to
+// return only histogram statistics (sum and count), excluding histogram spans
+// and buckets. The function can be treated as an optimization and does is not


"does" appears spurious. Maybe "thus"? Or just delete it?

Yeah this was spurious.

beorn7 · 2024-05-22T15:57:20Z

promql/parser/ast.go

-	LabelMatchers []*labels.Matcher
+	Offset               time.Duration
+	Timestamp            *int64
+	SkipHistogramBuckets bool     // Set when do not need to decode native histogram buckets for evaluating a query.


Maybe a bit easier to read:

Suggested change

SkipHistogramBuckets bool // Set when do not need to decode native histogram buckets for evaluating a query.

SkipHistogramBuckets bool // Set when decoding native histogram buckets isn't needed for query evaluation.

beorn7 · 2024-05-22T16:15:51Z

Let me take a look at how we can do the detection transparently in an efficient manner. In the worst case we will just store the last decoded histogram as another field and use it for detection when the hint is unknown.

Any ideas so far?

It would actually be quite cool if we could detect counter resets in the iterator already. Then we could remove it entirely in the PromQL layer (for native histograms, at least).

If we can somehow safely assess if a chunk got deleted before the current chunk, we could avoid the manual counter reset detection even more. A heuristic would be if there is a gap larger than the typical scrape interval, but that's not strictly correct. I have a vague memory that we never delete chunks from the middle of a block, so that might narrow down the problem.

Just brainstorming here…

I'm almost ready to believe that the above mentioned edge case of an edge case of an edge case is rare enough that we can ignore it (but document it, just in case it becomes relevant after all).

beorn7 · 2024-05-22T16:17:42Z

Another thought that crossed my mind: What about overlapping blocks and (soon) out-of-order handling?
This might create a lot more cases where we set the counter reset hint to "unknown". Maybe we should simply skip the optimization in those cases. (Again, I have to apologize that I just type out thoughts crossing my mind, based on vague memories from the past, without having done a proper research into the matter…)

fpetkovski · 2024-05-22T16:25:41Z

I've been busy moving apartments this week and did not have time to work in this further. I'll have my time back next week and will continue looking into the reset.

My opinion is that we should handle the reset in the iterator by keeping track of the last decoded histogram. This way the code can also remain future proof in case we introduce more edge cases down the line.

What about overlapping blocks and (soon) out-of-order handling?

I remember looking into this code as well, and we seem to set unknown in the chain iterator whenever we change chunk. So I think the logic would break there unless we do explicit detection.

beorn7 · 2024-05-23T12:29:33Z

Good luck with the move. Looking forward to further explorations into this issue.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski · 2024-05-27T12:47:51Z

@beorn7 I've addressed all comments and we now detect the reset in the iterator itself. I also moved iterator to the promql package since this is the only place where it is used. It also relies on public methods so we don't need access to encoding internals from the chunkenc package.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

beorn7 · 2024-05-30T18:39:50Z

Thank you very much. This is now very close to the top of my review queue, but I won't get to it this week. Sorry for that.

beorn7

Very neat, thank you very much. I believe (and hope) this works as we expect. :o)

One additional idea for testing: It should be quite easy to write tests within the PromqL testing framework (in https://github.com/prometheus/prometheus/blob/main/promql/promqltest/testdata/native_histograms.test ) that use PromQL functions that use the new stats iterator under the hood. We could even construct counter resets there that are only detectable in buckets, not in the count (but I think it will be too hard to provoke a new chunk being cut at the same time so I don't think we'll exercise the new counter reset code path, because the testing framework probably ingests normally and will store the counter reset in the chunk header already).

promql/histogram_stats_iterator.go

beorn7 · 2024-06-05T11:20:48Z

promql/histogram_stats_iterator.go

+type histogramStatsIterator struct {
+	chunkenc.Iterator
+
+	hReader *histogram.Histogram


"Reader" somehow feels wrong as a name (in both hReader and fhReader). Maybe currentH and currentFH? Or just use h and fh (not saying anything specific will avoid assumptions…)?

promql/histogram_stats_iterator.go

beorn7 · 2024-06-05T11:40:32Z

How does the benchmark look with the added storing of the last seen histogram?

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski · 2024-06-06T12:20:03Z

Thanks for the review, I addressed all comments. Benchmarks still look good on the latest commit:

name                                                          old time/op    new time/op    delta
NativeHistograms/sum-11                                          396ms ± 2%     400ms ± 1%     ~     (p=0.310 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            480ms ± 2%     481ms ± 2%     ~     (p=0.841 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             800ms ± 1%     809ms ± 1%     ~     (p=0.095 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     473ms ± 0%     302ms ± 0%  -36.14%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      813ms ± 5%     513ms ± 1%  -36.89%  (p=0.008 n=5+5)

name                                                          old alloc/op   new alloc/op   delta
NativeHistograms/sum-11                                          414MB ± 0%     413MB ± 0%     ~     (p=0.222 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            213MB ± 0%     213MB ± 0%     ~     (p=0.222 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             314MB ± 0%     314MB ± 0%     ~     (p=0.690 n=5+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     213MB ± 0%     137MB ± 0%  -35.80%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      330MB ± 0%     140MB ± 8%  -57.48%  (p=0.016 n=4+5)

name                                                          old allocs/op  new allocs/op  delta
NativeHistograms/sum-11                                          5.16M ± 0%     5.16M ± 0%     ~     (p=0.095 n=5+5)
NativeHistograms/sum_rate_with_short_rate_interval-11            4.61M ± 0%     4.61M ± 0%     ~     (p=0.690 n=5+5)
NativeHistograms/sum_rate_with_long_rate_interval-11             6.12M ± 0%     6.12M ± 0%     ~     (p=0.603 n=4+5)
NativeHistograms/histogram_count_with_short_rate_interval-11     4.61M ± 0%     1.61M ± 0%  -65.04%  (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-11      6.16M ± 1%     1.52M ± 3%  -75.23%  (p=0.008 n=5+5)

beorn7

Thank you very much, let's ship it!

beorn7 · 2024-06-06T15:16:10Z

I hope it's OK if I squash the commits into one…

fpetkovski requested review from jesusvazquez and aknuds1 as code owners May 14, 2024 07:24

fpetkovski force-pushed the histogram-stats-decoder branch from 8ad58a7 to 4a461a5 Compare May 14, 2024 07:25

fpetkovski force-pushed the histogram-stats-decoder branch from 4a461a5 to d8061d5 Compare May 14, 2024 07:25

fpetkovski marked this pull request as draft May 15, 2024 05:38

fpetkovski force-pushed the histogram-stats-decoder branch 3 times, most recently from 8df13fa to 0a72cc5 Compare May 15, 2024 07:43

Return partial histograms

e6f9cd1

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski force-pushed the histogram-stats-decoder branch from 0a72cc5 to e6f9cd1 Compare May 15, 2024 07:44

fpetkovski marked this pull request as ready for review May 15, 2024 07:58

fpetkovski requested a review from roidelapluie as a code owner May 15, 2024 07:58

beorn7 requested review from beorn7 and removed request for aknuds1, roidelapluie and jesusvazquez May 16, 2024 13:26

beorn7 reviewed May 16, 2024

View reviewed changes

promql/engine.go Show resolved Hide resolved

MichaHoffmann reviewed May 17, 2024

View reviewed changes

promql/engine.go Show resolved Hide resolved

fpetkovski added 2 commits May 17, 2024 15:00

Exclude unnecessary fields

743ce40

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Fix lint

c7111d0

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

beorn7 requested changes May 22, 2024

View reviewed changes

fpetkovski added 2 commits May 26, 2024 10:19

Detect histogram reset

b162350

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Address comments and detect reset

f6ec456

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Add license and fix lint

fb3818f

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

beorn7 requested changes Jun 5, 2024

View reviewed changes

Address code review comments

aed3550

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

beorn7 approved these changes Jun 6, 2024

View reviewed changes

beorn7 merged commit 6e68046 into prometheus:main Jun 6, 2024
25 checks passed

beorn7 mentioned this pull request Jun 6, 2024

promql: Add tests for histogram counter reset only in bucket #14269

Merged

pracucci mentioned this pull request Jun 7, 2024

Resync upstream prometheus grafana/mimir-prometheus#641

Merged

charleskorn mentioned this pull request Jan 6, 2025

MQE: track number of processed samples in each query grafana/mimir#10232

Merged

1 task

linasm mentioned this pull request Jan 21, 2025

promql: use histogram stats decoder for histogram_avg #15849

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement histogram statistics decoder #14097

Implement histogram statistics decoder #14097

fpetkovski commented May 14, 2024 •

edited

Loading

fpetkovski commented May 15, 2024

fpetkovski commented May 15, 2024

beorn7 commented May 16, 2024 •

edited

Loading

beorn7 May 16, 2024

fpetkovski May 17, 2024

fpetkovski commented May 17, 2024

fpetkovski commented May 17, 2024

beorn7 left a comment

beorn7 May 22, 2024

fpetkovski May 27, 2024

beorn7 May 22, 2024

beorn7 commented May 22, 2024

beorn7 commented May 22, 2024

fpetkovski commented May 22, 2024 •

edited

Loading

beorn7 commented May 23, 2024

fpetkovski commented May 27, 2024

beorn7 commented May 30, 2024

beorn7 left a comment

beorn7 Jun 5, 2024

beorn7 commented Jun 5, 2024

fpetkovski commented Jun 6, 2024

beorn7 left a comment

beorn7 commented Jun 6, 2024

	SkipHistogramBuckets bool // Set when do not need to decode native histogram buckets for evaluating a query.
	SkipHistogramBuckets bool // Set when decoding native histogram buckets isn't needed for query evaluation.

Implement histogram statistics decoder #14097

Implement histogram statistics decoder #14097

Conversation

fpetkovski commented May 14, 2024 • edited Loading

fpetkovski commented May 15, 2024

fpetkovski commented May 15, 2024

beorn7 commented May 16, 2024 • edited Loading

beorn7 May 16, 2024

Choose a reason for hiding this comment

fpetkovski May 17, 2024

Choose a reason for hiding this comment

fpetkovski commented May 17, 2024

fpetkovski commented May 17, 2024

beorn7 left a comment

Choose a reason for hiding this comment

beorn7 May 22, 2024

Choose a reason for hiding this comment

fpetkovski May 27, 2024

Choose a reason for hiding this comment

beorn7 May 22, 2024

Choose a reason for hiding this comment

beorn7 commented May 22, 2024

beorn7 commented May 22, 2024

fpetkovski commented May 22, 2024 • edited Loading

beorn7 commented May 23, 2024

fpetkovski commented May 27, 2024

beorn7 commented May 30, 2024

beorn7 left a comment

Choose a reason for hiding this comment

beorn7 Jun 5, 2024

Choose a reason for hiding this comment

beorn7 commented Jun 5, 2024

fpetkovski commented Jun 6, 2024

beorn7 left a comment

Choose a reason for hiding this comment

beorn7 commented Jun 6, 2024

fpetkovski commented May 14, 2024 •

edited

Loading

beorn7 commented May 16, 2024 •

edited

Loading

fpetkovski commented May 22, 2024 •

edited

Loading