-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new interning format based on []string indices #4
new interning format based on []string indices #4
Conversation
Co-authored-by: bwplotka <bwplotka@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>
Let's agree on one single format to move forward, and then I'll do some cleanup and adapt the tests. My take: I would go with Stats are almost entirely on its favor. The only downside is the manual marshalling optimization (which is the same we had to do with what we called "
Also, the |
I would go with I agree that modified custom marshalling/unmarshalling isn't a big concern at the moment, we can reevaluate when we move off of gogoproto which is another can of worms. So imo the decision is between maintaining the custom proto code for an extra 3-4% network bytes reduction (56% total), or not doing so for slightly less network bytes reduction (~52% total). |
Epic, thanks! I would vote for v11-min-str. Efficiency is similar generally to len and even if slightly worse, the proto and adoption is simpler and easier. |
also adapt tests to the new format Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>
prompb/remote.proto
Outdated
// Cortex uses this field to determine the source of the write request. | ||
// We reserve it to avoid any compatibility issues. | ||
message MinimizedWriteRequestStr { | ||
repeated MinimizedTimeSeriesStr timeseries = 1 [(gogoproto.nullable) = false]; | ||
reserved 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to reserve a field for the old metadata field 3
, and since this is most likely a 2.0 we could maybe move reserved 2
to reserved 99
or something like that. We can discuss in the meta issue.
@@ -776,73 +774,29 @@ func labelsToLabelsProto(lbls labels.Labels, buf []prompb.Label) []prompb.Label | |||
return result | |||
} | |||
|
|||
func labelsToUint32Slice(lbls labels.Labels, symbolTable *rwSymbolTable, buf []uint32) []uint32 { | |||
func labelsToUint32SliceStr(lbls labels.Labels, symbolTable *rwSymbolTable, buf []uint32) []uint32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't have to happen now but we should probably add short comments about what's happening here and in Uint32StrRefToLabels
for the final feature branch PR into upstream main
storage/remote/queue_manager.go
Outdated
type rwSymbolTable struct { | ||
symbols []byte | ||
strings []string | ||
symbolsMap map[string]offLenPair | ||
symbolsMapBytes map[string]uint32 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove some of what's here in rwSymbolTable
now that we're not going with the slice of bytes approach right?
Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>
Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>
Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>
8b4d170
into
alexnico-remote-write-1-1
commit 771a827 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Fri Dec 15 12:37:30 2023 -0300 lint commit 3dff288 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Wed Nov 29 13:17:36 2023 -0300 fix custom marshaling Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit 8b4d170 Merge: f2bc161 ded5501 Author: Nicolás Pazos <32206519+npazosmendez@users.noreply.github.com> Date: Wed Nov 29 13:12:32 2023 -0300 Merge pull request #4 from npazosmendez/njpm/string-array-interning commit ded5501 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Wed Nov 29 13:09:18 2023 -0300 don't reserve field 3 for new proto and add TODO Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit 7a66744 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Wed Nov 29 13:08:17 2023 -0300 add some TODOs for later Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit 3e80fd9 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Wed Nov 29 13:07:58 2023 -0300 cleanup rwSymbolTable Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit a37bdfd Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Wed Nov 29 12:41:16 2023 -0300 remove all new rw formats but the []string one also adapt tests to the new format Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit f40ce48 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Tue Nov 28 11:54:32 2023 -0300 new interning format based on []string indeces Co-authored-by: bwplotka <bwplotka@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit f2bc161 Merge: 4bdb865 788dfc8 Author: Nicolás Pazos <32206519+npazosmendez@users.noreply.github.com> Date: Thu Nov 23 19:34:04 2023 -0300 Merge pull request #3 from prometheus/callum-rw-format-testing commit 788dfc8 Author: Callum Styan <callumstyan@gmail.com> Date: Thu Nov 23 11:55:56 2023 -0800 fix minor lint issue + use labels Range function since it looks like the tests fail to do `range labels.Labels` on CI Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 63ea815 Author: Callum Styan <callumstyan@gmail.com> Date: Thu Nov 23 11:32:45 2023 -0800 fix test panic Signed-off-by: Callum Styan <callumstyan@gmail.com> commit e2acded Author: Callum Styan <callumstyan@gmail.com> Date: Thu Nov 23 11:31:10 2023 -0800 more cleanup, address review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> commit e672dd7 Author: Callum Styan <callumstyan@gmail.com> Date: Thu Nov 23 11:14:18 2023 -0800 remove package-lock.json change again Signed-off-by: Callum Styan <callumstyan@gmail.com> commit a3c6904 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 20 17:28:05 2023 -0800 more cleanup, mostly linting fixes Signed-off-by: Callum Styan <callumstyan@gmail.com> commit a0cd793 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 20 16:51:53 2023 -0800 cleanup; remove some unused functions Signed-off-by: Callum Styan <callumstyan@gmail.com> commit c377de1 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 20 16:43:44 2023 -0800 use require instead of assert in custom marshal test Signed-off-by: Callum Styan <callumstyan@gmail.com> commit c2c45d1 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 20 16:37:03 2023 -0800 More cleanup Signed-off-by: Callum Styan <callumstyan@gmail.com> commit fb3ed04 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 20 16:19:42 2023 -0800 remove more format types we probably won't use Signed-off-by: Callum Styan <callumstyan@gmail.com> commit bf773af Author: Callum Styan <callumstyan@gmail.com> Date: Sun Nov 19 19:05:30 2023 -0800 remove formats we've decided not to use Signed-off-by: Callum Styan <callumstyan@gmail.com> commit eec9663 Author: Callum Styan <callumstyan@gmail.com> Date: Wed Nov 15 10:12:00 2023 -0800 remove mistaken package lock changes Signed-off-by: Callum Styan <callumstyan@gmail.com> commit e444439 Author: Callum Styan <callumstyan@gmail.com> Date: Wed Nov 15 10:04:38 2023 -0800 test additional len and lenbytes formats Co-authored-by: Nicolás Pazos <npazosmendez@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 1842c99 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 13 21:53:44 2023 -0800 Add bytes slice (instead of slice of 32bit vars) format for testing Co-authored-by: Nicolás Pazos <npazosmendez@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 81f2076 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 13 16:12:19 2023 -0800 fix label ranging Signed-off-by: Callum Styan <callumstyan@gmail.com> commit e1e6406 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 13 15:53:43 2023 -0800 use exp slices for backwards compat. to go 1.20 plus add copyright header to test file Signed-off-by: Callum Styan <callumstyan@gmail.com> commit ed0c3b2 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 13 11:13:49 2023 -0800 refactor new version flag to make it easier to pick a specific format instead of having multiple flags, plus add new formats for testing Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 4bdb865 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Thu Nov 9 16:57:16 2023 -0300 minimally-tested exemplar support for rw 1.1 Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit 34f3f11 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Thu Nov 9 16:44:18 2023 -0300 remove all code from previous interning approach the 'minimized' version is now the only v1.1 version Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit f259deb Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Thu Nov 9 15:53:15 2023 -0300 fix writeRequestMinimizedFixture Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit e2a5ea5 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Tue Nov 7 11:26:50 2023 -0300 Use unsafe []byte->string cast to reuse buffer Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit c9b6ddd Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Thu Nov 9 13:00:16 2023 -0300 manually optimize varint marshaling Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit 761efc8 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Thu Nov 9 15:03:18 2023 -0300 Use two uint32 to encode (offset,leng) Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit 7a633a2 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Thu Nov 9 11:22:53 2023 -0300 fix build for stringlabels tag Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> commit 2935eab Author: Callum Styan <callumstyan@gmail.com> Date: Wed Nov 8 15:38:42 2023 -0800 update tests Signed-off-by: Callum Styan <callumstyan@gmail.com> commit d5f705f Author: Callum Styan <callumstyan@gmail.com> Date: Tue Nov 7 19:00:15 2023 -0800 remove unused proto type Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 6621690 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Tue Nov 7 17:15:45 2023 -0300 Make LabelSymbols a fixed32 commit eb63f30 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 6 17:04:01 2023 -0800 fix minor things Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 3e5facc Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 6 16:49:49 2023 -0800 add functionality for new minimized remote write request format Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 82c1df0 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 6 16:41:34 2023 -0800 add functions for translating between new proto formats symbol table and actual prometheus labels Co-authored-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 7cea6fe Author: Callum Styan <callumstyan@gmail.com> Date: Mon Nov 6 15:38:11 2023 -0800 Add minmized remote write proto format Co-authored-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 46b84ab Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Tue Oct 31 09:52:55 2023 -0300 lint commit 4654241 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Mon Oct 23 17:12:22 2023 -0300 fix typo in log commit cef9891 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Mon Oct 23 12:39:06 2023 -0300 remote write handler to checks version header commit e3f27aa Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Mon Oct 23 12:19:59 2023 -0300 fields rewording in handler commit 6f21272 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Mon Oct 23 12:13:33 2023 -0300 fix NewWriteClient and change new flags wording commit 1cddea1 Author: alexgreenbank <alex.greenbank@grafana.com> Date: Fri Oct 20 18:36:10 2023 +0000 gofmt commit 8ab14f2 Author: alexgreenbank <alex.greenbank@grafana.com> Date: Fri Oct 20 17:35:03 2023 +0000 Remove config, update proto commit 15c4d45 Author: alexgreenbank <alex.greenbank@grafana.com> Date: Thu Oct 12 17:34:51 2023 +0000 Add 1.1 version handling code commit 06b486b Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Wed Oct 11 17:25:07 2023 -0300 cleanup: remove hardcoded fake url for testing commit 44f166d Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Wed Oct 11 17:22:20 2023 -0300 Use github.com/golang/snappy commit c1593fd Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Wed Oct 11 12:10:00 2023 -0300 Improve sender benchmarks and some allocations commit b35ab6c Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Fri Oct 6 16:57:49 2023 -0300 fix build commit 2f815ee Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Wed Oct 4 15:54:28 2023 -0300 refactor queue manager code to remove some duplication commit 3be59f0 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Tue Oct 3 14:58:16 2023 -0300 add sender-side tests and fix failing ones commit eebf7ac Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Tue Oct 3 14:51:30 2023 -0300 fix: queue manager to include float histograms in new requests commit 17aa5b5 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Mon Oct 2 17:36:42 2023 -0300 refactor out common code between write methods commit 111bf0f Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Mon Oct 2 17:02:48 2023 -0300 add basic reduce remote write handler benchmark commit 0dc96d6 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Mon Oct 2 13:29:31 2023 -0300 fix mocks and fixture commit ed34405 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Fri Sep 29 12:35:52 2023 -0300 remove some comented code commit fff56c0 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Wed Sep 27 18:13:11 2023 -0300 no-brainer copypaste but more performance write support commit 0062b91 Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Tue Sep 26 13:50:44 2023 -0300 Fix test commit 337f9ae Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Tue Sep 26 13:30:26 2023 -0300 add new proto support on receiver end commit f65eb1c Author: Nicolás Pazos <npazosmendez@gmail.com> Date: Tue Sep 26 13:29:59 2023 -0300 tests and new -> original proto mapping util commit 867cc97 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Feb 20 14:11:12 2023 -0800 Add new test client Signed-off-by: Callum Styan <callumstyan@gmail.com> commit b25710b Author: Callum Styan <callumstyan@gmail.com> Date: Mon Feb 20 14:10:34 2023 -0800 update example server to include handler for new format Signed-off-by: Callum Styan <callumstyan@gmail.com> commit f199259 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Feb 20 14:10:13 2023 -0800 Implement code paths for new proto format Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 4910a33 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Feb 20 13:12:15 2023 -0800 add lookup table struct that is used to build the symbol table in new write request format Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 36a30c9 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Feb 20 13:10:47 2023 -0800 add decode function for new write request proto Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 01f1cf9 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Feb 20 12:42:35 2023 -0800 add new proto types Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 7a862d2 Author: Callum Styan <callumstyan@gmail.com> Date: Mon Feb 20 12:20:45 2023 -0800 replace snappy encoding library Signed-off-by: Callum Styan <callumstyan@gmail.com> commit 07e21cb Author: Nicolás Pazos <32206519+npazosmendez@users.noreply.github.com> Date: Wed Nov 8 16:21:25 2023 -0300 Remote Write 1.1: e2e benchmarks (prometheus#13102) * Remote Write e2e benchmarks Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * Prometheus ports automatically assigned Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * make dashboard editable + more modular to different job label values Signed-off-by: Callum Styan <callumstyan@gmail.com> * Dashboard improvements * memory stats * diffs look at counter increases Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * run script: absolute path for config templates Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * grafana dashboard improvements * show actual values of metrics * add memory stats and diff Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> * dashboard changes Signed-off-by: Callum Styan <callumstyan@gmail.com> --------- Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Callum Styan <callumstyan@gmail.com>
Based on @bwplotka's suggestion of trying
repeated string
as an interning table.I've implemented two versions:
v11-min-str-fixed
: references arefixed32
indices to the string arrayv11-min-str
: references areuint32
indices to the string array, and the varints encoding is optimized manually to avoid allocations.Benchmarks
Results are from a ~1hr run using the e2e bench scripts, scraping an entire real k8s namespaces with around 100 pods. Versions I'm running are:
v1
: the current v1 protocol, no interning.v11-min-len
: references are indices to a[]byte
, and that array is of the form<str_len><str_data><str_len><str_data_>...
. This already existed, and it's the same as the one in these benchmarks.v11-min-str-fixed
(see above)v11-min-str
(see above)These are the most relevant numbers: network usage and sender/receiver CPU. Memory is practically the same for all.
I think in a long run like this one, looking at the diff of the total CPU usage is more representative than the 5minute rate diff. The latter varies a lot, and also the former represents the cloud spend more accurately.
Entire screenshot