Propagate metadata from the WAL to remote_write #11640
Conversation
Exciting to see progress on types being natively available in regular Prometheus Write Requests from Prom. With respect to:
I see the current PR does the opposite of this; I think that may not be pragmatic for two major reasons:
Also just wanted to sign off with support and thanks.
good progress 👍
@@ -1347,6 +1420,7 @@ func (s *shards) runShard(ctx context.Context, shardID int, queue *queue) {
		pBuf = proto.NewBuffer(nil)
		buf  []byte
	)
	// TODO(@tpaschalis) Should we also raise the max if we have WAL metadata?
Maybe, but more likely we should revisit the defaults and recommendations around the buffer size.
For the purposes of this PR we could examine the # of shards that would be used to send samples with the metadata as is, and how many shards the current main branch uses.
I'm not sure what should be done for the purposes of this PR; could you elaborate?
Right now, the metadata would only be sent a) when first seen and b) whenever it changes, which shouldn't be too often, right? Would you expect a difference in the number of shards needed to run?
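For context, here is a minimal, self-contained sketch of the "send only when first seen or when it changes" check described above. The types and names (Metadata, metadataChanged, the cache map) are simplified stand-ins for illustration, not the actual Prometheus code.

package main

import "fmt"

// Metadata is a simplified stand-in for the WAL metadata record
// (the real code uses record.RefMetadata / metadata.Metadata).
type Metadata struct {
	Type, Help, Unit string
}

// metadataChanged reports whether newly scraped metadata differs from what
// was last recorded for the same series, i.e. whether a new WAL record
// (and hence a remote-write send) is needed at all.
func metadataChanged(prev *Metadata, next Metadata) bool {
	if prev == nil {
		return true // first time we see metadata for this series
	}
	return prev.Type != next.Type || prev.Help != next.Help || prev.Unit != next.Unit
}

func main() {
	cache := map[uint64]*Metadata{} // series ref -> last recorded metadata

	incoming := Metadata{Type: "counter", Help: "Total HTTP requests.", Unit: ""}
	if metadataChanged(cache[1], incoming) {
		// In Prometheus this is where a metadata record would be written;
		// here we just note that a send would happen.
		fmt.Println("append metadata record for ref 1")
		m := incoming
		cache[1] = &m
	}

	// Re-scraping identical metadata produces no new record.
	if !metadataChanged(cache[1], incoming) {
		fmt.Println("unchanged, nothing to send")
	}
}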
tsdb/wlog/watcher.go
Outdated
@@ -51,6 +51,7 @@ type WriteTo interface {
	AppendExemplars([]record.RefExemplar) bool
	AppendHistograms([]record.RefHistogramSample) bool
	AppendFloatHistograms([]record.RefFloatHistogramSample) bool
	AppendWALMetadata([]record.RefMetadata) bool
Can we just name this AppendMetadata, since we didn't have a way of passing metadata to the WriteTo interface previously?
The current implementation of metadata uses the same name. I can rename that to something like AppendWatcherMetadata or AppendFromMetadataWatcher, wdyt?
That's a different interface, so it should be okay? But I think we should also ensure the scrape code doesn't get a metadata appender if the WAL metadata is being used, i.e. if both configs are set we should error on startup, or log a warning and only use the WAL metadata.
ie if both configs are set we should error on startup, or log a warning and only use the metadata
I also thought to error on startup, but unfortunately the previous mechanism is enabled by default; erroring out directly could be confusing to users, who would have to go and explicitly disable the old one.
ts=2023-04-28T08:29:04.757Z caller=main.go:1229 level=info msg="Loading configuration file" filename=/Users/tpaschalis/prometheus.yml
ts=2023-04-28T08:29:04.757Z caller=main.go:1255 level=error msg="Failed to apply configuration" err="the 'send_metadata' and 'metadata_config.send' parameters are mutually exclusive"
...
ts=2023-04-28T08:29:04.764Z caller=main.go:1164 level=error err="error loading config from \"/Users/tpaschalis/prometheus.yml\": one or more errors occurred while applying the new configuration (--config.file=\"/Users/tpaschalis/prometheus.yml\")"
I think we should go with your other approach; in 42fc288, if both are set, I'm logging a warning and defaulting to implicitly disabling the old MetadataWatcher instead. WDYT?
As for the method name, there's another MetadataAppender interface which clashes with this name. I've changed that name to AppendMetadataWatcher and kept the simple AppendMetadata for our own method here. Let me know if you don't want this change and would prefer to roll it back.
case "metadata-storage": | ||
c.scrape.EnableMetadataStorage = true | ||
level.Info(logger).Log("msg", "Experimental in-memory metadata storage enabled") |
We should probably check that this is enabled when SendWALMetadata is being used, and if not print an error/warning.
Done in 42fc288, after parsing the configuration file. Not sure if there's a better place for the log line that has access to both fields.
I wonder if "SendWALMetadata" is not the ultimate feature flag for us then? Why bother with a second flag if SendWALMetadata could control storage too (both have to be true always)?
storage/remote/queue_manager.go
Outdated
func (t *QueueManager) AppendWALMetadata(ms []record.RefMetadata) bool {
	if !t.sendMetadata {
		return true
	}

outer:
	for _, m := range ms {
		t.seriesMtx.Lock()
		lbls, ok := t.seriesLabels[m.Ref]
		if !ok {
			t.metrics.droppedMetadataTotal.Inc()
			// Track dropped metadata in the same EWMA for sharding calc.
			t.dataDropped.incr(1)
			if _, ok := t.droppedSeries[m.Ref]; !ok {
				level.Info(t.logger).Log("msg", "Dropped metadata for series that was not explicitly dropped via relabelling", "ref", m.Ref)
			}
			t.seriesMtx.Unlock()
			continue
		}
		t.seriesMtx.Unlock()

		backoff := t.cfg.MinBackoff
		for {
			select {
			case <-t.quit:
				return false
			default:
			}
			if t.shards.enqueue(m.Ref, timeSeries{
				seriesLabels: lbls,
				sType:        tMetadata,
				metadata: &metadata.Metadata{
					Type: record.ToTextparseMetricType(m.Type),
					Help: m.Help,
					Unit: m.Unit,
				},
			}) {
				continue outer
			}

			t.metrics.enqueueRetriesTotal.Inc()
			time.Sleep(time.Duration(backoff))
			backoff *= 2
			if backoff > t.cfg.MaxBackoff {
				backoff = t.cfg.MaxBackoff
			}
		}
	}

	return true
}
I think actually this should change. If we have metadata records in the WAL and we're going to be sending metadata alongside series labels and samples in the same request, we should be treating metadata records the same as we do series records. That is, we should cache the metadata for a series, check for it whenever we receive a sample that we want to put into the next batch, and also clean up that cache when we clean up the series label cache.
So if I understand correctly, instead of enqueueing a new timeSeries struct every time a WAL metadata record is updated, we'd like to send metadata along with every sample we receive for a series?
Sending them continuously is closer to what we do now (collect and send everything every 1m by default), and the truth is that we'll never send a metadata record without a sample, since they only appear in the scrape loop.
I'm just not sure whether that's wasteful or not.
I've gone ahead and implemented this in bd41c24 if you want to take a look. Is that what you had in mind?
a) Make the TimeSeries proto definition contain a single metadata instance (remove the repeated keyword).
b) Remove the tMetadata seriesType and the ability to enqueue metadata on its own.
c) Use the StoreMetadata method to keep track of metadata, similar to the seriesLabels cache, and pass that along wherever we create a timeSeries struct (in Append/AppendExemplars/AppendHistograms etc.).
d) If sendMetadata is true, we append this cached value to the outgoing timeSeries struct in populateTimeSeries.
e) When the series is gone, we clear the cache entry along with the series' labels.
(A rough sketch of this caching approach follows below.)
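For illustration, a minimal, self-contained sketch of the per-series metadata cache described in (c)–(e). All types and names here (queueManager, seriesMeta, storeMetadata, buildTimeSeries, seriesReset) are simplified stand-ins rather than the real chunks.HeadSeriesRef/labels.Labels/prompb structures, so treat it as a sketch of the idea, not the actual implementation.

package main

import "fmt"

type seriesRef uint64

type seriesMeta struct {
	Type, Help, Unit string
}

type queueManager struct {
	sendMetadata   bool
	seriesLabels   map[seriesRef]string     // ref -> flattened label set (placeholder)
	seriesMetadata map[seriesRef]seriesMeta // ref -> latest metadata seen in the WAL
}

// storeMetadata caches the latest metadata record for a series, mirroring
// how StoreSeries caches labels.
func (q *queueManager) storeMetadata(ref seriesRef, m seriesMeta) {
	q.seriesMetadata[ref] = m
}

// buildTimeSeries attaches the cached metadata (if any) to the outgoing
// entry for a sample, so metadata travels in the same request as the sample.
func (q *queueManager) buildTimeSeries(ref seriesRef, value float64) string {
	out := fmt.Sprintf("labels=%s value=%g", q.seriesLabels[ref], value)
	if q.sendMetadata {
		if m, ok := q.seriesMetadata[ref]; ok {
			out += fmt.Sprintf(" type=%s help=%q unit=%q", m.Type, m.Help, m.Unit)
		}
	}
	return out
}

// seriesReset drops cache entries for series that are gone, keeping the
// metadata cache in lockstep with the label cache.
func (q *queueManager) seriesReset(keep map[seriesRef]struct{}) {
	for ref := range q.seriesLabels {
		if _, ok := keep[ref]; !ok {
			delete(q.seriesLabels, ref)
			delete(q.seriesMetadata, ref)
		}
	}
}

func main() {
	q := &queueManager{
		sendMetadata:   true,
		seriesLabels:   map[seriesRef]string{1: `{__name__="http_requests_total"}`},
		seriesMetadata: map[seriesRef]seriesMeta{},
	}
	q.storeMetadata(1, seriesMeta{Type: "counter", Help: "Total HTTP requests."})
	fmt.Println(q.buildTimeSeries(1, 42))
	q.seriesReset(map[seriesRef]struct{}{}) // no series kept -> both caches emptied
	fmt.Println(len(q.seriesLabels), len(q.seriesMetadata))
}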
After an offline sync, I'm reverting this approach (at least for now), to avoid making all requests bigger by default.
I'm keeping the change of having each TimeSeries struct contain a single Metadata (not repeated).
You can't just change the remote-write proto definition; it's a standard implemented by many parties.
My initial approach was piggy-backing off the current way metadata is sent over remote_write, but it was pretty inefficient and didn't take great advantage of our previous work.
This approach adds a new field, which should be backwards compatible, and is also similar to how Histograms were recently added in d9d51c5. Also, like histograms, this is something opt-in. With these in mind, are there still concerns about changing the proto definition? If so, what would be your suggestion for the next steps, to start a wider discussion?
Broadly, write a Proposal and get it accepted. See for example #10539
Metadata is an existing feature, unlike Native Histograms, so the impact of changing the way they work is much greater.
As far as I had understood, the proposal was that the current "metadata sync" flow would stay the same, but that inline metadata would be added (so that the exact type of a metric can be identified at push time for metric-pipeline processing, rather than at sync time, which requires a foreign join on metric type and cannot be 100% accurate at push time per series if the type of the metric changes, for instance).
@tpaschalis is that how you had been approaching this, or had you also been intending perhaps to remove the current metadata sync exchange (which is sent at a 1min interval by default)?
I think, @bboreham, what @tpaschalis is pointing out is that sending the metadata inline (which, as I understand it, is the whole point of this: making certain attributes available at push time and guaranteed to be available) has far lower network transfer than sending it in the "metadata" field that exists in the protobuf today (since labels for each series are then written once, rather than twice: once for the timeseries message and once for the metadata message).
As far as I understand, though, instead of changing any current metadata sync requests that are made or removing the metadata field in the protobuf, it could simply be an additive change to also make metadata available inline on the timeseries message (and hence reduce the overall network effect made by this change).
Here's the original discussion on this approach:
#7771
Related design doc:
https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#heading=h.bik9uwphqy3g
Related mailing list discussion:
https://groups.google.com/g/prometheus-developers/c/w-eotGenBGg/m/UZWdxzcBBwAJ
Related stats (more on that PR link):
Here's the increase with a default config; the difference is relatively less measurable, likely because the previous config sent larger batches since the send deadline was longer with a max samples per send of 5,000:
I apologise, I was reacting to bd41c24, which includes a change to a field:
bd41c24#diff-bdaf80ebc5fa26365f45db53435b960ce623ea6f86747fb8870ad1abc355f64fR150
Now I see that field was only added in the context of this PR; it wouldn't itself be a breaking change.
I'm sorry I haven't read all the history but I don't understand why we are discussing metadata on a per-series level.
Prometheus doesn't ingest or store metadata that way; it's per-name or "metric family".
Elsewhere, people have pointed me at a conversation where metadata on a per-series level was discussed.
Please ignore me, I don't have the context to get involved.
// The current MetadataWatcher implementation is mutually exclusive
// with the new approach, which stores metadata as WAL records and
// ships them alongside series. If both mechanisms are set, the new one
// takes precedence by implicitly disabling the older one.
if t.mcfg.Send && t.sendMetadata {
	level.Warn(logger).Log("msg", "the 'send_metadata' and 'metadata_config.send' parameters are mutually exclusive; defaulting to use 'send_metadata'")
	t.mcfg.Send = false
}
Initially I put this snippet into storage/remote/write.go and the ApplyConfig method, but mutating the config there meant that the config hash was no longer deterministic, tripping up some tests and possibly users.
While the tests would be easy to fix, I felt that putting it one level deeper here might be better.
Hi @tpaschalis, thanks for driving this feature! We're excited to see this land as well. I was wondering what the status of this feature is? Is there anything that I can help with to unblock this? I haven't been following the code closely, but I did notice that you were inlining the metadata into the TimeSeries. While this does introduce some additional overhead, it makes discovering type data much easier. Having to essentially guess at the possible metric family from the series can be problematic for a number of reasons that I am sure you are already aware of.
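As an aside, here is a small illustrative sketch (not from this PR) of why guessing the metric family from the series name alone is fragile: the naive heuristic below relies on suffix conventions that are not enforced, so a gauge named *_total or a counter without the suffix is classified incorrectly. The guessType function is hypothetical and only for illustration.

package main

import (
	"fmt"
	"strings"
)

// guessType is a deliberately naive heuristic that infers a metric type
// from its name alone, as a consumer without metadata would have to do.
func guessType(name string) string {
	switch {
	case strings.HasSuffix(name, "_total"):
		return "counter"
	case strings.HasSuffix(name, "_bucket"), strings.HasSuffix(name, "_sum"), strings.HasSuffix(name, "_count"):
		return "histogram or summary (ambiguous)"
	default:
		return "unknown (often assumed gauge)"
	}
}

func main() {
	for _, n := range []string{
		"http_requests_total", // likely a counter
		"queue_depth_total",   // could legitimately be a gauge
		"request_latency_sum", // histogram? summary? can't tell
		"process_open_fds",    // no hint at all
	} {
		fmt.Printf("%-22s -> %s\n", n, guessType(n))
	}
}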
storage/remote/queue_manager.go
Outdated
case tMetadata:
	pendingData[nPending].Metadata = prompb.Metadata{
		Type: metricTypeToProtoEquivalent(d.metadata.Type),
		Help: d.metadata.Help,
		Unit: d.metadata.Unit,
	}
	nPendingMetadata++
Hm, so this wouldn't make metadata available inline for series with samples/exemplars/histograms/etc. It would be sent entirely separately in the array of timeseries sent, and the labels would be duplicated inside a request where samples/other data were sent for a timeseries.
I thought this was:
(a) A non-starter network-transfer-wise, since it would create another entry in the array with the same set of labels as an entry with samples or exemplars or histograms.
(b) Not a guarantee that metadata is available when processing an entry with timeseries in any consumer of remote write, due to batching/etc. separating what is sent from what's in the queue. I also believed that making metadata available as part of the same request for any given timeseries with samples/values sent was the primary reason to propagate it from the WAL rather than sync it every 1min (otherwise it would not be part of the request processing the timeseries with samples/values, similar to today's situation where it's not present since it comes through an entirely separate set of request/responses via an async 1min metadata sync push).
Currently it looks like the metadata record is enqueued separately and therefore would never have samples/exemplars/histograms/etc. set for this entry in pendingData[nPending]:
prometheus/storage/remote/queue_manager.go
Lines 817 to 828 in 208922b
if t.shards.enqueue(m.Ref, timeSeries{
	seriesLabels: lbls,
	sType:        tMetadata,
	metadata: &metadata.Metadata{
		Type: record.ToTextparseMetricType(m.Type),
		Help: m.Help,
		Unit: m.Unit,
	},
}) {
	continue outer
}
Previously, with this working end to end (in PR #7771), we instead kept the metadata available and attached it to samples as they were enqueued:
prometheus/storage/remote/queue_manager.go
Lines 341 to 367 in b6e5bd1
meta, _ := t.seriesMetadata[s.Ref]
t.seriesMtx.Unlock()
// This will only loop if the queues are being resharded.
backoff := t.cfg.MinBackoff
for {
	select {
	case <-t.quit:
		return false
	default:
	}
	if t.shards.enqueue(s.Ref, sample{
		labels: lbls,
		meta:   meta,
		t:      s.T,
		v:      s.V,
	}) {
		continue outer
	}
	t.metrics.enqueueRetriesTotal.Inc()
	time.Sleep(time.Duration(backoff))
	backoff = backoff * 2
	if backoff > t.cfg.MaxBackoff {
		backoff = t.cfg.MaxBackoff
	}
}
Then this was set inline when the timeseries with samples for the entry was sent:
prometheus/storage/remote/queue_manager.go
Lines 864 to 866 in b6e5bd1
pendingSamples[nPending].Type = metricTypeToMetricTypeProto(sample.meta.Type)
pendingSamples[nPending].Unit = sample.meta.Unit
pendingSamples[nPending].Help = sample.meta.Help
Sorry, I haven't had time this week/next to follow up on my review due to on-call.
Thanks for this!
I think, given this PR is trying to modify metadata handling in remote write, it's time to finalize our spec for Remote Write v1.1 with regard to metadata in some proposal (gdoc, or PR to https://github.com/prometheus/proposals) before we merge any changes to the remote write proto.
If we would like to scope down this PR, we could just use WAL data for the existing remote write protocol.
Otherwise, let's collaborate on the Remote Write spec's final decision. I loved the Google group discussion Rob started, as well as Rob's design document. On a different note, OTLP metrics is now on its way to 1.0 and simply ignores (maybe for a good reason?) the majority of edge cases we discussed (metadata different per series, efficiency considerations, etc.). I would love to research their point of view too; maybe we are overcomplicating things? 🤔 Or perhaps it's fair to match the storage edge cases and keep the metadata near series (or even samples?)
@@ -181,6 +181,9 @@ func (c *flagConfig) setFeatureListOptions(logger log.Logger) error {
	case "extra-scrape-metrics":
		c.scrape.ExtraMetrics = true
		level.Info(logger).Log("msg", "Experimental additional scrape metrics enabled")
	case "metadata-storage":
Note for reviewers: This was removed on purpose with #11351
case "metadata-storage": | ||
c.scrape.EnableMetadataStorage = true | ||
level.Info(logger).Log("msg", "Experimental in-memory metadata storage enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the "SendWALMetadata" is not the ultimate feature flag for us then? Why bother second flag then if SendWALMetadata could control storage too (both has to be true always)?
@@ -857,6 +857,7 @@ type RemoteWriteConfig struct {
	Name                 string `yaml:"name,omitempty"`
	SendExemplars        bool   `yaml:"send_exemplars,omitempty"`
	SendNativeHistograms bool   `yaml:"send_native_histograms,omitempty"`
	SendWALMetadata      bool   `yaml:"send_metadata,omitempty"` // TODO(@tpaschalis) Adding an extra field to enable us to remove the `metadata_config` struct in the future.
What's the rationale for the removal of MetadataConfig? Any design doc/link to the discussion? It feels like we might want to give more config options to metadata in the future, depending on the v1.1 spec we come out with.
The idea behind this is that if we treat metadata as just another new type of datapoint in the WAL (like exemplars or native histograms), we wouldn't need any more specific configuration (so we wouldn't have a SendInterval or MaxPerSend option).
Do you have any specific ideas for config options we might need? We can kick off the v1.1 spec discussion and see if something comes up there.
@@ -127,6 +147,7 @@ message TimeSeries {
	repeated Sample samples = 2 [(gogoproto.nullable) = false];
	repeated Exemplar exemplars = 3 [(gogoproto.nullable) = false];
	repeated Histogram histograms = 4 [(gogoproto.nullable) = false];
	Metadata metadata = 5 [(gogoproto.nullable) = false];
I think we need design document with a team agreement on what's remote write v1.1 before we start modifying this 🤔
Initially I thought we wouldn't have to explicitly change the protocol since it was an extra optional signal (again, like exemplars and histograms), but if we think that's the way to go I'm up for it!
@cstyan also had some ideas about things that could go into a future v1.1 or v1.2 version of the protocol, so we can discuss whether we want to bring in any more changes as well.
Update: Yeah, I'm also convinced this is the way to go! What would be the process to start things for drafting v1.1?
Getting consensus on the Google doc proposal and then opening a PR to the docs? Sending to the mailing list?
@bwplotka @cstyan @robskillington Thanks for your feedback. Apologies for the radio silence; I think we're ready for a push to try and get this over the line. I've tried to summarise the most pressing points in a Google document; it might help with opening a wider discussion and helping other people highlight alternatives. I think there are two main questions to address here. Second, how and how often metadata will be sent. The Google doc outlines our options, but after some thought, Rob's comments and @beorn7's view in the original discussion, I'm leaning heavily toward the proposal that we always send metadata along with every sample. This is the most resource-intensive option, but it guarantees that up-to-date metadata is available for any consumer, and it is a step toward making metadata a first-class citizen. I'll revert some recent code changes to go with that approach, and maybe we could gauge its impact with a prombench run.
Apologies, hit Close instead of Comment by accident 🤦
Thanks for the proposal. I think we need agreement on the spec, and to me it's a similar amount of work to agreeing officially in this PR (it's a network protocol, after all). If we have even slight agreement on the doc, we can start with an experimental implementation first, with some negotiation... and I love your proposal: per series & send always is an amazing starting point. Feel free to propagate this document on
Looking at this during the bug scrub. @tpaschalis With the recent progress WRT remote write 2.0, is this worth revisiting?
Hey @beorn7. I haven't followed up with the latest progress; I'll check what happened. This was put on ice until the interning table was implemented so that we don't blow up people's egress bills; changing to use the interned format should be easy once it's there.
Yep, the intention is still to finish this off with the new interning format.
@tpaschalis @cstyan Any update on the expected/approximate time-frame for completion of this PR, please?
Closing, since we merged this work into the remote write 2.0 feature branch.
What's the best way to follow the remote write 2.0 development?
At the moment we're tracking the remaining work here: #6333
This PR is a follow-up to the work in #10972 and #10312, allowing metadata recorded in the WAL to be propagated through remote_write.
In those previous PRs we prepared the WAL to be able to record metadata, and allowed metadata to be written to the WAL, without touching the MetadataWatcher. The functionality used to be hidden behind a feature flag, but the flag was removed until the remote_write part was wired in, since otherwise it would simply inflate the WAL size without being of any use.
This first approach builds on the work of the Native Histograms, by adding another type of seriesType in the queue_manager for metadata, as well as a *metadata.Metadata field in the timeSeries struct. This new field is used to fill in a new slice when a shard is run, is decoded along with the other seriesTypes, and is passed along to the sendSamples method, where it's used to build the write request with a new field in the TimeSeries protobuf definition. (A simplified sketch of the watcher-to-queue hand-off is included at the end of this description.)
TODO list
- --metadata-storage feature flag and a new send_metadata field in remote_write configs.
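For reference, a minimal sketch of the watcher-to-WriteTo hand-off described above. The type and function definitions here (RefMetadata, WriteTo, dispatchRecord, printWriter) are simplified stand-ins for the real tsdb/wlog watcher, record.RefMetadata and the remote-write QueueManager, so treat the names as illustrative rather than authoritative.

package main

import "fmt"

// RefMetadata is a simplified stand-in for the metadata record decoded
// from a WAL segment: a series reference plus its type, unit and help.
type RefMetadata struct {
	Ref              uint64
	Type, Unit, Help string
}

// WriteTo mirrors the shape of the watcher's consumer interface with the
// new method discussed in this PR.
type WriteTo interface {
	AppendWALMetadata([]RefMetadata) bool
}

// dispatchRecord sketches the watcher's per-record handling: decode the
// metadata entries, then forward them to the writer, just as is already
// done for samples, exemplars and histograms.
func dispatchRecord(w WriteTo, recs []RefMetadata) {
	if len(recs) > 0 {
		w.AppendWALMetadata(recs)
	}
}

// printWriter is a toy WriteTo implementation standing in for the
// remote-write queue manager.
type printWriter struct{}

func (printWriter) AppendWALMetadata(ms []RefMetadata) bool {
	for _, m := range ms {
		fmt.Printf("queueing metadata ref=%d type=%s unit=%q help=%q\n", m.Ref, m.Type, m.Unit, m.Help)
	}
	return true
}

func main() {
	dispatchRecord(printWriter{}, []RefMetadata{
		{Ref: 1, Type: "counter", Help: "Total HTTP requests."},
	})
}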