-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL change point #119458
base: main
Are you sure you want to change the base?
ES|QL change point #119458
Conversation
Hi @jan-elastic, I've created a changelog YAML for you. |
c699481
to
6980738
Compare
Hi @jan-elastic, I've created a changelog YAML for you. |
d4fa597
to
4ace296
Compare
4ace296
to
7e9a3a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to get this in soon and iterate around what we have. I'm super interested in how to make this interact nicely with time series data.
I think @martijnvg should probably have a look, because he's been working on time series aggregations, and this is certainly one of them.
For anyone following along - this really wants data in time series order. So buffers it all up and puts it in that order. In TSDB we can read data in time series order strait from the disk. No buffering required.
Maybe important question - do we need the @timestamp
to be passed to the agg if the data was guaranteed to be in sorted order?
builder.beginControlFlow("if (timestampsVector == null) "); | ||
builder.addStatement("throw new IllegalStateException($S)", "expected @timestamp vector; but got a block"); | ||
builder.endControlFlow(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First arity 2 aggregation function!
I suppose it's ok to make it timestamp-specific at this point. At some point we'll rework this when we more correlation or something, but it's all good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RATE
aggregation already takes two args (also timestamp also 2nd arg), but that's still in snapshot mode. Unfortunately, that's just a GroupingAggregator
. So, I basically ported the includeTimestamps
from that.
builder.addStatement("$T timestampsVector = timestampsBlock.asVector()", LONG_VECTOR); | ||
builder.beginControlFlow("if (timestampsVector == null) "); | ||
builder.addStatement("throw new IllegalStateException($S)", "expected @timestamp vector; but got a block"); | ||
builder.endControlFlow(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine. But it's worth saying out loud: Things like this usually should become null
and a warning. They'd put the agg in an "I'm broken" state and produce a warning on output. Like sum
should do if it overflows. It doesn't do that now, but it should.
Anyway, I think this is fine to just hard fail here - just like this - because we're going to want to build machinery around the agg to make sure that it's input is a time series. Which will have the constraint that the timestamp
is always single valued. And, probably, descending.
add(timestamps, values, 0); | ||
} | ||
|
||
void add(LongBlock timestamps, DoubleBlock values, int otherPosition) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LongVector
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know exactly what needs to be changed for that.
The AggregatorImplementer
generates something that wants this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I'd be ok leaving a TODO on that one.
private final BigArrays bigArrays; | ||
private int count; | ||
private LongArray timestamps; | ||
private DoubleArray values; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks to me like you are doing this assuming the timestamps are in any order. That's how it works now and probably what we should merge today, but I'm hopeful we can make a system where you can rely on them being in ascending order. That's mostly the machinery we've talked about for time series systems.
I'm not sure that this code will survive forever - we may add the "in sorted order" path and never use this after that. OTOH, let's get something that works in today. If we change it then we change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, that's the current assumption.
I'm totally fine with this code getting trashed in the future, once we have better time series support.
|
||
// TODO: this needs to output multiple columns or a composite object, not a JSON blob. | ||
private BytesRef getChangePoint() { | ||
// TODO: probably reuse ES|QL sort/orderBy to get results in order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is partly what I mean by "make this a time series". ESQL would stick a sort before the aggregation operation. Or, well, iterate the documents in time series order. Which it totally can do for time series indices.
// TODO: this needs to output multiple columns or a composite object, not a JSON blob. | ||
private BytesRef getChangePoint() { | ||
// TODO: probably reuse ES|QL sort/orderBy to get results in order | ||
// TODO: this copying/sorting doesn't account for memory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it super doesn't. If we're going to keep this code forever I'd prefer to make LongArray
has a sortBetween
method that sorts in place. Or something like that.
This PR is adding the Change point aggregation to ES|QL.
Note: the current output format is a JSON string, which looks like:
This should be replaced by multiple columns or so, but that still needs some discussion.