Fix for aggregate functions applied to empty sets #964
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Current behaviour
When you call most of the built in aggregate functions (
STDEV
,VAR
etc.) on an empty results set (e.g.SELECT STDDEV(col) FROM table WHERE 1=0
) an error is thrown, as detailed in #917.Expected behaviour
No error should be thrown and
undefined
(ornull
?) should be returned.Cause
An aggregate function has 3 stages (as detailed here), stage 1 for the first value, stage 2 for each subsequent value, and stage 3 to perform any last transforms on the accumulator. The problem is that with an empty set, we skip straight to stage 3 - but the accumulator will always be
undefined
, because it hasn't been set to anything in stages 1/2 (which won't have run, since there's no first value or subsequent values) which usually leads to errors.Fix
I propose we can wrap all the aggregate functions in an outer functions which checks whether the accumulator is☺️ .
undefined
at the post-processing state and then just short-circuits (by returningundefined
). An alternative would be to add that condition to all the affected aggregate functions, but figured this approach might be simplerThe assumption is that user-defined aggregators would handle this case themselves (might be worth mentioning in the docs?).