Return booleans from expression comparisons, allow for vectors to be defined in expressions #1548
This PR rewrites a good amount of our ExprTk integration so that comparisons such as `==`, `<`, `and`, `or`, etc. return boolean columns instead of floats. Originally, all comparisons returned floats because ExprTk treated booleans as the number 0 or 1, and passed them into the `t_tscalar` constructor as ints and not booleans. Because scalar comparison between different types is not possible, functions had to return float values in order for conditionals to work.

In this branch, I've added explicit specializations for more of ExprTk's processing code so that operators and conditional evaluators always return boolean scalars. In combination with the UI tweaks in #1547, expressions can now easily be used as filters on the dataset:
This also works well for defining ranges using `inrange`, such as a date range:

Finally, users can define vectors inside expressions and use them/return scalars from the vector at will:
Vectors in particular enable a large number of features, including functions such as `find` and `split`, which need to return more than one value. A `find(string, regex, output_vector)` function, for example, will store its output of `start_idx, end_idx` in `output_vector`, and the user can then create a substring from those indices using `substr(output[0], output[1])`.
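To make the output-vector pattern concrete, here is a rough Python model of how a `find`-style function could write two results into a caller-supplied vector. It is only a sketch of the idea, not the code in this branch: the Python `find` below is hypothetical, it uses Python's `re` module rather than whatever regex engine the real function uses, and it assumes a half-open `end_idx` (Python's convention), which the description above does not specify.

```python
import re


def find(string, pattern, output_vector):
    """Hypothetical model of find(string, regex, output_vector).

    Writes the start and end index of the first match into the first two
    slots of output_vector and reports whether a match was found.
    Assumption: end index is exclusive, as in Python slicing.
    """
    match = re.search(pattern, string)
    if match is None:
        return False
    output_vector[0] = match.start()
    output_vector[1] = match.end()
    return True


# The caller owns the vector, much like a vector declared inside an expression.
output = [0, 0]
text = "order-1234-x"
found = find(text, r"\d+", output)

# The two stored indices can then be used to cut out the substring.
substring = text[output[0]:output[1]]
```

Because the function itself only returns a single value, the vector is what carries the pair of indices back to the caller.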
`True` and `False` have been added, replacing the values `true` and `false` (without capital letters), which resolved to the numbers 1 and 0. `True` and `False`, meanwhile, resolve to `true` and `false` boolean scalars, which means they can be used in comparisons against other booleans, whereas the old values will now result in a syntax error.

Finally, a
`boolean` function has been added to cast a scalar or column of any type into a boolean column, returning `True` if a value is set (including "falsy" values such as `0` and `""`), and `False` for nulls.

Numerous tests have been added, as always.
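The null-versus-falsy behavior of `boolean` described above can be modeled in a few lines of Python. This is a sketch of the semantics only, not the implementation in this PR: the list-based `boolean` helper is hypothetical, and `None` stands in for a null value.

```python
def boolean(column):
    """Hypothetical model: True where a value is set, False where it is null."""
    return [value is not None for value in column]


column = [0, "", None, 1.5, "abc", False]

# Only the null entry maps to False; 0, "" and False all count as "set".
mask = boolean(column)

# For contrast, ordinary truthiness treats 0, "" and False as falsy as well.
truthy = [bool(value) for value in column]
```

Under these assumptions, `mask` differs from `truthy` exactly at the falsy-but-set entries, which is the distinction the `boolean` function draws.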