Fix inconsistent array type for binary numerical operators result between array and scalar #6269

viirya · 2023-05-06T22:57:00Z

Which issue does this PR close?

Closes #6243.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?


alamb

Given that queries that used to fail now pass with this change, I think it is a step in the right direction.

As I understand it, this code will force the output of an expression back to a non-dictionary type at query time.

Now that some of the kernels actually support dictionary encoded arrays natively, I wonder if it would be possible to maintain the dictionary encoding as part of the coercion rules (rather than coercing the output to a primitive type)?

https://github.com/apache/arrow-datafusion/blob/2e9beeba01b85afb6d4f6557201e673008ea9edd/datafusion/expr/src/type_coercion/binary.rs#L475-L483

alamb · 2023-05-08T16:21:53Z

Thank you @viirya

viirya · 2023-05-08T17:24:53Z

> Now that some of the kernels actually support dictionary encoded arrays natively, I wonder if it would be possible to maintain the dictionary encoding as part of the coercion rules (rather than coercing the output to a primitive type)?

For the mathematical numerical kernels, the result of an operation on two dictionary input arrays is a primitive array. So for mathematics_numerical_coercion, this still looks correct.
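To illustrate the rule being described, here is a minimal sketch of a coercion function that always yields a primitive result type, even when both inputs are dictionaries. This is a toy model and not DataFusion's actual code: the `DataType` enum, `value_type`, `rank`, and `arithmetic_result_type` are hypothetical names, and the widening order is an assumption made only for the example.

```rust
// Toy model of the coercion rule discussed above; NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
    // Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

// Strip any dictionary wrapper and return the underlying value type.
fn value_type(dt: &DataType) -> &DataType {
    match dt {
        DataType::Dictionary(_, value) => value_type(value),
        other => other,
    }
}

// Hypothetical widening order, used only for this illustration.
fn rank(dt: &DataType) -> Option<u8> {
    match dt {
        DataType::Int32 => Some(0),
        DataType::Int64 => Some(1),
        DataType::Float64 => Some(2),
        DataType::Dictionary(_, _) => None,
    }
}

// Result type of an arithmetic op: the wider of the two value types.
// The result is never a dictionary, even if both inputs are.
fn arithmetic_result_type(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    let (l, r) = (value_type(lhs), value_type(rhs));
    let (l_rank, r_rank) = (rank(l)?, rank(r)?);
    Some(if l_rank >= r_rank { l.clone() } else { r.clone() })
}
```

Under this sketch, `Dictionary(Int32, Int64)` combined with itself coerces to `Int64`, which matches the "primitive for dictionary/dictionary" behaviour described in the comment.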

alamb · 2023-05-08T19:00:47Z

> For the mathematical numerical kernels, the result of an operation on two dictionary input arrays is a primitive array. So for mathematics_numerical_coercion, this still looks correct.

Yeah, I guess I was thinking it would be nice to avoid unpacking the dictionary result into a primitive array (when possible).

viirya · 2023-05-08T21:07:27Z

> Yeah, I guess I was thinking it would be nice to avoid unpacking the dictionary result into a primitive array (when possible).

I meant that for the mathematical numerical kernels (e.g. add, subtract), the result of an operation between two dictionary arrays is a primitive array. We don't unpack a dictionary array into a primitive array. This is why the coercion rule specifies the result type of such an op as the primitive type rather than a dictionary of it.

But for such an op between a dictionary and a scalar, the result is a dictionary array, because the op can simply be applied to the dictionary values, which is not possible in the case above (dictionary and dictionary). So the inconsistency (primitive for dictionary/dictionary, dictionary for dictionary/scalar) leads to the bug we saw.

We can either change the primitive result of an op on dictionary/dictionary to a dictionary, or change the dictionary result of an op on dictionary/scalar to a primitive. This PR takes the latter as the fix. One reason is that it is simple to apply now and, I think, has less impact on performance. Another reason is that I'm not sure packing the result of a dictionary/dictionary op as a dictionary makes sense. It is doable, but dictionary-encoding the result during a mathematical numerical op might introduce a performance penalty. I'll find some time to try that.
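The inconsistency and the chosen fix can be sketched with a simplified stand-in for Arrow arrays. Everything here is an assumption for illustration: the `Array` enum and the `add`, `add_scalar`, and `add_scalar_fixed` functions are hypothetical and are not the arrow-rs kernels or the code in this PR.

```rust
// Simplified stand-in for Arrow arrays; these are NOT arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum Array {
    Primitive(Vec<i64>),
    // Each key is an index into `values`.
    Dictionary { keys: Vec<usize>, values: Vec<i64> },
}

impl Array {
    fn len(&self) -> usize {
        match self {
            Array::Primitive(v) => v.len(),
            Array::Dictionary { keys, .. } => keys.len(),
        }
    }

    // Logical value at row `i`, looking through the dictionary if present.
    fn value(&self, i: usize) -> i64 {
        match self {
            Array::Primitive(v) => v[i],
            Array::Dictionary { keys, values } => values[keys[i]],
        }
    }
}

// array + array: computed row by row, so the result is primitive.
// Assumes both inputs have the same length.
fn add(lhs: &Array, rhs: &Array) -> Array {
    Array::Primitive((0..lhs.len()).map(|i| lhs.value(i) + rhs.value(i)).collect())
}

// array + scalar: for a dictionary input only the values are rewritten and
// the keys are reused, so the result stays dictionary encoded.
fn add_scalar(lhs: &Array, scalar: i64) -> Array {
    match lhs {
        Array::Primitive(v) => Array::Primitive(v.iter().map(|a| a + scalar).collect()),
        Array::Dictionary { keys, values } => Array::Dictionary {
            keys: keys.clone(),
            values: values.iter().map(|a| a + scalar).collect(),
        },
    }
}

// Toy version of the direction taken here: cast the scalar result back to a
// primitive array so it agrees with the type the coercion rule promised.
fn add_scalar_fixed(lhs: &Array, scalar: i64) -> Array {
    let result = add_scalar(lhs, scalar);
    Array::Primitive((0..result.len()).map(|i| result.value(i)).collect())
}
```

In this model `add` on two dictionaries returns `Array::Primitive`, while `add_scalar` on a dictionary returns `Array::Dictionary`. Any caller that trusts the planned primitive type would trip over the second case, which is why the result is normalized in `add_scalar_fixed`.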

alamb · 2023-05-09T10:01:33Z

I agree then that this solution makes sense

viirya · 2023-05-09T18:10:11Z

Thanks @alamb. I will find some time to look at the possibility of packing the primitive result of math kernels on dictionary/dictionary as a dictionary.

Cast binary numerical operators result between array and scalar to primitive array

e1cb297

github-actions bot added the core, physical-expr, and sqllogictest labels May 6, 2023

viirya changed the title ~~Cast binary numerical operators result between array and scalar to primitive array~~ Fix inconsistent array type for binary numerical operators result between array and scalar May 6, 2023

viirya added 3 commits May 6, 2023 16:27

Add order by to stablize query result

f187516

Fix tests

d2bc015

Fix clippy

50e2f72

alamb approved these changes May 8, 2023


alamb mentioned this pull request May 8, 2023

minor: Remove dead code for casting dictionaries #6286

Merged

viirya merged commit 1dd3674 into apache:main May 9, 2023

viirya mentioned this pull request May 25, 2023

Skip casting result array for binary numerical operators result between array and scalar if possible #6438

Merged
