Fix inconsistent array type for binary numerical operators result between array and scalar #6269
Given that queries that used to fail now pass with this change, I think it is a step in the right direction.
As I understand it, this code will force the output of an expression back to a non-dictionary type at query time.
Now that some of the kernels actually support dictionary encoded arrays natively, I wonder if it would be possible to maintain the dictionary encoding as part of the coercion rules (rather than coercing the output to a primitive type)?
Thank you @viirya
For the mathematical numeric kernels, the result type for two dictionary input arrays is a primitive array. So for |
Yeah, I guess I was thinking it would be nice to avoid unpacking the dictionary result into a primitive array (when possible)
I meant that for the mathematical numeric kernels (e.g. add, minus etc.), the result of an operation between two dictionary arrays is a primitive array. We don't unpack a dictionary array into a primitive array. This is why the coercion rule specifies the result type of such an op as a primitive type instead of a dictionary of it. But for such an op between a dictionary and a scalar, the result is a dictionary array, because the op can simply be applied to the dictionary values, which is not possible in the dictionary/dictionary case. So the inconsistency (primitive for dictionary/dictionary and dictionary for dictionary/scalar) leads to the bug we saw. We can either change the primitive result of an op on dictionary/dictionary to dictionary, or change the dictionary result of an op on dictionary/scalar to primitive. This PR takes the latter as the fix. One reason is that it is simple to apply to fix the issue now and, I think, has less impact on performance. Another reason is that I'm not sure packing the result of a dictionary/dictionary op as a dictionary makes sense. It is doable, but handling dictionary encoding during a mathematical numeric op might introduce a performance penalty. I'll find some time to try that.
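To make the inconsistency described above concrete, here is a minimal sketch using a toy model written with only the Rust standard library. The `Array` enum and the functions `add_arrays`, `add_scalar_keep_dictionary`, and `add_scalar_as_primitive` are hypothetical names invented for illustration; they are not the arrow-rs or DataFusion API, and the real kernels are implemented differently.

```rust
/// Toy stand-in for an Arrow array (hypothetical, not the arrow-rs API).
#[derive(Debug, Clone, PartialEq)]
enum Array {
    /// One value per row.
    Primitive(Vec<i64>),
    /// Each row is an index (key) into a small table of distinct values.
    Dictionary { keys: Vec<usize>, values: Vec<i64> },
}

impl Array {
    /// Logical value of every row, regardless of the encoding.
    fn rows(&self) -> Vec<i64> {
        match self {
            Array::Primitive(v) => v.clone(),
            Array::Dictionary { keys, values } => keys.iter().map(|&k| values[k]).collect(),
        }
    }
}

/// array + array: rows are combined pairwise, so the result is primitive
/// even when both inputs are dictionary encoded.
fn add_arrays(left: &Array, right: &Array) -> Array {
    let (l, r) = (left.rows(), right.rows());
    Array::Primitive(l.iter().zip(r.iter()).map(|(a, b)| a + b).collect())
}

/// array + scalar, keeping the encoding: for a dictionary input only the
/// dictionary values are touched and the keys are reused, so the result
/// is still a dictionary.
fn add_scalar_keep_dictionary(array: &Array, scalar: i64) -> Array {
    match array {
        Array::Primitive(v) => Array::Primitive(v.iter().map(|x| x + scalar).collect()),
        Array::Dictionary { keys, values } => Array::Dictionary {
            keys: keys.clone(),
            values: values.iter().map(|x| x + scalar).collect(),
        },
    }
}

/// array + scalar with the result forced to primitive, so that it matches
/// the result type that the coercion rule declares for array + array.
fn add_scalar_as_primitive(array: &Array, scalar: i64) -> Array {
    Array::Primitive(add_scalar_keep_dictionary(array, scalar).rows())
}
```

In this model, adding a dictionary array to itself yields `Primitive`, while `add_scalar_keep_dictionary` yields `Dictionary` for the same input. A planner that declares a single primitive result type for both expressions would see a type mismatch in the second case; `add_scalar_as_primitive` removes the mismatch in the same direction as the fix discussed here.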
I agree then that this solution makes sense
Thanks @alamb. I will find some time to look at the possibility of packing the primitive result of math kernels on dictionary/dictionary as a dictionary.
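As a rough illustration of what packing a primitive result back into a dictionary could involve, the sketch below re-encodes a slice of row values into keys and distinct values. `pack_as_dictionary` is a hypothetical helper using only the standard library, not an arrow-rs function. Under this assumed approach, every row costs a hash lookup, which is one plausible source of the performance penalty mentioned in the discussion.

```rust
use std::collections::HashMap;

/// Re-encode row values as (keys, distinct values), assigning keys in
/// order of first appearance. Hypothetical helper for illustration only.
fn pack_as_dictionary(rows: &[i64]) -> (Vec<usize>, Vec<i64>) {
    let mut index_of: HashMap<i64, usize> = HashMap::new();
    let mut keys = Vec::with_capacity(rows.len());
    let mut values = Vec::new();
    for &row in rows {
        // Index the value would receive if it has not been seen before.
        let next = values.len();
        let key = *index_of.entry(row).or_insert_with(|| {
            values.push(row);
            next
        });
        keys.push(key);
    }
    (keys, values)
}
```

For the rows `[20, 40, 20]` this produces keys `[0, 1, 0]` and values `[20, 40]`. Whether the saving from a smaller values buffer outweighs the per-row hashing would depend on the cardinality of the result, which is why it would need to be measured.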
Which issue does this PR close?
Closes #6243.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?