The intermediate type of collect_list/collect_set isn't compatible with Spark #12023
Open
Description
Bug description
The intermediate type of Spark's collect_list/collect_set is BINARY, whereas the intermediate type of Velox's collect_list/collect_set is ARRAY. The two are incompatible. The current workaround in Gluten causes problems, including gluten#8227 and gluten#8184.
A complete solution involves changing the intermediate type of Velox's collect_list/collect_set to VARBINARY and using Spark's UnsafeArrayData format for serialization and deserialization.
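To make the proposed SerDe more concrete, here is a rough Python sketch of the UnsafeArrayData layout as I understand it: an 8-byte element count, a null bitmap padded to 8-byte words, then a fixed-width value region. It only handles BIGINT elements and assumes little-endian byte order. The function names are hypothetical and are not part of Velox, Gluten, or Spark; variable-length element types would need an additional offset-and-size scheme that is not shown here.

```python
import struct
from typing import List, Optional


def header_size(num_elements: int) -> int:
    # 8 bytes for the element count plus one 8-byte word per 64 null bits.
    return 8 + ((num_elements + 63) // 64) * 8


def serialize_bigint_array(values: List[Optional[int]]) -> bytes:
    """Hypothetical helper: encode nullable 64-bit ints in an
    UnsafeArrayData-like layout (assumed little-endian)."""
    n = len(values)
    bitmap = bytearray(header_size(n) - 8)
    body = bytearray()
    for i, v in enumerate(values):
        if v is None:
            # Set bit i of the null bitmap; the value slot is zero-filled.
            bitmap[i >> 3] |= 1 << (i & 7)
            body += struct.pack("<q", 0)
        else:
            body += struct.pack("<q", v)
    return struct.pack("<q", n) + bytes(bitmap) + bytes(body)


def deserialize_bigint_array(blob: bytes) -> List[Optional[int]]:
    """Hypothetical helper: inverse of serialize_bigint_array."""
    (n,) = struct.unpack_from("<q", blob, 0)
    values_offset = header_size(n)
    out: List[Optional[int]] = []
    for i in range(n):
        is_null = (blob[8 + (i >> 3)] >> (i & 7)) & 1
        if is_null:
            out.append(None)
        else:
            (v,) = struct.unpack_from("<q", blob, values_offset + 8 * i)
            out.append(v)
    return out
```

With this layout, a three-element array such as `[1, None, 3]` takes 8 bytes for the count, 8 bytes for the null bitmap, and 24 bytes for the values. The intermediate result would then be carried as a single VARBINARY value instead of an ARRAY.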
System information
/
Relevant logs
No response