The intermediate type of collect_list/collect_set isn't compatible with Spark #12023

Open
@NEUpanning

Description

Bug description

The intermediate type of Spark's collect_list/collect_set is BINARY, whereas Velox's collect_list/collect_set uses ARRAY as its intermediate type. The two are incompatible. The current workaround in Gluten causes issues, including gluten#8227 and gluten#8184.

A complete solution would change the intermediate type of Velox's collect_list/collect_set to VARBINARY and use Spark's UnsafeArrayData format for serialization and deserialization.
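
To make the proposed SerDe more concrete, here is a rough Python sketch of an UnsafeArrayData-style layout for an array of 8-byte integers. It is only an illustration, not Velox or Spark code. It assumes the layout is an 8-byte element count, then a null bitmap padded to 8-byte words, then one 8-byte slot per element (zeroed for nulls), and it assumes little-endian byte order. The function names `serialize_int64_array` and `deserialize_int64_array` are hypothetical, and variable-length element types (strings, nested arrays) are not covered.

```python
import struct


def serialize_int64_array(values):
    """Encode a list of ints (None = null) in an UnsafeArrayData-style layout.

    Assumed layout: [numElements: 8 bytes][null bitmap: 8-byte words][8-byte slots].
    """
    n = len(values)
    bitmap_words = (n + 63) // 64
    bitmap = [0] * bitmap_words
    for i, v in enumerate(values):
        if v is None:
            # A set bit marks element i as null.
            bitmap[i // 64] |= 1 << (i % 64)

    out = struct.pack("<q", n)
    out += struct.pack(f"<{bitmap_words}Q", *bitmap)
    # Null slots are written as zero so the fixed-width region stays dense.
    out += b"".join(struct.pack("<q", 0 if v is None else v) for v in values)
    return out


def deserialize_int64_array(buf):
    """Decode the bytes produced by serialize_int64_array back into a list."""
    (n,) = struct.unpack_from("<q", buf, 0)
    bitmap_words = (n + 63) // 64
    bitmap = struct.unpack_from(f"<{bitmap_words}Q", buf, 8)
    data_offset = 8 + 8 * bitmap_words

    result = []
    for i in range(n):
        if (bitmap[i // 64] >> (i % 64)) & 1:
            result.append(None)
        else:
            (value,) = struct.unpack_from("<q", buf, data_offset + 8 * i)
            result.append(value)
    return result
```

With this layout, the VARBINARY intermediate value for a three-element array takes 40 bytes (8 for the count, 8 for the bitmap, 24 for the slots), and both engines can read it as long as they agree on the layout and byte order.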

System information

/

Relevant logs

No response

Labels: bug, triage