Spark supports query trace

### Description

Query trace is a very useful feature, but I ran into several problems when trying to enable it in Gluten.
1. The Gluten `QueryCtx` queryId is empty (`""`), so the generated directory is missing the queryId layer, which query trace requires to be set. Since Gluten uses single-threaded execution and an auto-incrementing vtId, the taskId alone is enough to distinguish the Velox plan.
```
// TaskId
static std::atomic<uint32_t> vtId{0}; // Velox task ID to distinguish from Spark task ID.
  task_ = velox::exec::Task::create(
      fmt::format(
          "Gluten_Stage_{}_TID_{}_VTID_{}",
          std::to_string(taskInfo_.stageId),
          std::to_string(taskInfo_.taskId),
          std::to_string(vtId++)),
      std::move(planFragment),
      0,
      std::move(queryCtx),
      velox::exec::Task::ExecutionMode::kSerial);
```

```
// queryId is "" (the last argument)
std::shared_ptr<velox::core::QueryCtx> ctx = velox::core::QueryCtx::create(
      nullptr,
      facebook::velox::core::QueryConfig{getQueryContextConf()},
      connectorConfigs,
      gluten::VeloxBackend::get()->getAsyncDataCache(),
      memoryManager_->getAggregateMemoryPool(),
      spillExecutor_.get(),
      "");
```
The generated query trace directory:
```
/tmp/query_trace/
└── Gluten_Stage_0_TID_0_VTID_0
    ├── 7
    │   └── 0
    │       └── 0
    │           ├── op_input_trace.data
    │           └── op_trace_summary.json
    └── task_trace_meta.json
```
Running the replayer on it then throws an exception:
```
/mnt/DP_disk1/code/velox/build/velox/tool/trace# ./velox_query_replayer  --root_dir /tmp/query_trace --task_id Gluten_Stage_0_TID_0_VTID_0 --summary
terminate called after throwing an instance of 'facebook::velox::VeloxUserError'
  what():  Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: --query_id must be provided
Retriable: False
Expression: !FLAGS_query_id.empty()
Function: init
File: /mnt/DP_disk1/code/velox/velox/tool/trace/TraceReplayRunner.cpp
Line: 241
Stack trace:
Stack trace has been disabled. Use --velox_exception_user_stacktrace_enabled=true to enable it.

Aborted (core dumped)

```
Since `QueryCtx` does not require the queryId to be set, I think an empty queryId is reasonable, so query trace needs to support it.
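
One way to picture the requested behavior is a path builder that simply skips the queryId layer when the ID is empty. This is only a minimal sketch: `traceTaskDir` is a hypothetical helper, not an existing Velox function, and it assumes the layout is root directory, then optional queryId, then taskId.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not part of Velox): builds the trace directory of a
// task and omits the queryId layer when the queryId is empty.
std::filesystem::path traceTaskDir(
    const std::filesystem::path& rootDir,
    const std::string& queryId,
    const std::string& taskId) {
  std::filesystem::path dir = rootDir;
  if (!queryId.empty()) {
    dir /= queryId; // Only add the queryId layer when one was provided.
  }
  dir /= taskId;
  return dir;
}
```

With an empty queryId this resolves to the directory shown above (`/tmp/query_trace/Gluten_Stage_0_TID_0_VTID_0`), so a replayer using the same rule could locate the task without `--query_id`.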

2. Register the Spark functions and distinguish them from the Presto functions by FLAGS_xx. We cannot register both, because one set of functions overwriting the other may trigger unexpected behavior.
3. The Spark ValueStreamNode is hard to serialize and deserialize. We may not need to serialize the whole plan; we could extract and serialize only the node that is required.
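
To illustrate point 3, the idea of extracting only the traced node could look roughly like the depth-first lookup below. It is a sketch under assumptions: `ToyPlanNode` and `findNodeById` are simplified stand-ins invented for illustration, not the Velox plan node classes, and the node IDs are made up.

```cpp
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for a plan node; NOT the Velox plan node class.
struct ToyPlanNode {
  std::string id;
  std::string name;
  std::vector<std::shared_ptr<const ToyPlanNode>> sources;
};

// Depth-first search for the node with the given ID. Returns nullptr when no
// node in the tree has that ID.
std::shared_ptr<const ToyPlanNode> findNodeById(
    const std::shared_ptr<const ToyPlanNode>& root,
    const std::string& id) {
  if (root == nullptr) {
    return nullptr;
  }
  if (root->id == id) {
    return root;
  }
  for (const auto& source : root->sources) {
    if (auto found = findNodeById(source, id)) {
      return found;
    }
  }
  return nullptr;
}
```

Only the node returned by such a lookup would then need to be serialized, so a hard-to-serialize source such as the ValueStreamNode would not have to be written out as part of the full plan.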
