Description
Query trace is a very useful feature, but I ran into several problems when trying to enable it in Gluten.
- The queryId of Gluten's QueryCtx is empty (""), so the generated directory is missing the queryId layer, which query trace requires. Because Gluten uses single-threaded execution and an auto-incrementing vtId, the taskId alone is enough to distinguish the Velox plan.
# TaskId
static std::atomic<uint32_t> vtId{0}; // Velox task ID to distinguish from Spark task ID.
task_ = velox::exec::Task::create(
    fmt::format(
        "Gluten_Stage_{}_TID_{}_VTID_{}",
        std::to_string(taskInfo_.stageId),
        std::to_string(taskInfo_.taskId),
        std::to_string(vtId++)),
    std::move(planFragment),
    0,
    std::move(queryCtx),
    velox::exec::Task::ExecutionMode::kSerial);
# queryId is ""
std::shared_ptr<velox::core::QueryCtx> ctx = velox::core::QueryCtx::create(
    nullptr,
    facebook::velox::core::QueryConfig{getQueryContextConf()},
    connectorConfigs,
    gluten::VeloxBackend::get()->getAsyncDataCache(),
    memoryManager_->getAggregateMemoryPool(),
    spillExecutor_.get(),
    "");
The generated query trace directory:
/tmp/query_trace/
└── Gluten_Stage_0_TID_0_VTID_0
    ├── 7
    │   └── 0
    │       └── 0
    │           ├── op_input_trace.data
    │           └── op_trace_summary.json
    └── task_trace_meta.json
Running the replayer on this directory fails with the following exception:
/mnt/DP_disk1/code/velox/build/velox/tool/trace# ./velox_query_replayer --root_dir /tmp/query_trace --task_id Gluten_Stage_0_TID_0_VTID_0 --summary
terminate called after throwing an instance of 'facebook::velox::VeloxUserError'
what(): Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: --query_id must be provided
Retriable: False
Expression: !FLAGS_query_id.empty()
Function: init
File: /mnt/DP_disk1/code/velox/velox/tool/trace/TraceReplayRunner.cpp
Line: 241
Stack trace:
Stack trace has been disabled. Use --velox_exception_user_stacktrace_enabled=true to enable it.
Aborted (core dumped)
QueryCtx does not require the queryId to be set, so I think an empty queryId is reasonable and QueryTrace should support it.
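One possible way to tolerate an empty queryId is to skip that directory layer when resolving the task's trace directory. The sketch below is only an illustration: `traceTaskDir` is a hypothetical helper, not an existing Velox function, and it assumes the layout is root/queryId/taskId, as the directory listing above suggests.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not a Velox API): resolves the directory that holds a
// task's trace files. Assumes the layout <root>/<queryId>/<taskId>, and drops
// the queryId layer when the queryId is empty.
std::filesystem::path traceTaskDir(
    const std::filesystem::path& rootDir,
    const std::string& queryId,
    const std::string& taskId) {
  std::filesystem::path dir = rootDir;
  if (!queryId.empty()) {
    dir /= queryId; // Normal case: keep the queryId layer.
  }
  dir /= taskId;
  return dir;
}
```

With an empty queryId this resolves to the task directory directly under the root, which matches what Gluten currently writes. The replayer would also have to stop requiring --query_id for this to work end to end.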
- Register the Spark functions, and use a flag (FLAGS_xx) to choose between them and the Presto functions. We cannot register both, because one set overwriting the other may trigger unexpected behavior.
- Spark's ValueStreamNode is hard to serialize and deserialize. We may not need to serialize the whole plan; we could extract and serialize only the nodes that are required.
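The function-registration point above could look roughly like the following. This is a self-contained sketch, not Velox or Gluten code: `Dialect`, `parseDialect`, `registerFunctions`, and the `demo_fn` entry are all hypothetical names, the registry is a plain map, and the function bodies are stand-ins with no real Presto or Spark semantics. It only shows the idea of registering exactly one function set based on a flag value, so that neither set can overwrite the other.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Stand-in types for illustration only; a real registry is far richer.
using DemoFn = std::function<std::string()>;
using Registry = std::map<std::string, DemoFn>;

enum class Dialect { kPresto, kSpark };

// Maps a flag value (for example the value of the FLAGS_xx placeholder from
// the issue text) to a dialect. Unknown values are rejected instead of
// silently falling back to a default.
Dialect parseDialect(const std::string& flagValue) {
  if (flagValue == "presto") {
    return Dialect::kPresto;
  }
  if (flagValue == "spark") {
    return Dialect::kSpark;
  }
  throw std::invalid_argument("unknown function dialect: " + flagValue);
}

// Registers exactly one function set. Both sets would use the same names,
// so registering both would make the later one overwrite the earlier one.
void registerFunctions(Dialect dialect, Registry& registry) {
  if (dialect == Dialect::kSpark) {
    registry["demo_fn"] = [] { return std::string("spark"); };
  } else {
    registry["demo_fn"] = [] { return std::string("presto"); };
  }
}
```

Rejecting unknown flag values keeps a typo from silently replaying a Spark trace with Presto functions.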