[CH] Fully Support writing parquet and mergetree in spark 3.5.x with delta protocol #7028
Open
Description
This is an umbrella issue.
Previously, #6705 was just a POC to prove that we can implement Delta Write based on ColumnarWriteFilesExec.
- [GLUTEN-7028][CH][Part-1] Using PushingPipelineExecutor to write merge tree #7029
- [GLUTEN-7028][CH][Part-2] Refactor: Move MergeTree related UT to mergetree module #7279
- [GLUTEN-7028][CH][Part-3] Refactor: Move mergetree related codes to backends-clickhouse #7234
- [GLUTEN-7028][CH][Part-4] Refactor DeltaMergeTreeFileFormat to read table configuration from deltalog's metadata #7170
- [GLUTEN-7028][CH][Part-5] Refactor: add NativeOutputWriter to unify CHDatasourceJniWrapper #7395
- [GLUTEN-7028][CH][Part-6] Introduce MergeTreeDelayedCommitProtocol #7506
- [GLUTEN-7028][CH][Part-7] Support one pipeline write for mergetree #7788
- [GLUTEN-7028][CH][Part-8] Support one pipeline write for partition mergetree #7924
- [GLUTEN-7028][CH][Part-9] Collecting Delta stats for parquet #7993
- [GLUTEN-7028][CH][Part-10] Collecting Delta stats for MergeTree #8029
- [GLUTEN-7028][CH][Part-11] Support write parquet files with bucket #8052
- [GLUTEN-7028][CH][Part-12] Add Local SortExec for Partition Write in one pipeline mode #8237
- [GLUTEN-7028][CH][Part-13] Support partition with escape value #8158
- [GLUTEN-7028][CH][Part-14] Refactor Case Sensitive Support for MergeTree #8346
- [GLUTEN-7028][CH][Part-15] [MINOR] Fix UTs #8364
backlog
- Collect file bytes that write pipeline produces.
- [CH] Incorrect result when native write timestamp column using spark 3.5 #8053
- Support InsertIntoHiveTable
- Support Bucket table? or drop support
- by Design => ([GLUTEN-7028][CH][Part-15] [MINOR] Fix UTs #8364) HDFS, HDFS with RocksDB, S3 :: test mergetree write with the path based bucket table
- Support escaped partition values, e.g. values including a space: [GLUTEN-7028][CH][Part-13] Support partition with escape value #8158
- We need to redesign how case (in)sensitivity is supported ([GLUTEN-7028][CH][Part-14] Refactor Case Sensitive Support for MergeTree #8346).
- Bug Fix ([GLUTEN-7028][CH][Part-15] [MINOR] Fix UTs #8364)
- test mergetree with partition with whitespace
- GlutenClickHouseMergeTreeCacheDataSuite::test cache mergetree data no partition columns
- GlutenClickHouseMergeTreePathBasedWriteSuite::test mergetree path based table update and GlutenClickHouseMergeTreePathBasedWriteSuite::test mergetree path based table delete