feat: distributed execution of compact statement #12750

Merged Sep 18, 2023 (11 commits)

zhyass (Member) commented Sep 7, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

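Support distributed execution of the compact statement (optimize table <table> compact), so that compaction work can be spread across the nodes of a cluster. The two sessions below show the lazy and non-lazy cases.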

mysql> create table t_compact_0 (a int not null) row_per_block=5 block_per_segment=5;
Query OK, 0 rows affected (0.15 sec)

mysql> insert into t_compact_0 select 50 - number from numbers(100);
Query OK, 100 rows affected (0.42 sec)

mysql> insert into t_compact_0 select 50 - number from numbers(100);
Query OK, 100 rows affected (0.41 sec)

mysql> insert into t_compact_0 select 50 - number from numbers(100);
Query OK, 100 rows affected (0.31 sec)

mysql> select count(),sum(a) from t_compact_0;
+---------+--------+
| count() | sum(a) |
+---------+--------+
|     300 |    150 |
+---------+--------+
1 row in set (0.33 sec)
Read 300 rows, 1.17 KiB in 0.264 sec., 1.14 thousand rows/sec., 4.43 KiB/sec.

mysql> alter table t_compact_0 set options(row_per_block=10,block_per_segment=10);
Query OK, 0 rows affected (0.06 sec)

# Lazy compact
# The number of compact-segment tasks is greater than the number of cluster nodes,
# so the compact-block tasks are built during pipeline init.
mysql> explain pipeline optimize table t_compact_0 compact;
+-------------------------------------------------------------------------------------------+
| explain                                                                                   |
+-------------------------------------------------------------------------------------------+
| CommitSink × 1 processor                                                                  |
|   TransformMergeCommitMeta × 1 processor                                                  |
|     TransformExchangeDeserializer × 1 processor                                           |
|       Merge (DummyTransform × 3 processors) to (TransformExchangeDeserializer × 1)        |
|         Merge (MutationAggregator × 1 processor) to (Resize × 3)                          |
|           MutationAggregator × 1 processor                                                |
|             Merge (TransformSerializeBlock × 20 processors) to (MutationAggregator × 1)   |
|               TransformSerializeBlock × 20 processors                                     |
|                 CompactSource × 20 processors                                             |
+-------------------------------------------------------------------------------------------+
9 rows in set (0.27 sec)
Read 0 rows, 0.00 B in 0.243 sec., 0 rows/sec., 0.00 B/sec.

mysql> optimize table t_compact_0 compact;
Query OK, 0 rows affected (0.31 sec)

mysql> select segment_count, block_count, row_count from fuse_snapshot('default', 't_compact_0') limit 2;
+---------------+-------------+-----------+
| segment_count | block_count | row_count |
+---------------+-------------+-----------+
|             3 |          30 |       300 |
|             6 |          30 |       300 |
+---------------+-------------+-----------+
2 rows in set (0.09 sec)
Read 2 rows, 428.00 B in 0.055 sec., 36.58 rows/sec., 7.64 KiB/sec.
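
Compaction is expected to reorganize blocks and segments without changing the table's contents, which matches row_count staying at 300 in both snapshots above. As a quick sanity check on the count() and sum(a) values from the earlier select, the following Python sketch recomputes them from the three inserts. It assumes numbers(100) yields the integers 0 through 99; the variable names are illustrative only and do not come from the PR.

```python
# Rows produced by one `insert into t_compact_0 select 50 - number from numbers(100)`.
# Assumption: numbers(100) yields 0, 1, ..., 99.
one_insert = [50 - number for number in range(100)]

# The session runs the same insert three times.
rows = one_insert * 3

row_count = len(rows)
total = sum(rows)

print(row_count, total)
```

Each insert contributes 100 rows whose values sum to 50, so three inserts give 300 rows with a sum of 150, as in the query output.
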
mysql> create table t_compact_1 (a int not null) row_per_block=5 block_per_segment=5;
Query OK, 0 rows affected (0.14 sec)

mysql> insert into t_compact_1 select 100 - number from numbers(150);
Query OK, 150 rows affected (0.42 sec)

mysql> alter table t_compact_1 set options(row_per_block=10,block_per_segment=15);
Query OK, 0 rows affected (0.11 sec)

# Non-lazy compact
# The number of compact-segment tasks is less than the number of cluster nodes,
# so the compact-block tasks are built before the pipeline is executed.
mysql> explain pipeline optimize table t_compact_1 compact;
+----------------------------------------------------------------------------------------+
| explain                                                                                |
+----------------------------------------------------------------------------------------+
| CommitSink × 1 processor                                                               |
|   MutationAggregator × 1 processor                                                     |
|     Merge (TransformExchangeDeserializer × 6 processors) to (MutationAggregator × 1)   |
|       TransformExchangeDeserializer × 6 processors                                     |
|         Merge (DummyTransform × 8 processors) to (TransformExchangeDeserializer × 6)   |
|           Merge (TransformSerializeBlock × 6 processors) to (Resize × 8)               |
|             TransformSerializeBlock × 6 processors                                     |
|               CompactSource × 6 processors                                             |
+----------------------------------------------------------------------------------------+
8 rows in set (0.17 sec)
Read 0 rows, 0.00 B in 0.159 sec., 0 rows/sec., 0.00 B/sec.

mysql> optimize table t_compact_1 compact;
Query OK, 150 rows affected (0.34 sec)

mysql> select segment_count, block_count, row_count from fuse_snapshot('default', 't_compact_1') limit 2;
+---------------+-------------+-----------+
| segment_count | block_count | row_count |
+---------------+-------------+-----------+
|             2 |          14 |       150 |
|             6 |          30 |       150 |
+---------------+-------------+-----------+
2 rows in set (0.09 sec)
Read 2 rows, 396.00 B in 0.055 sec., 36.15 rows/sec., 6.99 KiB/sec.
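
The comments in the two sessions above describe when the compact-block tasks are built. A minimal Python sketch of that rule, as read from those comments, might look like the following. It is not Databend's Rust implementation: the names BuildTiming and choose_build_timing are hypothetical, and the behavior when the two counts are equal is an assumption, since the comments only cover the "greater than" and "less than" cases.

```python
from enum import Enum


class BuildTiming(Enum):
    """When the compact-block tasks are built (hypothetical names)."""

    LAZY = "during pipeline init"
    NON_LAZY = "before pipeline execution"


def choose_build_timing(segment_task_count: int, cluster_node_count: int) -> BuildTiming:
    """Pick the build timing from the two counts named in the PR comments."""
    if segment_task_count > cluster_node_count:
        # More compact-segment tasks than nodes: defer building the
        # compact-block tasks until pipeline init.
        return BuildTiming.LAZY
    # Fewer tasks than nodes: build the compact-block tasks up front.
    # Assumption: the equal case is treated the same way; the PR does not say.
    return BuildTiming.NON_LAZY


# Illustrative counts only; the sessions do not state the actual values.
print(choose_build_timing(segment_task_count=6, cluster_node_count=3).value)
print(choose_build_timing(segment_task_count=2, cluster_node_count=3).value)
```

The two explain pipeline outputs above correspond to the two branches: the lazy plan aggregates mutations before the exchange, while the non-lazy plan serializes blocks on each node and aggregates after the exchange.
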
  • Closes #issue

zhyass marked this pull request as draft September 7, 2023 19:15
github-actions bot added the pr-feature (this PR introduces a new feature to the codebase) label Sep 7, 2023
zhyass force-pushed the fix_purge branch 2 times, most recently from 32265b8 to 5b03e1a, September 8, 2023 09:47
zhyass (Member, Author) commented Sep 8, 2023

askbend:summary

databend-bot (Member) commented:

PR Summary (by llmchain.rs):

  • Refactoring of compact and mutation operations
    • The compact and mutation operations have been significantly refactored. The CompactTarget and MutationAggregate types have been replaced with CommitSink and CompactPartial, respectively. This change was made to improve the efficiency and readability of the code.
  • Introduction of new structs and enums
    • Several new structs and enums have been introduced, such as CompactExtraInfo, CompactTaskInfo, CompactLazyPartInfo, SerializeBlock, and DeletedSegmentInfo. These data structures provide a more organized way to handle compact and mutation operations.
  • Changes in import statements
    • The import statements throughout the code have been updated to reflect the new code structure. Some modules have been removed from the import lists and others added, following the introduction of new types and the removal of old ones.
  • Modification of existing functions
    • Many existing functions have been modified to fit the new code structure. For example, the compact function has been renamed to compact_segments, and its target and pipeline parameters have been removed. Similarly, the build_pipeline method has been modified to handle different compact targets.
  • Removal of redundant code
    • Some redundant code has been removed. For example, the MutationDeletedSegment struct and its associated methods have been removed to simplify the code.

zhyass marked this pull request as ready for review September 8, 2023 13:39
zhyass requested review from SkyFan2002, dantengsky and JackTan25 and removed request for SkyFan2002 and dantengsky September 8, 2023 13:40
zhyass added the ci-benchmark (Benchmark: run all test) label Sep 8, 2023
zhyass marked this pull request as draft September 8, 2023 14:40
databendlabs deleted a comment from github-actions bot Sep 8, 2023
zhyass marked this pull request as ready for review September 8, 2023 16:03
zhyass added and removed the ci-benchmark label Sep 8, 2023
databendlabs deleted a comment from github-actions bot Sep 9, 2023
zhyass force-pushed the fix_purge branch 2 times, most recently from 7c2f2a9 to 913c80a, September 9, 2023 03:21
lichuang self-requested a review September 9, 2023 03:46
zhyass added and removed the ci-benchmark label Sep 9, 2023
github-actions bot (Contributor) commented Sep 9, 2023

Docker Image for PR

  • tag: pr-12750-f92f128

note: this image tag is only available for internal use,
please check the internal doc for more details.

SkyFan2002 (Member) commented:

LGTM

zhyass added and removed the ci-benchmark label Sep 11, 2023
github-actions bot (Contributor) commented:

Docker Image for PR

  • tag: pr-12750-20d6d99

note: this image tag is only available for internal use,
please check the internal doc for more details.

zhyass added and removed the ci-benchmark label Sep 12, 2023
github-actions bot (Contributor) commented:

Docker Image for PR

  • tag: pr-12750-392fc23

note: this image tag is only available for internal use,
please check the internal doc for more details.

dantengsky merged commit 7495817 into databendlabs:main Sep 18, 2023
BohuTANG mentioned this pull request Sep 27, 2023
andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
* compact distribute

* remove unused codes

* update test case

* add sqllogic test

* fix test

---------

Co-authored-by: dantengsky <dantengsky@gmail.com>
Labels: ci-benchmark (Benchmark: run all test), pr-feature (this PR introduces a new feature to the codebase)
6 participants