Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge Datavalues-dev into main #4074

Merged
merged 164 commits into from
Feb 9, 2022
Merged

Merge Datavalues-dev into main #4074

merged 164 commits into from
Feb 9, 2022

Conversation

sundy-li
Copy link
Member

@sundy-li sundy-li commented Feb 8, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Summary about this PR

Changelog

  • Improvement

Related Issues

Fixes #issue

Test Plan

Unit Tests

Stateless Tests

sundy-li and others added 30 commits January 18, 2022 16:08
[datavalues-dev] Add Column builder
Complete array and struct serializer
Make some udf functions to work with datavalues2
pub fn serialize(column: &ColumnRef, vec: &mut Vec<Vec<u8>>) -> Result<()> {
pub fn serialize(column: &ColumnRef, vec: &mut [Vec<u8>]) -> Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for clippy::ptr_arg, may need to check rust-lang/rust-clippy#8334

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can try to merge it first and then modify it if there is indeed a performance problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also found that!!!

@codecov-commenter
Copy link

codecov-commenter commented Feb 8, 2022

Codecov Report

Merging #4074 (db55a3b) into main (d70ad52) will decrease coverage by 0%.
The diff coverage is 59%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #4074     +/-   ##
=======================================
- Coverage     57%     57%     -1%     
=======================================
  Files        831     853     +22     
  Lines      44293   46402   +2109     
=======================================
+ Hits       25521   26515    +994     
- Misses     18772   19887   +1115     
Impacted Files Coverage Δ
common/clickhouse-srv/src/types/column/mod.rs 87% <ø> (ø)
common/clickhouse-srv/src/types/mod.rs 0% <ø> (ø)
common/codegen/src/writes/arithmetics_type.rs 0% <0%> (ø)
common/datablocks/src/kernels/data_block_slice.rs 100% <ø> (ø)
common/datavalues/src/arrays/ops/vec_hash.rs 0% <ø> (-70%) ⬇️
common/datavalues/src/data_field.rs 61% <0%> (-6%) ⬇️
common/datavalues/src/types/serializations/mod.rs 54% <0%> (-3%) ⬇️
common/datavalues2/src/columns/boolean/iterator.rs 100% <ø> (ø)
common/datavalues2/src/columns/mutable.rs 0% <0%> (ø)
common/datavalues2/src/columns/null/mutable.rs 0% <0%> (ø)
... and 428 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d70ad52...db55a3b. Read the comment docs.

@BohuTANG
Copy link
Member

BohuTANG commented Feb 9, 2022

💯
I think we need to do a performance test(like https://databend.rs/overview/performance) on this branch, to make sure there is no performance degradation.

@sundy-li
Copy link
Member Author

sundy-li commented Feb 9, 2022

I think we need to do a performance test(like https://databend.rs/overview/performance) on this branch, to make sure there is no performance degradation.

Yes, I'm working on it.

@sundy-li
Copy link
Member Author

sundy-li commented Feb 9, 2022

Benchmark script: https://gist.github.com/sundy-li/b8739b28cc493070392b0602272b5fef
Env: ArchLinux AMD Ryzen 9 5950X 16-Core Processor
WARM TIMES: 3
RUN TIMES: 10

Main Branch:

Command Mean [s] Min [s] Max [s] Relative
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT avg(number) FROM numbers_mt(100000000000)" 1.696 ± 0.042 1.642 1.791 33.43 ± 1.28
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT sum(number) FROM numbers_mt(100000000000)" 1.714 ± 0.038 1.670 1.791 33.79 ± 1.23
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT min(number) FROM numbers_mt(100000000000)" 3.101 ± 0.020 3.083 3.142 61.14 ± 1.82
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT max(number) FROM numbers_mt(100000000000)" 2.755 ± 0.036 2.722 2.812 54.33 ± 1.72
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT count(number) FROM numbers_mt(100000000000)" 1.211 ± 0.021 1.192 1.253 23.87 ± 0.80
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT sum(number+number+number) FROM numbers_mt(100000000000)" 6.565 ± 0.036 6.502 6.624 129.45 ± 3.82
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT sum(number) / count(number) FROM numbers_mt(100000000000)" 1.783 ± 0.029 1.732 1.823 35.17 ± 1.16
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000)" 5.296 ± 0.019 5.270 5.332 104.43 ± 3.05
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 10" 0.051 ± 0.001 0.049 0.053 1.00
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT max(number), sum(number) FROM numbers_mt(1000000000) GROUP BY number % 3, number % 4, number % 5 LIMIT 10" 0.599 ± 0.013 0.591 0.623 11.81 ± 0.42

datavalues-dev Branch:

Command Mean [s] Min [s] Max [s] Relative
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT avg(number) FROM numbers_mt(100000000000)" 1.660 ± 0.021 1.641 1.701 33.14 ± 0.73
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT sum(number) FROM numbers_mt(100000000000)" 1.656 ± 0.033 1.610 1.702 33.08 ± 0.89
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT min(number) FROM numbers_mt(100000000000)" 2.967 ± 0.034 2.931 3.051 59.26 ± 1.28
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT max(number) FROM numbers_mt(100000000000)" 2.803 ± 0.089 2.722 3.023 55.97 ± 2.05
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT count(number) FROM numbers_mt(100000000000)" 1.219 ± 0.071 1.134 1.372 24.35 ± 1.49
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT sum(number+number+number) FROM numbers_mt(100000000000)" 13.174 ± 0.081 13.102 13.371 263.08 ± 5.07
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT sum(number) / count(number) FROM numbers_mt(100000000000)" 1.699 ± 0.033 1.642 1.761 33.93 ± 0.90
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000)" 5.242 ± 0.020 5.212 5.274 104.69 ± 1.96
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 10" 0.050 ± 0.001 0.048 0.051 1.00
clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT max(number), sum(number) FROM numbers_mt(1000000000) GROUP BY number % 3, number % 4, number % 5 LIMIT 10" 0.597 ± 0.010 0.589 0.623 11.92 ± 0.29

@sundy-li
Copy link
Member Author

sundy-li commented Feb 9, 2022

Only

clickhouse-client --host 127.0.0.1 --port 9000 --query="SELECT sum(number+number+number) FROM numbers_mt(100000000000)"

has performance drawback.

Update:

Fixed

1 rows in set. Elapsed: 6.092 sec. Processed 100.00 billion rows, 800.00 GB (16.41 billion rows/s., 131.31 GB/s.)

@sundy-li sundy-li marked this pull request as ready for review February 9, 2022 02:47
@sundy-li sundy-li requested a review from BohuTANG as a code owner February 9, 2022 02:47
@databend-bot
Copy link
Member

Wait for another reviewer approval

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.