[Compression] ALP Compression (float/double) #9635

Merged on Jan 22, 2024 (47 commits)
960cffa
[Compression] ALP First Unit Tests (Double, Float and Negative Numbers)
Aug 25, 2023
a9d436d
[Compression] Base ALP for Doubles and Floats
Aug 25, 2023
67e2249
[Compression] ALP as a type of compression
Aug 25, 2023
823f038
[Compression] A first version of ALP
Aug 30, 2023
70e25e8
[ALPRD] First implementation
Oct 21, 2023
c3f20e5
[ALPRD] First implementation
Oct 21, 2023
4e7f04a
[ALPRD] First implementation
Oct 21, 2023
53295f5
[ALP] Fixing sampling bug and DoubleToInt64 function
Oct 21, 2023
6da081e
[ALP] Fixing TODOs and removing code smells
Oct 21, 2023
f9d1a10
[ALP RD] Prettyfying code
Oct 23, 2023
982eb1a
[ALP] Prettyfying code
Oct 23, 2023
ee588ad
[Alp] Prettyfying code and fixing bad smells
Oct 23, 2023
54ba0c8
[Alp] Adding some tests
Oct 23, 2023
824cd40
[ALP] Adding Patas tests to ALP
Oct 24, 2023
980230c
[ALPRD] Adding Patas tests to ALPRD
Oct 24, 2023
942d6e7
[ALP] Skipping CI for now [ci skip]
Oct 24, 2023
2bf0b9d
Merge branch 'duckdb:main' into alp_compression
Oct 24, 2023
2963bd6
[ALP] Fixing clang-format
Oct 24, 2023
19643c6
[ALP] Fixing clang-format
Oct 24, 2023
b352160
[ALP] Fixing undefined references on Release version
Oct 24, 2023
28864dd
[ALP] Fixing math functions
Oct 24, 2023
018ac9e
[ALP] Including cmath
Oct 24, 2023
5ee73e4
[ALP] Including cmath
Oct 24, 2023
3b11610
[ALP] Small format fix
Oct 24, 2023
0d13662
[ALP] Fix bug in ALP and Windows format fix (hopefully)
Nov 6, 2023
6af5f61
Merge branch 'duckdb:main' into alp_compression
Nov 7, 2023
6caf2b1
[ALP] Fix bug in ALP RD size estimation
Nov 7, 2023
43f33a8
Merge branch 'duckdb:main' into alp_compression
Nov 7, 2023
84e1e80
[ALP] Commit to run Actions with tags
Nov 7, 2023
2ab5166
Merge branch 'duckdb:main' into alp_compression
Nov 8, 2023
a090daa
[ALP] Fix a bug that was making compression slower than intended
Nov 8, 2023
cefaf85
Merge branch 'duckdb:main' into alp_compression
Nov 8, 2023
33a0c02
[ALP] Adding ALP Licenses
Nov 9, 2023
ffeed5a
[ALP] Adding Benchmarks for ALP and ALPRD
Nov 10, 2023
0a04e64
[ALP] Refactoring code and implementing most review comments
Nov 24, 2023
a809e63
[ALP] Removing unused code and modularizing nulls replacement function
Nov 24, 2023
b11ea45
[ALP] Making dictionary not always same size to improve compression r…
Nov 28, 2023
86428a7
[ALP] Format fix
Nov 28, 2023
5259cb2
[ALP] Fix bug in early exit mechanism introduced with a refactoring
Nov 28, 2023
6e52821
[ALP] Fix repeated lines bug
Nov 28, 2023
5983a96
[ALP] Removed unused variable
Nov 28, 2023
83b7c68
[ALP] Including cmath
Nov 28, 2023
c265eba
[ALP] Removing FIXME comment
Nov 29, 2023
40fd3f1
[ALP] Implementing a struct for the Encoding Indices instead of a pair
Nov 29, 2023
63d3a5e
[ALP] Compression optimizations & other small changes
Jan 19, 2024
dcadf17
[ALP] Fixing overload to compress without nulls
Jan 19, 2024
5e90a1e
[ALP] Changing comment of constant to be more accurate
Jan 19, 2024
[ALP] Fix repeated lines bug
lkuffo committed Nov 28, 2023
commit 6e52821939d1ca431ba5be5f0562c916bc1a0c43
@@ -58,9 +58,6 @@ bool AlpRDAnalyze(AnalyzeState &state, Vector &input, idx_t count) {
 return true;
 }
 
-analyze_state.vectors_count++;
-analyze_state.total_values_count += count;
-
 alp::AlpSamplingParameters sampling_params = alp::AlpUtils::GetSamplingParameters(count);
 
 vector<uint16_t> current_vector_null_positions(sampling_params.n_lookup_values, 0);
@@ -107,7 +104,7 @@ idx_t AlpRDFinalAnalyze(AnalyzeState &state) {
 double estimated_compressed_bits = estimated_bits_per_value * analyze_state.rowgroup_sample.size();
 double estimed_compressed_bytes = estimated_compressed_bits / 8;
 
-//! Overhead per segment: [Pointer to metadata + right bitwidth + left bitwidth] + Dictionary Size
+//! Overhead per segment: [Pointer to metadata + right bitwidth + left bitwidth + n dict elems] + Dictionary Size
 double per_segment_overhead = AlpRDConstants::HEADER_SIZE + AlpRDConstants::MAX_DICTIONARY_SIZE_BYTES;
 
 //! Overhead per vector: Pointer to data + Exceptions count
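
For readers following the size estimate in the second hunk, here is a minimal standalone sketch of the arithmetic, not the code in this PR. The struct `SketchConstants`, its field widths (a 4-byte metadata pointer, one byte each for the two bit widths and the dictionary element count, and an 8-entry dictionary of 2-byte elements) and the function names are assumptions made for illustration; the real values live in `AlpRDConstants` and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for AlpRDConstants. All widths below are assumptions
// chosen to mirror the updated comment in the diff, not values read from DuckDB.
struct SketchConstants {
	static constexpr std::size_t METADATA_POINTER_SIZE = sizeof(uint32_t);
	static constexpr std::size_t RIGHT_BIT_WIDTH_SIZE = sizeof(uint8_t);
	static constexpr std::size_t LEFT_BIT_WIDTH_SIZE = sizeof(uint8_t);
	static constexpr std::size_t N_DICT_ELEMS_SIZE = sizeof(uint8_t);
	// Header: [pointer to metadata + right bitwidth + left bitwidth + n dict elems]
	static constexpr std::size_t HEADER_SIZE =
	    METADATA_POINTER_SIZE + RIGHT_BIT_WIDTH_SIZE + LEFT_BIT_WIDTH_SIZE + N_DICT_ELEMS_SIZE;
	static constexpr std::size_t MAX_DICTIONARY_SIZE = 8;
	static constexpr std::size_t MAX_DICTIONARY_SIZE_BYTES = MAX_DICTIONARY_SIZE * sizeof(uint16_t);
};

// Estimated payload size in bytes: bits per value times sampled values, divided by 8.
double EstimateCompressedBytes(double estimated_bits_per_value, std::size_t sample_size) {
	assert(estimated_bits_per_value >= 0.0);
	double estimated_compressed_bits = estimated_bits_per_value * static_cast<double>(sample_size);
	return estimated_compressed_bits / 8.0;
}

// Fixed cost paid once per segment: header plus the worst-case dictionary.
double PerSegmentOverhead() {
	return static_cast<double>(SketchConstants::HEADER_SIZE + SketchConstants::MAX_DICTIONARY_SIZE_BYTES);
}
```

Under these assumed widths, the header is 7 bytes and the worst-case dictionary is 16 bytes, so the per-segment overhead is 23 bytes regardless of how many values the segment holds.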