-
Notifications
You must be signed in to change notification settings - Fork 751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft RFC for support nullable_number_type. #4158
Draft RFC for support nullable_number_type. #4158
Conversation
Thanks for the contribution! Please review the labels and make any necessary changes. |
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/databend/databend/FvAHo2NUBKXEvqtaEnFAZjpFfEcR [Deployment for 581e339 canceled] |
@sundy-li I have wrote part of the process codes, is this the idea right? |
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
/// Type promote to the nearest higher type. | ||
/// UInt8 -> UInt16 -> UInt32 -> UInt64 -> bitmap<128> | ||
/// Int8 -> Int16 -> Int32 -> Int64 -> bitmap<128> | ||
pub(crate) fn _type_promotion(column: &ColumnRef, _datatype: TypeID) -> Option<TypeID> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have construct_numeric_type
&& numeric_byte_size
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have
construct_numeric_type
&&numeric_byte_size
.
oops, duplicated ..
But, I have a question. IMHO,
numeric_byte_size is some method bound with concrete type, But this method returns a dyn trait. The number_byte_size method will be used via a type downcast, which I think is not a good solution. what?do you think?
@sundy-li
Thanks for the contribution! Please review the labels and make any necessary changes. |
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Codecov Report
@@ Coverage Diff @@
## main #4158 +/- ##
======================================
- Coverage 60% 60% -1%
======================================
Files 804 804
Lines 40918 41062 +144
======================================
+ Hits 24887 24908 +21
- Misses 16031 16154 +123
Continue to review full report at Codecov.
|
} else { | ||
column.data_type() | ||
}; | ||
if _typ.data_type_id().is_integer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to rename _typ
|
||
/// Init the nullable part's offset in bytes. It follows the values part. | ||
#[inline] | ||
fn init_nullable_offset_via_fields( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not make this function return Result<usize>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not make this function return
Result<usize>
I see some codes use this kind of initialization as c style. Using Result makes codes more healthy, it is a better way.
@@ -36,6 +37,17 @@ pub trait TypeDeserializer: Send + Sync { | |||
fn de(&mut self, reader: &mut &[u8]) -> Result<()>; | |||
fn de_default(&mut self); | |||
fn de_batch(&mut self, reader: &[u8], step: usize, rows: usize) -> Result<()>; | |||
fn de_batch_with_nullable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only for nullable, so it is not must to put this into trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only for nullable, so it should not to put this into trait.
Yes, your idea is valuable.
// todo, We should get the length of missed bytes which caused | ||
// by self.inner.de method. | ||
if reader[null_offset - 1] & 1 == 1 { | ||
self.bitmap.push(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.bitmap.push(reader[null_offset - 1] & 1 != 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.bitmap.push(reader[null_offset - 1] & 1 != 1)
That's cool.
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
@@ -31,9 +32,17 @@ impl DataBlock { | |||
let mut group_key_len = 0; | |||
for col in column_names { | |||
let column = block.try_column_by_name(col)?; | |||
let typ = column.data_type(); | |||
let typ = if column.is_nullable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use remove_nullable
directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use
remove_nullable
directly.
remove_nullable is c~.
remove_nullable(&column.data_type()) | ||
} else { | ||
column.data_type() | ||
}; | ||
if typ.data_type_id().is_integer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can support is_numeric
.
} | ||
|
||
#[inline] | ||
fn check_group_columns_has_nullable(group_columns: &[&ColumnRef]) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
group_columns.any(|c| c.is_nullable())
|
||
/// Init the nullable part's offset in bytes. It follows the values part. | ||
#[inline] | ||
fn init_nullable_offset(group_keys: &[&ColumnRef]) -> Result<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
group_keys.iter().map(|c| c.data_type().numeric_byte_size().unwrap()).sum()
@@ -305,3 +427,43 @@ fn build( | |||
} | |||
Ok(()) | |||
} | |||
|
|||
#[inline] | |||
fn build_keys_with_nullable_column( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build_keys_with_nullable_column build merge into one method.
} | ||
Ok(group_keys) | ||
} | ||
|
||
fn group_by_get_indices( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove group_by
&& group_by_get_indices
if size == mem_size { | ||
if col.is_nullable() { | ||
let writer = unsafe { writer.add(*offsize) }; | ||
Series::fixed_hash_with_nullable(col, writer, step, *null_offset)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dowcast to Nullablecolumn and apply this method.
@@ -36,6 +59,27 @@ impl Series { | |||
}) | |||
} | |||
|
|||
pub fn fixed_hash_with_nullable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
@@ -138,3 +193,30 @@ impl GroupHash for StringColumn { | |||
Ok(()) | |||
} | |||
} | |||
|
|||
impl GroupHash for NullableColumn { | |||
fn fixed_hash_with_nullable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
column: &Arc<dyn Column>, | ||
bitmap: &Bitmap, | ||
step: usize, | ||
null_offset: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge into fixed_hash
.
Option<(&Bitmap, usize)>
} | ||
Ok(()) | ||
} | ||
|
||
pub fn fixed_hash(column: &ColumnRef, ptr: *mut u8, step: usize) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add Option<&Bitmap, usize>
@@ -46,6 +47,23 @@ impl TypeDeserializer for NullableDeserializer { | |||
Ok(()) | |||
} | |||
|
|||
fn de_batch_with_nullable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Donot put it into trait.
for row in 0..rows { | ||
let mut reader = &reader[step * row..]; | ||
self.inner.de(&mut reader)?; | ||
self.bitmap.push(reader[*null_offset - type_size] & 1 != 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!=1 ---> = 0
Continue in #4340 |
Signed-off-by: wang.zhen wangzhenaaa7@gmail.com
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
Draft RFC for support nullable_number_type.
Changelog
Related Issues
#4079
Fixes #issue
Test Plan
Unit Tests
Stateless Tests