Implement hash join executor support #494
plan/index no longer requires async; it only finds indexes using pre-fetched schema info
* Add JoinExecutor field to ast::Join.
  - key_expr: used as the key of the HashMap; it is evaluated when the executor builds the map.
  - value_expr: the value of the HashMap; it is evaluated in the join execution loop.
  - where_clause: a filtering expr that can reduce the size of the HashMap.
* Add join planner module.
  - plan/mock.rs: MockStorage impl, used only for testing planners.
  - plan/schema.rs: analyzes the AST and returns the list of all related table schemas.
  - plan/evaluable.rs: checks whether the AST (query) is evaluable or not.
  - plan/join.rs: the join planner; currently its only role is to find HASH JOIN candidates.
Codecov Report
@@ Coverage Diff @@
## main #494 +/- ##
==========================================
+ Coverage 91.63% 92.23% +0.60%
==========================================
Files 160 173 +13
Lines 9393 10745 +1352
==========================================
+ Hits 8607 9911 +1304
- Misses 786 834 +48
I'm leaving a (nit-picking) review on this PR, which has already been merged.
Timestamp(NaiveDateTime),
Time(NaiveTime),
Interval(Interval),
Uuid(u128),
How about using the Uuid type?
Is there a reason you specified u128?
The reason is simple: the Value enum also stores Uuid as a u128. A UUID can be described as [u8; 16], which is 128 bits wide, the same as u128.
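As a minimal sketch of the point above, the 16 raw bytes of a UUID can be packed into a u128 and unpacked again with the standard library alone. The helper names bytes_to_u128 and u128_to_bytes are hypothetical and are not part of this PR; big-endian byte order is an assumption made for the example.

```rust
/// Pack the 16 raw bytes of a UUID into a u128 (big-endian; an assumption of this sketch).
fn bytes_to_u128(bytes: [u8; 16]) -> u128 {
    u128::from_be_bytes(bytes)
}

/// Unpack a u128 back into its 16 raw bytes (big-endian).
fn u128_to_bytes(value: u128) -> [u8; 16] {
    value.to_be_bytes()
}

fn main() {
    let bytes: [u8; 16] = [
        0x93, 0x6d, 0xa0, 0x1f, 0x9a, 0xbd, 0x4d, 0x9d,
        0x80, 0xc7, 0x02, 0xaf, 0x85, 0xc8, 0x22, 0xa8,
    ];

    // The round trip is lossless because both representations are 128 bits wide.
    let value = bytes_to_u128(bytes);
    assert_eq!(u128_to_bytes(value), bytes);
    assert_eq!(std::mem::size_of::<u128>(), std::mem::size_of::<[u8; 16]>());

    println!("{value:032x}");
}
```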
.map(move |(i, join_clause)| {
    let join_columns = Rc::clone(&self.join_columns[i]);
(Clear if present under Confirm later)
You might need to check the lengths of join_columns and join_clauses.
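To illustrate the concern, here is a minimal sketch of an explicit length check before indexing one slice with positions from the other. The function pair_by_index and its simplified types are hypothetical; the PR's actual join_clauses and join_columns types are not shown in this thread.

```rust
use std::rc::Rc;

/// Pair each clause with the columns at the same position.
/// Without the length check, `columns[i]` would panic when `columns` is shorter.
fn pair_by_index<'a>(
    clauses: &'a [&'a str],
    columns: &[Rc<Vec<String>>],
) -> Result<Vec<(&'a str, Rc<Vec<String>>)>, String> {
    if clauses.len() != columns.len() {
        return Err(format!(
            "length mismatch: {} clauses, {} column lists",
            clauses.len(),
            columns.len()
        ));
    }

    Ok(clauses
        .iter()
        .enumerate()
        .map(|(i, clause)| (*clause, Rc::clone(&columns[i])))
        .collect())
}

fn main() {
    let columns = vec![Rc::new(vec!["id".to_string()])];

    // Same length: every index is valid.
    assert!(pair_by_index(&["a"], &columns).is_ok());
    // Different length: reported as an error instead of panicking.
    assert!(pair_by_index(&["a", "b"], &columns).is_err());
}
```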
self,
rows: impl Stream<Item = Result<BlendContext<'a>>> + 'a,
) -> Result<Joined<'a>> {
    let init_rows: Joined<'a> = Box::pin(rows.map(|row| row.map(Rc::new)));
- let init_rows: Joined<'a> = Box::pin(rows.map(|row| row.map(Rc::new)));
+ let init_rows: Joined = Box::pin(rows.map(|row| row.map(Rc::new)));
When you use a type alias that takes a lifetime parameter in a position like this, you do not need to spell out the lifetime; the compiler infers it.
I didn't know that lifetime in here can be omitted, thanks!
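A small sketch of the lifetime point, using a hypothetical alias Words rather than the PR's Joined: inside a function body, the lifetime argument of an alias can be written explicitly, written as '_, or left out, and the compiler infers the same type in each case.

```rust
/// A type alias with a lifetime parameter (hypothetical, for illustration only).
type Words<'a> = Vec<&'a str>;

fn longest_word(text: &str) -> Option<&str> {
    // Lifetime written as the anonymous lifetime.
    let explicit: Words<'_> = text.split_whitespace().collect();
    // Lifetime omitted entirely; it is inferred in a `let` annotation.
    let elided: Words = text.split_whitespace().collect();
    assert_eq!(explicit, elided);

    elided.into_iter().max_by_key(|word| word.len())
}

fn main() {
    assert_eq!(longest_word("a bbb cc"), Some("bbb"));
    assert_eq!(longest_word(""), None);
}
```

Note that the allow-by-default lint elided_lifetimes_in_paths flags the fully omitted form, so projects that enable it may prefer Words<'_>.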
let joins = self
    .join_clauses
    .iter()
    .enumerate()
    .map(move |(i, join_clause)| {
        let join_columns = Rc::clone(&self.join_columns[i]);

        Ok::<_, Error>((join_clause, join_columns))
    });
How about code like this:
- let joins = self
-     .join_clauses
-     .iter()
-     .enumerate()
-     .map(move |(i, join_clause)| {
-         let join_columns = Rc::clone(&self.join_columns[i]);
-         Ok::<_, Error>((join_clause, join_columns))
-     });
+ let joins = self.join_clauses.iter().zip(self.join_columns.iter()).map(
+     |(join_clause, join_columns)| {
+         let join_columns = Rc::clone(join_columns);
+         Ok::<_, Error>((join_clause, join_columns))
+     },
+ );
Oh, I wasn't aware of this. zip looks much better.
Thanks a lot. I already merged this, so I'll make a new branch to apply it.
Thanks, I cleaned it up a bit more.
..
let joins = self
.join_clauses
.iter()
.zip(self.join_columns.iter().map(Rc::clone));
stream::iter(joins)
.map(Ok)
.try_fold(init_rows, |rows, (join_clause, join_columns)| {
..
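The zip-and-fold shape above can be tried out with the standard library alone. This is a simplified sketch: it uses plain Iterator::fold instead of the stream-based try_fold in the PR, and describe_joins with its string types is hypothetical.

```rust
use std::rc::Rc;

/// Walk clauses and their columns in lockstep, threading an accumulator through.
fn describe_joins(join_clauses: &[&str], join_columns: &[Rc<Vec<String>>]) -> String {
    // `zip` pairs items positionally; `map(Rc::clone)` turns `&Rc<_>` into an owned `Rc<_>`.
    let joins = join_clauses
        .iter()
        .zip(join_columns.iter().map(Rc::clone));

    joins.fold(String::from("rows"), |acc, (join_clause, join_columns)| {
        format!("{acc} -> {join_clause}({})", join_columns.join(","))
    })
}

fn main() {
    let columns = vec![
        Rc::new(vec!["x".to_string()]),
        Rc::new(vec!["y".to_string(), "z".to_string()]),
    ];

    println!("{}", describe_joins(&["a", "b"], &columns));
}
```

One design consequence: zip stops at the shorter of the two iterators, so a length mismatch no longer panics but is silently truncated.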
PlanExpr::TwoExprs(expr, expr2) => {
    check_expr(context.as_ref().map(Rc::clone), expr) && check_expr(context, expr2)
}
PlanExpr::ThreeExprs(expr, expr2, expr3) => {
    check_expr(context.as_ref().map(Rc::clone), expr)
        && check_expr(context.as_ref().map(Rc::clone), expr2)
        && check_expr(context, expr3)
}
Is there any reason not to integrate with MultiExprs?
MultiExprs always uses a Vec.
These enum variants exist to avoid unnecessary heap allocations and to use fewer boxes.
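A minimal sketch of that trade-off, with a simplified, hypothetical Expr and PlanExpr (the real definitions are not shown in this thread): the fixed-arity variants hold their references inline, while the Vec-backed variant allocates on the heap whenever it is non-empty.

```rust
/// Simplified stand-in for an AST expression (hypothetical).
enum Expr {
    Column(String),
    Literal(i64),
}

/// Simplified stand-in for the planner's view of sub-expressions (hypothetical).
enum PlanExpr<'a> {
    None,
    Expr(&'a Expr),
    // Fixed-arity variants: references stored inline, no heap allocation.
    TwoExprs(&'a Expr, &'a Expr),
    ThreeExprs(&'a Expr, &'a Expr, &'a Expr),
    // Arbitrary arity: needs a Vec, which allocates when non-empty.
    MultiExprs(Vec<&'a Expr>),
}

fn is_known(columns: &[&str], expr: &Expr) -> bool {
    match expr {
        Expr::Column(name) => columns.contains(&name.as_str()),
        Expr::Literal(_) => true,
    }
}

fn check(columns: &[&str], plan: PlanExpr<'_>) -> bool {
    match plan {
        PlanExpr::None => true,
        PlanExpr::Expr(e) => is_known(columns, e),
        PlanExpr::TwoExprs(a, b) => is_known(columns, a) && is_known(columns, b),
        PlanExpr::ThreeExprs(a, b, c) => {
            is_known(columns, a) && is_known(columns, b) && is_known(columns, c)
        }
        PlanExpr::MultiExprs(exprs) => exprs.into_iter().all(|e| is_known(columns, e)),
    }
}

fn main() {
    let id = Expr::Column("id".to_string());
    let one = Expr::Literal(1);

    assert!(check(&["id"], PlanExpr::None));
    assert!(check(&["id"], PlanExpr::Expr(&id)));
    assert!(check(&["id"], PlanExpr::TwoExprs(&id, &one)));
    assert!(check(&["id"], PlanExpr::ThreeExprs(&id, &one, &one)));
    assert!(check(&["id"], PlanExpr::MultiExprs(vec![&id, &one])));
}
```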
Implement hash join executor support, which can speed up execution of some join queries with key = value shape constraints.
The hash join executor loads all data into memory by building a hash map from the evaluated, pre-planned key expr.
This can dramatically reduce the overhead of accessing storage and the time spent evaluating join constraints.
* Add JoinExecutor field to ast::Join.
  - key_expr: used as the key of the HashMap; it is evaluated when the executor builds the map.
  - value_expr: the value of the HashMap; it is evaluated in the join execution loop.
  - where_clause: a filtering expr that can reduce the size of the HashMap.
* Add join planner module.
  - plan/mock.rs: MockStorage impl, used only for testing planners.
  - plan/schema.rs: analyzes the AST and returns the list of all related table schemas.
  - plan/evaluable.rs: checks whether the AST (query) is evaluable or not.
  - plan/join.rs: the join planner; currently its only role is to find HASH JOIN candidates.
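For readers unfamiliar with the technique, the following is a generic, minimal sketch of a hash join over in-memory rows, not the executor implemented in this PR. All names (Row, hash_join, keep_right, the column-index parameters) are hypothetical; keep_right only loosely mirrors the role described for where_clause, shrinking the map before the probe phase.

```rust
use std::collections::HashMap;

/// A row is just a list of integer columns in this sketch.
type Row = Vec<i64>;

/// Join `left` and `right` on `left[left_key] == right[right_key]`.
/// Column indexes are assumed to be in bounds; out-of-range indexes panic.
fn hash_join(
    left: &[Row],
    right: &[Row],
    left_key: usize,
    right_key: usize,
    keep_right: impl Fn(&Row) -> bool,
) -> Vec<Row> {
    // Build phase: index the (filtered) right side once by its join key.
    let mut map: HashMap<i64, Vec<&Row>> = HashMap::new();
    for row in right.iter().filter(|row| keep_right(*row)) {
        map.entry(row[right_key]).or_default().push(row);
    }

    // Probe phase: one hash lookup per left row instead of a scan of `right`.
    let mut joined: Vec<Row> = Vec::new();
    for l in left {
        if let Some(matches) = map.get(&l[left_key]) {
            for r in matches {
                joined.push(l.iter().chain(r.iter()).copied().collect());
            }
        }
    }
    joined
}

fn main() {
    let left = vec![vec![1, 10], vec![2, 20], vec![3, 30]];
    let right = vec![vec![2, 200], vec![3, 300], vec![3, 301], vec![4, 400]];

    for row in hash_join(&left, &right, 0, 0, |_| true) {
        println!("{row:?}");
    }
}
```

The build phase costs one pass over the right side and memory proportional to its filtered size; in exchange, each left row needs a single map lookup rather than re-evaluating the join constraint against every right row.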