Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorder semi and anti joins. #11815

Merged
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
118 commits
Select commit Hold shift + click to select a range
18b7e51
I think we can reorder semi joins
Tmonster Jan 25, 2024
77d503e
I have access to the left and right sets and the join type. Now I sho…
Tmonster Jan 25, 2024
34ad120
can now estimate a semi join to be 20% of the left table
Tmonster Jan 25, 2024
d8f8f9f
have most everything working. some CEs not quite. Q05 tpch needs to b…
Tmonster Jan 26, 2024
8e6b730
Merge remote-tracking branch 'upstream/main' into reorder-semi-and-an…
Tmonster Jan 26, 2024
4ef7415
some debugging statements
Tmonster Jan 29, 2024
d0e4892
make format-fix
Tmonster Jan 29, 2024
9eecc85
fix code where numerator relations were not properly being merged. Al…
Tmonster Jan 29, 2024
2732612
add cross product join type
Tmonster Jan 29, 2024
bd1d164
fix join is reorderabe function. generate enums
Tmonster Jan 29, 2024
8be872d
pausing point
Tmonster Jan 29, 2024
b9885f6
fix last join issue and better min/max stats help on distinct count
Tmonster Jan 30, 2024
e7eaf10
remove min max stats changesg
Tmonster Jan 30, 2024
2183142
add back in removed test
Tmonster Jan 30, 2024
7f49a57
pausing to work on adding benchmarks for ingestion
Tmonster Jan 31, 2024
206109e
trying to fix this column lifetime analyzer bug
Tmonster Jan 31, 2024
830cdec
you can also change the bindings filters return. but that's not a goo…
Tmonster Feb 1, 2024
dfab1b6
remove changing what bindings filters return
Tmonster Feb 1, 2024
72cd1be
check tests one more time. still need to figure out logical as of joi…
Tmonster Feb 1, 2024
990b296
Merge branch 'reorder-semi-and-anti-joins' of github.com:Tmonster/duc…
Tmonster Feb 2, 2024
333f252
make format-fix
Tmonster Feb 2, 2024
7d7690c
might be a better solution, lets see if debug passes
Tmonster Feb 12, 2024
4414834
Merge remote-tracking branch 'upstream/main' into reorder_semi_anti_j…
Tmonster Feb 13, 2024
b22940f
remove debugging statements
Tmonster Feb 13, 2024
8f8edbd
fix logic for cross product relations
Tmonster Feb 13, 2024
d651dcf
skip test that I will remove eventually
Tmonster Feb 13, 2024
1fdb318
these ideas dont work tbh
Tmonster Feb 13, 2024
8fc0048
so close, but cant figure it out yet
Tmonster Feb 13, 2024
9827732
still need to track down this bug
Tmonster Feb 13, 2024
3ee5e65
follow this path. Somewhere in the test visit_operator_expressions_in…
Tmonster Feb 14, 2024
49548c1
I think we can reorder semi joins
Tmonster Jan 25, 2024
4ad73e1
I have access to the left and right sets and the join type. Now I sho…
Tmonster Jan 25, 2024
f665ea7
can now estimate a semi join to be 20% of the left table
Tmonster Jan 25, 2024
6c39fb1
have most everything working. some CEs not quite. Q05 tpch needs to b…
Tmonster Jan 26, 2024
c45bc52
some debugging statements
Tmonster Jan 29, 2024
c9bbc47
make format-fix
Tmonster Jan 29, 2024
7f1ab5e
fix code where numerator relations were not properly being merged. Al…
Tmonster Jan 29, 2024
57a21f2
add cross product join type
Tmonster Jan 29, 2024
30983ec
fix join is reorderabe function. generate enums
Tmonster Jan 29, 2024
818bceb
pausing point
Tmonster Jan 29, 2024
8b250a7
fix last join issue and better min/max stats help on distinct count
Tmonster Jan 30, 2024
168e1d0
remove min max stats changesg
Tmonster Jan 30, 2024
bf24aed
add back in removed test
Tmonster Jan 30, 2024
7802160
pausing to work on adding benchmarks for ingestion
Tmonster Jan 31, 2024
be6f854
trying to fix this column lifetime analyzer bug
Tmonster Jan 31, 2024
54d13eb
you can also change the bindings filters return. but that's not a goo…
Tmonster Feb 1, 2024
23a1854
remove changing what bindings filters return
Tmonster Feb 1, 2024
d57c740
check tests one more time. still need to figure out logical as of joi…
Tmonster Feb 1, 2024
3108292
make format-fix
Tmonster Feb 2, 2024
5afbd5b
might be a better solution, lets see if debug passes
Tmonster Feb 12, 2024
75f6d9d
remove debugging statements
Tmonster Feb 13, 2024
1d0b442
fix logic for cross product relations
Tmonster Feb 13, 2024
8012a31
skip test that I will remove eventually
Tmonster Feb 13, 2024
c8d88e0
these ideas dont work tbh
Tmonster Feb 13, 2024
21817fe
so close, but cant figure it out yet
Tmonster Feb 13, 2024
fbd8162
still need to track down this bug
Tmonster Feb 13, 2024
996fd64
follow this path. Somewhere in the test visit_operator_expressions_in…
Tmonster Feb 14, 2024
14dcd16
skip failing test
Tmonster Mar 28, 2024
d9441ad
Merge branch 'feature' into reorder_semi_anti_joins_easier_fix
Tmonster Apr 4, 2024
922fb49
no longer fails that test
Tmonster Apr 4, 2024
ee258b3
disable other test, see what else fails
Tmonster Apr 4, 2024
9910429
make format-fix
Tmonster Apr 4, 2024
a08db68
disable other failing test, see what else fails
Tmonster Apr 4, 2024
46d0114
basically everything works now. Just need to fix the cardinality esti…
Tmonster Apr 4, 2024
a26d06b
fix join order optimizer seg fault failure
Tmonster Apr 5, 2024
a3adc84
remove/clean up tests
Tmonster Apr 5, 2024
5076c09
clean up some unused code
Tmonster Apr 5, 2024
9ab4ca4
Merge branch 'reorder_semi_anti_joins_easier_fix' of github.com:Tmons…
Tmonster Apr 5, 2024
5a00256
remove randomly failing test
Tmonster Apr 5, 2024
faa0755
more code clean up
Tmonster Apr 5, 2024
71da791
make format-fix
Tmonster Apr 5, 2024
e4352af
comment out ensuring statististcs because arrow has not statistics
Tmonster Apr 5, 2024
1cb53ad
make format-fix (again)
Tmonster Apr 5, 2024
4d7b82e
rewrite mark join needs to be in more places
Tmonster Apr 8, 2024
e4d78f4
Merge remote-tracking branch 'upstream/feature' into reorder_semi_ant…
Tmonster Apr 8, 2024
9e538dd
distinct count should be max of 1 and the reported distinct count
Tmonster Apr 8, 2024
d9df4df
removed unused variable
Tmonster Apr 8, 2024
e01ab55
clean up test fileg
Tmonster Apr 8, 2024
790bd63
remove hashtag
Tmonster Apr 8, 2024
b6bb52a
clang tidy fixes #1
Tmonster Apr 9, 2024
c5a2a42
make format-fixes 2
Tmonster Apr 9, 2024
c9f56d1
pr comments 1
Tmonster Apr 9, 2024
7dc4bcb
add more comments for readability
Tmonster Apr 10, 2024
b998e86
it points to unused memory is because we remove entries from the subg…
Tmonster Apr 10, 2024
004f308
refactoring is a lot easier to think about if it is just a spanning tree
Tmonster Apr 12, 2024
6c8644f
lots of refactor done, need to understand where query graph edges com…
Tmonster Apr 22, 2024
d53a5f7
Merge branch 'feature' into reorder_semi_anti_joins_easier_fix_refactor
Tmonster Apr 22, 2024
3093b9a
got it all to compile and work finally
Tmonster Apr 22, 2024
3be1487
fixes with join order optimizer
Tmonster Apr 23, 2024
5bac735
why is q09 regressing?
Tmonster Apr 23, 2024
50ed8cb
refactor should be done now
Tmonster Apr 23, 2024
f30f26c
more small fixes
Tmonster Apr 23, 2024
28b9e76
remove break point
Tmonster Apr 23, 2024
02fbf90
revert script changes, actually remove unneeded break point
Tmonster Apr 23, 2024
01dc878
remove Print tdom info and other unused code
Tmonster Apr 23, 2024
a347cf7
Merge branch 'feature' into reorder_semi_anti_joins_easier_fix_refactor
Tmonster Apr 24, 2024
7553dd4
fix regression and refactor function name
Tmonster Apr 25, 2024
601ea05
remove break_here
Tmonster Apr 25, 2024
75bcac6
fixed q95 again.
Tmonster Apr 25, 2024
0d3c357
Merge remote-tracking branch 'upstream/feature' into reorder_semi_ant…
Tmonster Apr 30, 2024
6bc5e0e
remove unused filter strength variable. add constant semi_anti filter…
Tmonster May 1, 2024
eab427f
Merge branch 'feature' into reorder_semi_anti_joins_easier_fix_refactor
Tmonster May 3, 2024
a8ad4d0
will this pass CI?
Tmonster May 3, 2024
13e58e9
make format-fix
Tmonster May 3, 2024
fe6fdef
remove unused variable
Tmonster May 3, 2024
3fe233f
Merge remote-tracking branch 'upstream/feature' into reorder_semi_ant…
Tmonster May 3, 2024
db16bac
make format-fix
Tmonster May 3, 2024
4c5ef7d
Merge branch 'feature' into reorder_semi_anti_joins_easier_fix_refactor
Tmonster May 15, 2024
b3d7955
add explicit constructor
Tmonster May 15, 2024
9a02adb
make format-fix
Tmonster May 15, 2024
82043f3
change default semi anti selectivity to 5
Tmonster May 17, 2024
a7b60eb
add test to make sure push down is happening
Tmonster May 17, 2024
72ec97a
make format-fix
Tmonster May 21, 2024
3f039bc
CI should run again please
Tmonster May 30, 2024
f3ffde1
add patch for substrait for semi and anti joins
Tmonster Jun 4, 2024
fe56a6b
Merge remote-tracking branch 'upstream/feature' into reorder_semi_ant…
Tmonster Jun 4, 2024
8fa3935
actually apply patches
Tmonster Jun 4, 2024
336dafa
fix patch fileg
Tmonster Jun 4, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
trying to fix this column lifetime analyzer bug
  • Loading branch information
Tmonster committed Jan 31, 2024
commit 206109e0d9d8b8bfaafba9ea6b55e9619ce52640
1 change: 1 addition & 0 deletions src/optimizer/column_lifetime_analyzer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,7 @@ void ColumnLifetimeAnalyzer::VisitOperator(LogicalOperator &op) {

// then generate the projection map
GenerateProjectionMap(op.children[1]->GetColumnBindings(), unused_bindings, comp_join.right_projection_map);
if (op.Get)
return;
}
case LogicalOperatorType::LOGICAL_UNION:
Expand Down
3 changes: 0 additions & 3 deletions src/optimizer/join_order/cardinality_estimator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -227,9 +227,6 @@ DenomInfo CardinalityEstimator::GetDenominator(JoinRelationSet &set) {
if (filter->join_type == JoinType::INNER) {
// iterate through other subgraphs and merge.
bool found_match = FindSubgraphMatchAndMerge(*it, find_table, next_subgraph, subgraphs.end());
if (!found_match) {
auto call_me = "sdfvs";
}
// Now insert the right binding and update denominator with the
// tdom of the filter
// insert find_table again in case there was no other subgraph.
Expand Down
3 changes: 3 additions & 0 deletions src/optimizer/join_order/query_graph_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,9 @@ GenerateJoinRelation QueryGraphManager::GenerateJoins(vector<unique_ptr<LogicalO
}
}

if (chosen_filter->join_type == JoinType::SEMI || chosen_filter->join_type == JoinType::ANTI || chosen_filter->join_type == JoinType::MARK) {
auto break_here = "dfs";
}
auto join = make_uniq<LogicalComparisonJoin>(chosen_filter->join_type);
// Here we optimize build side probe side. Our build side is the right side
// So the right plans should have lower cardinalities.
Expand Down
23 changes: 23 additions & 0 deletions test/optimizer/joins/random_previously_failing_join_queries.test
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# name: test/optimizer/joins/random_previously_failing_join_queries.test
# description: see description
# group: [joins]

statement ok
ATTACH 'test4.db'

statement ok
USE test4

query TIIIIIII rowsort join277
SELECT x5, e9+108, a3+149+a5, e4+358
FROM t3, t4, t5, t6, t9, t1, t8
WHERE a9 in (28,11,739,102,413,389)
AND a3 in (515,190,306,513,959,788,198)
AND e8=c9
AND b4=d6
AND a1=d8
limit 10
----
168 values hashing to 34325f84dd0efa600c0be4e8e0770bc3


5 changes: 4 additions & 1 deletion test/sqlite/select4.test_slow
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# name: test/sqlite/select4.test_slow
# group: [sqlite]

load test4.db

statement ok
CREATE TABLE t1(
a1 INTEGER,
Expand Down Expand Up @@ -39745,7 +39747,7 @@ SELECT e1, x8, d9, c6, x2
----
1050 values hashing to e0a50b87e246602e118b07307e5eaa36

mode unskip
#mode unskip

query TIIIIIII rowsort join277
SELECT x5, e6+c6, d1, c8, e9+108, a7, a3+149+a5, e4+358
Expand Down Expand Up @@ -48306,3 +48308,4 @@ SELECT c6*856, x5
----
28 values hashing to 8ceb8232c1e6c7b7f5d87ea5c54e08d0

mode unskip