Substrait Installable Extension #3034

Merged: 27 commits merged into duckdb:master on Feb 16, 2022

@pdet (Contributor) commented Feb 4, 2022

This PR introduces the DuckDB-Substrait Installable Extension.

It also vendors Substrait (only the C++ code generated from the proto files) and protobuf into the repository.

At this point, the extension is still highly experimental; all of its tests are round-trips to and from Substrait plans.

To build DuckDB with the Substrait extension, set the BUILD_SUBSTRAIT_EXTENSION flag, e.g.:

BUILD_SUBSTRAIT_EXTENSION=1 make debug

The extension introduces two new functions:

  1. get_substrait()
    This function takes a valid SQL query and returns a BLOB column containing the serialized Substrait query plan.
    e.g.,
CREATE TABLE crossfit (exercise text, dificulty_level int);

insert into crossfit values ('Push Ups', 3), ('Pull Ups', 5), (' Push Jerk', 7), ('Bar Muscle Up', 10);

CALL get_substrait('select count(exercise) from crossfit where dificulty_level <=5')
----
\x12\x11\x1A\x0F\x1A\x0Dlessthanequal\x12\x11\x1A\x0F\x10\x01\x1A\x0Bis_not_null\x12\x09\x1A\x07\x10\x02\x1A\x03and\x12\x10\x1A\x0E\x10\x03\x1A\x0Acount_star\x1A\x5C\x0AZ:X\x12N"L\x12B\x0A@\x1A(\x1A&\x08\x02\x12\x12\x1A\x10\x12\x08\x12\x06\x0A\x04\x12\x02\x08\x01\x12\x04\x0A\x02(\x05\x12\x0E\x1A\x0C\x08\x01\x12\x08\x12\x06\x0A\x04\x12\x02\x08\x01"\x08\x0A\x06\x0A\x02\x08\x01\x0A\x00:\x0A\x0A\x08crossfit\x1A\x00"\x04\x0A\x02\x08\x03\x1A\x06\x12\x04\x0A\x02\x12\x00
  2. from_substrait()
    This function takes a serialized plan as a BLOB, executes it, and returns the query result.
CALL from_substrait('\x12\x11\x1A\x0F\x1A\x0Dlessthanequal\x12\x11\x1A\x0F\x10\x01\x1A\x0Bis_not_null\x12\x09\x1A\x07\x10\x02\x1A\x03and\x12\x10\x1A\x0E\x10\x03\x1A\x0Acount_star\x1A\x5C\x0AZ:X\x12N"L\x12B\x0A@\x1A(\x1A&\x08\x02\x12\x12\x1A\x10\x12\x08\x12\x06\x0A\x04\x12\x02\x08\x01\x12\x04\x0A\x02(\x05\x12\x0E\x1A\x0C\x08\x01\x12\x08\x12\x06\x0A\x04\x12\x02\x08\x01"\x08\x0A\x06\x0A\x02\x08\x01\x0A\x00:\x0A\x0A\x08crossfit\x1A\x00"\x04\x0A\x02\x08\x03\x1A\x06\x12\x04\x0A\x02\x12\x00'::BLOB)
----
2
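The plan blobs above are printed in DuckDB's escaped BLOB text form: printable ASCII bytes appear literally and all other bytes as \xHH (note the \x5C standing in for a backslash). Below is a minimal Python sketch for turning such text back into raw bytes, and for rendering bytes as a literal suitable for from_substrait(...). It assumes only that escape convention, plus DuckDB accepting \xHH for any byte in a BLOB literal as the examples suggest; the helper names decode_blob_text and to_blob_literal are hypothetical.

```python
import re

# One "\xHH" escape (captured hex digits) or any other single character.
_TOKEN = re.compile(r"\\x([0-9A-Fa-f]{2})|(.)", re.DOTALL)


def decode_blob_text(text: str) -> bytes:
    r"""Convert escaped BLOB text such as '\x0A\x08crossfit' into raw bytes.

    Assumes each escaped byte is written as \xHH and every other character
    stands for one literal byte (its code point).
    """
    out = bytearray()
    for match in _TOKEN.finditer(text):
        hex_digits, literal = match.groups()
        if hex_digits is not None:
            out.append(int(hex_digits, 16))
        else:
            # bytearray.append raises ValueError for code points above 255.
            out.append(ord(literal))
    return bytes(out)


def to_blob_literal(data: bytes) -> str:
    r"""Render bytes as a SQL literal like '\x12\x11'::BLOB.

    Every byte is escaped as \xHH, so the literal never contains a raw quote
    or backslash that would need further escaping.
    """
    return "'" + "".join(f"\\x{b:02X}" for b in data) + "'::BLOB"
```

For example, decode_blob_text(r"\x0A\x08crossfit") yields b"\n\x08crossfit". Escaping every byte in to_blob_literal is longer than DuckDB's own output but avoids any quoting edge cases.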

If PRAGMA enable_verification is executed, get_substrait() also runs an extra verification step: it round-trips the query through Substrait internally and checks that the results are correct.
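enable_verification performs this check inside DuckDB. As an illustration of the same idea from the client side (not the extension's implementation), here is a Python sketch that runs a query directly, round-trips it through get_substrait()/from_substrait(), and compares the results. It assumes a DuckDB-Python-style connection whose execute(sql) returns an object with fetchone()/fetchall(); the function names are hypothetical.

```python
from typing import Any


def _sql_string(text: str) -> str:
    """Quote text as a SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def _blob_literal(data: bytes) -> str:
    r"""Render bytes as a BLOB literal, escaping every byte as \xHH."""
    return "'" + "".join(f"\\x{b:02X}" for b in data) + "'::BLOB"


def substrait_roundtrip_matches(con: Any, query: str) -> bool:
    """Compare the result of `query` with the result of its Substrait round trip.

    `con` is assumed to behave like a DuckDB Python connection: execute(sql)
    returns an object with fetchone() and fetchall(). Rows are compared as
    multisets (sorted by repr) because the round-tripped plan is not
    guaranteed to return rows in the same order.
    """
    expected = con.execute(query).fetchall()
    # get_substrait returns a single row whose only column is the plan BLOB.
    plan = con.execute(f"CALL get_substrait({_sql_string(query)})").fetchone()[0]
    actual = con.execute(f"CALL from_substrait({_blob_literal(bytes(plan))})").fetchall()
    return sorted(map(repr, expected)) == sorted(map(repr, actual))
```

Comparing sorted rows is a deliberate simplification; a query with an ORDER BY would warrant an order-sensitive comparison instead.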

cc @cpcloud

@pdet requested a review from @Mytherin on February 4, 2022 12:37
@pdet added the feature label on Feb 4, 2022
@Mytherin (Collaborator) left a review:

Thanks for the PR! Looks great. Some comments:

Two review comments on .github/workflows/Main.yml (outdated, resolved).
to_substrait.cpp from_substrait.cpp substrait-extension.cpp
${SUBSTRAIT_SOURCES} ${PROTOBUF_SOURCES})

build_loadable_extension(substrait substrait-extension.cpp)
Collaborator


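This is missing many sources: the loadable extension target lists only substrait-extension.cpp, not to_substrait.cpp, from_substrait.cpp, or the Substrait and protobuf sources, so it will fail when actually used as a loadable extension (although that is not tested right now).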

@lava commented Feb 10, 2022

I tried compiling this to inspect the generated query plans for another project. It works as described, but how do I actually extract the generated query plan and store it as a protobuf? The DuckDB shell only shows a truncated version of the blob:

D CREATE TABLE http (port int);
D CALL get_substrait('SELECT * from http');
┌────────────────────────────────────────────────────────────────────────────────────┐
│                                     Plan Blob                                      │
├────────────────────────────────────────────────────────────────────────────────────┤
│ \x1A\x1E\x0A\x1C:\x1A\x12\x10\x0A\x0E"\x04\x0A\x02\x0A\x00:\x06\x0A\x04http\x1A... │
└────────────────────────────────────────────────────────────────────────────────────┘

@pdet (Contributor, Author) commented Feb 10, 2022

Using it through the shell is probably not the most practical approach; a script using the Python API should be easier.
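A minimal Python sketch of that approach: fetch the plan BLOB through a connection and write the raw bytes (the serialized protobuf plan) to a file. It assumes a DuckDB Python connection on a build that includes the Substrait extension, and that the BLOB comes back as bytes; save_substrait_plan and plan.bin are hypothetical names.

```python
from pathlib import Path
from typing import Any


def save_substrait_plan(con: Any, query: str, path: str) -> int:
    """Write the Substrait plan produced by get_substrait() for `query` to `path`.

    `con` is assumed to be a DuckDB Python connection (or any object with the
    same execute(sql).fetchone() shape) on a build that includes the Substrait
    extension. The BLOB is fetched in full, so nothing is truncated the way the
    shell's box rendering truncates it. Returns the number of bytes written.
    """
    sql_literal = "'" + query.replace("'", "''") + "'"  # escape quotes for SQL
    row = con.execute(f"CALL get_substrait({sql_literal})").fetchone()
    plan = bytes(row[0])  # serialized protobuf plan, written as-is
    Path(path).write_bytes(plan)
    return len(plan)


# Hypothetical usage, assuming the duckdb Python package can use the extension:
#   import duckdb
#   con = duckdb.connect()
#   con.execute("CREATE TABLE http (port INT)")
#   save_substrait_plan(con, "SELECT * FROM http", "plan.bin")
```

The resulting file holds the binary protobuf message, so any tool that understands the Substrait Plan schema should be able to parse it.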

@pdet (Contributor, Author) commented Feb 10, 2022

@Mytherin The only remaining failure is windows-extensions, but I'm a bit lost on the fixes it needs.

Any idea what I'm missing or how to gather more debugging info?

@Mytherin (Collaborator) replied to @lava:
The truncation only happens in the box-mode rendering. You can change the rendering with the .mode command (e.g., .mode csv or .mode json).

@Mytherin (Collaborator) commented:
@pdet

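The remaining Windows error is caused by missing DUCKDB_API specifiers: the FunctionExpression constructor is not exported from the DuckDB library, so the loadable extension fails to link against it: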

2022-02-14T14:19:39.0787294Z from_substrait.obj : error LNK2019: unresolved external symbol "public: __cdecl duckdb::FunctionExpression::FunctionExpression(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,class std::vector<class std::unique_ptr<class duckdb::ParsedExpression,struct std::default_delete<class duckdb::ParsedExpression> >,class std::allocator<class std::unique_ptr<class duckdb::ParsedExpression,struct std::default_delete<class duckdb::ParsedExpression> > > >,class std::unique_ptr<class duckdb::ParsedExpression,struct std::default_delete<class duckdb::ParsedExpression> >,class std::unique_ptr<class duckdb::OrderModifier,struct std::default_delete<class duckdb::OrderModifier> >,bool,bool,bool)" (??0FunctionExpression@duckdb@@QEAA@AEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V?$vector@V?$unique_ptr@VParsedExpression@duckdb@@U?$default_delete@VParsedExpression@duckdb@@@std@@@std@@V?$allocator@V?$unique_ptr@VParsedExpression@duckdb@@U?$default_delete@VParsedExpression@duckdb@@@std@@@std@@@2@@3@V?$unique_ptr@VParsedExpression@duckdb@@U?$default_delete@VParsedExpression@duckdb@@@std@@@3@V?$unique_ptr@VOrderModifier@duckdb@@U?$default_delete@VOrderModifier@duckdb@@@std@@@3@_N44@Z) referenced in function "class std::unique_ptr<class duckdb::FunctionExpression,struct std::default_delete<class duckdb::FunctionExpression> > __cdecl std::make_unique<class duckdb::FunctionExpression,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > &,class std::vector<class std::unique_ptr<class duckdb::ParsedExpression,struct std::default_delete<class duckdb::ParsedExpression> >,class std::allocator<class std::unique_ptr<class duckdb::ParsedExpression,struct std::default_delete<class duckdb::ParsedExpression> > > >,0>(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > &,class std::vector<class std::unique_ptr<class 
duckdb::ParsedExpression,struct std::default_delete<class duckdb::ParsedExpression> >,class std::allocator<class std::unique_ptr<class duckdb::ParsedExpression,struct std::default_delete<class duckdb::ParsedExpression> > > > &&)" (??$make_unique@VFunctionExpression@duckdb@@AEAV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V?$vector@V?$unique_ptr@VParsedExpression@duckdb@@U?$default_delete@VParsedExpression@duckdb@@@std@@@std@@V?$allocator@V?$unique_ptr@VParsedExpression@duckdb@@U?$default_delete@VParsedExpression@duckdb@@@std@@@std@@@2@@4@$0A@@std@@YA?AV?$unique_ptr@VFunctionExpression@duckdb@@U?$default_delete@VFunctionExpression@duckdb@@@std@@@0@AEAV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@0@$$QEAV?$vector@V?$unique_ptr@VParsedExpression@duckdb@@U?$default_delete@VParsedExpression@duckdb@@@std@@@std@@V?$allocator@V?$unique_ptr@VParsedExpression@duckdb@@U?$default_delete@VParsedExpression@duckdb@@@std@@@std@@@2@@0@@Z) [D:\a\duckdb\duckdb\build\release\extension\substrait\substrait_loadable_extension.vcxproj]

@Mytherin merged commit 0b9c1c2 into duckdb:master on Feb 16, 2022
@pdet deleted the substrait branch on June 27, 2024 13:58