-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python][Dev] Dynamically generate the Connection wrapper methods #11202
[Python][Dev] Dynamically generate the Connection wrapper methods #11202
Conversation
…om None todefault connection handled by pybind, but tidy will not be happy about passing by value if I don't use it, and its suggestion to make it a will break the implicit conversion rules we set up - so the *only* purpose of this change is to make tidy happy
Great idea to generate this - maybe we can instead of generating the function objects during |
While I agree, my source of truth for generating this is the definitions created by pybind11 for the connection methods. So I decided to take a step back and generate the cpp code for the DuckDBPyConnection methods that are shared, for which I am writing this gigantic json file that contains the information for both the stubs and the {
"name": "create_function",
"function": "RegisterScalarUDF",
"docs": "Create a DuckDB function out of the passing in Python function so it can be used in queries",
"args": [
{
"name": "name",
"type": "str"
},
{
"name": "function",
"type": "Callable"
},
{
"name": "parameters",
"type": "Optional[List[DuckDBPyType]]",
"default": "None"
},
{
"name": "return_type",
"type": "Optional[DuckDBPyType]",
"default": "None"
}
],
"kwargs": [
{
"name": "type",
"type": "Optional[PythonUDFType]",
"default": "PythonUDFType.NATIVE"
},
{
"name": "null_handling",
"type": "Optional[FunctionNullHandling]",
"default": "FunctionNullHandling.DEFAULT"
},
{
"name": "exception_handling",
"type": "Optional[PythonExceptionHandling]",
"default": "PythonExceptionHandling.DEFAULT"
},
{
"name": "side_effects",
"type": "bool",
"default": "False"
}
],
"return": "DuckDBPyConnection"
}, If everything goes according to plan however, I can write a script that generates all of these things from this json file:
(connection stubs) def create_function(
self,
name: str,
func: Callable,
parameters: Optional[List[DuckDBPyType]] = None,
return_type: Optional[DuckDBPyType] = None,
**kwargs) -> DuckDBPyConnection: ... (wrapper stubs) def create_function(
name: str,
func: Callable,
parameters: Optional[List[DuckDBPyType]] = None,
return_type: Optional[DuckDBPyType] = None,
**kwargs) -> DuckDBPyConnection: ... (wrapper init.py code) def create_function(
name: str,
func: Callable,
parameters: Optional[List[DuckDBPyType]] = None,
return_type: Optional[DuckDBPyType] = None,
**kwargs) -> DuckDBPyConnection: ...
if 'connection' in kwargs:
connection = kwargs.pop('connection')
else:
connection = duckdb.connect(':default:')
return connection.create_function(name, func, parameters, return_type, **kwargs) (cpp definition) m.def(
"create_function",
&DuckDBPyConnection::RegisterScalarUDF,
"Create a DuckDB function out of the passing in Python function so it can be used in queries",
py::arg("name"),
py::arg("function"),
py::arg("parameters") = py::none(),
py::arg("return_type") = py::none(),
py::kw_only(),
py::arg("type") = PythonUDFType::NATIVE,
py::arg("null_handling") = 0,
py::arg("exception_handling") = 0,
py::arg("side_effects") = false); |
… stubs, pybind11 def and init.py
Thanks! LGTM now |
Merge pull request duckdb/duckdb#11673 from hannes/safesign Merge pull request duckdb/duckdb#11688 from carlopi/fixes_duckdb_wasm Merge pull request duckdb/duckdb#11202 from Tishj/python_generated_connection_wrapper
Connection wrapper functions used to have signature `connection: DuckDBPyConnection = ...` However this was (accidentally) changed in duckdb#11202 to `connection: DuckDBPyConnection`, which causes the connection param to be required now - at least for type checking
Connection wrapper functions used to have signature `connection: DuckDBPyConnection = ...` However this was (accidentally) changed in duckdb#11202 to `connection: DuckDBPyConnection`, which causes the connection param to be required now - at least for type checking
This was becoming increasingly hard to maintain, so I have decided to generate the methods instead.
Every method defined on
DuckDBPyConnection
now lives intools/pythonpkg/scripts/connection_methods.json
They look like this:
From this, we generate sections of code inside of:
tools/pythonpkg/duckdb/__init__.py
tools/pythonpkg/duckdb-stubs/__init__.pyi
(DuckDBPyConnection
)tools/pythonpkg/duckdb-stubs/__init__.pyi
(duckdb
)tools/pythonpkg/src/pyconnection.cpp
(InitializeConnectionMethods
)From now on, when adjusting or creating a new connection method, the
connection_methods.json
file should be updated.Old Process
pyconnection.hpp
andpyconnection.cpp
pyconnection.cpp
connection_wrapper.hpp
andconnection_wrapper.cpp
duckdb_python.cpp
duckdb-stubs/__init__.pyi
for both the DuckDBPyConnection method and the corresponding wrapper method.duckdb/__init__.py
(expose the symbol fromduckdb.duckdb
)New Process
pyconnection.hpp
andpyconnection.cpp
connection_methods.json
make generate-files
Some issues encountered:
values
didn't take the optional connection as a keyword argument, now it doesarrow
anddf
are overloaded twice, once forfrom_arrow/df
and another forto_arrow/df
.We now define these in C++ because it's easier (pybind11 handles the overloading on the Python side).