[Julia] Improves Julia support for scalar UDFs #15430

Merged (9 commits) on Jan 3, 2025

@tqml tqml commented Dec 20, 2024

This PR improves the Julia support for user-defined functions (UDFs).

It adds the ScalarFunction type and the @create_scalar_function macro, which automatically generates a DuckDB-compatible wrapper. See also discussion #13176 for details.

Numeric types, dates, strings, missing values and exceptions are supported. Composite types are not yet implemented.

Example Usage:

using DuckDB, DataFrames

f_add = (a, b) -> a + b
db = DuckDB.DB()
con = DuckDB.connect(db)

# Create the scalar function.
# The second argument can be omitted if the function name is identical to a global symbol.
fun = DuckDB.@create_scalar_function f_add(a::Int, b::Int)::Int f_add
DuckDB.register_scalar_function(con, fun)

df = DataFrame(a = [1, 2, 3], b = [1, 2, 3])
DuckDB.register_table(con, df, "test1")

result = DuckDB.execute(con, "SELECT f_add(a, b) as result FROM test1") |> DataFrame

Performance

The performance of the auto-generated wrapper is comparable to pure DuckDB/Julia. I measured the elapsed time (in seconds) of adding 10 million numbers in a coarse benchmark:

| Method | Int (s) | Float (s) |
| --- | --- | --- |
| DataFrames.jl | 0.092947083 | 0.090409625 |
| DuckDB | 0.065306042 | 0.054156167 |
| UDF | 0.078665125 | 0.080781 |
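
For readers who want a rough baseline on their own machine, a coarse timing of this kind can be sketched in plain Julia as below. This is not the script used for the numbers above: the helper `add_cols`, the input ranges, and the warm-up step are assumptions made for illustration, and it only times element-wise addition on Julia arrays, without DuckDB or DataFrames.jl.

```julia
# Coarse timing of element-wise addition over 10 million values.
# Illustrative only: not the benchmark used in this PR.
n = 10_000_000

a_int = rand(1:100, n)
b_int = rand(1:100, n)
a_float = rand(n)
b_float = rand(n)

# Hypothetical helper: adds two columns element-wise
add_cols(x, y) = x .+ y

# Warm up so that compilation time is not included in the measurement
add_cols(a_int[1:10], b_int[1:10])
add_cols(a_float[1:10], b_float[1:10])

t_int = @elapsed add_cols(a_int, b_int)
t_float = @elapsed add_cols(a_float, b_float)

println("Int:   ", t_int, " s")
println("Float: ", t_float, " s")
```

A single `@elapsed` run is noisy, so the result is only useful for order-of-magnitude comparisons.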

Internals

The scalar functions are tracked in a dictionary in the DuckDBHandle struct. Currently, only registering scalar functions is supported.

The wrapper is generated by the macro via the function _udf_generate_wrapper() and should be fully type-stable.
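
To illustrate what "type-stable" means for a wrapper of this kind, here is a minimal sketch that uses only the Julia standard library. The wrapper `add_wrapper!` is a hand-written, hypothetical stand-in and not the code produced by `_udf_generate_wrapper()`. It applies a scalar function element-wise to two input columns and checks the inferred return type with `Test.@inferred`.

```julia
using Test  # standard library; provides @inferred

# Hypothetical scalar function, mirroring the f_add example above
f_add(a::Int, b::Int)::Int = a + b

# Hypothetical hand-written wrapper (NOT the generated one): applies the
# scalar function element-wise and writes into a preallocated output column.
function add_wrapper!(out::Vector{Int}, a::Vector{Int}, b::Vector{Int})
    for i in eachindex(out, a, b)
        out[i] = f_add(a[i], b[i])
    end
    return out
end

a = [1, 2, 3]
b = [1, 2, 3]
out = similar(a)

# @inferred raises an error if the compiler-inferred return type
# does not match the type of the value actually returned
wrapped = @inferred add_wrapper!(out, a, b)
```

For a closer look at a specific method, `@code_warntype` can be used interactively to spot arguments or locals that are inferred as `Any`.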

Because of limitations of the @cfunction macro in Julia, the wrapper needs to be globally accessible. I implemented this by introducing a global (constant) dictionary in DuckDB, _UDF_WRAPPER_CACHE, which stores the generated wrappers. This logic is defined in _udf_register_wrapper(). In my opinion this is not ideal, but it works. I asked in Julia-related forums for a better solution and will update the code if something better is possible.
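
The general pattern can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `WRAPPER_CACHE` and `call_cached_add` are hypothetical names, and the actual `_UDF_WRAPPER_CACHE` and `_udf_register_wrapper()` may be organized differently. The idea is that `@cfunction` is handed a top-level function, while the user-supplied function is looked up in a global constant dictionary at call time.

```julia
# Hypothetical stand-in for a global wrapper cache (not the PR's actual code)
const WRAPPER_CACHE = Dict{Symbol, Function}()

# A top-level function, so @cfunction can produce a plain C function pointer
# without needing a runtime closure.
function call_cached_add(a::Cint, b::Cint)::Cint
    f = WRAPPER_CACHE[:add]   # look up the user function at call time
    return Cint(f(a, b))
end

# Store a user-supplied function (here an anonymous one) in the cache
WRAPPER_CACHE[:add] = (a, b) -> a + b

# Obtain a C-callable pointer and call through it, as a C library would
ptr = @cfunction(call_cached_add, Cint, (Cint, Cint))
result = ccall(ptr, Cint, (Cint, Cint), 2, 3)
```

The dictionary lookup in this sketch is dynamically typed (`Dict{Symbol, Function}`), which costs some performance; it is meant to show the mechanism only.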

This is an update to PR #14024.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft December 20, 2024 14:23
@tqml tqml marked this pull request as ready for review December 27, 2024 09:11
@tqml tqml changed the title Improves Julia support for scalar UDFs [Julia] Improves Julia support for scalar UDFs Dec 27, 2024
@hannes hannes requested a review from Mytherin December 30, 2024 09:00
@Mytherin Mytherin merged commit 1342bdb into duckdb:main Jan 3, 2025
8 checks passed
Mytherin commented Jan 3, 2025

Thanks for the PR! Looks great!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Jan 4, 2025
Don't create config folder on extension listing (duckdb/duckdb#15530)
[Python] Align the behavior between `sql` and `execute` for `.pl()` call (duckdb/duckdb#15537)
Update year in license file to 2025 (duckdb/duckdb#15545)
[Julia] Improves Julia support for scalar UDFs (duckdb/duckdb#15430)