[Python][Arrow] Don't deduplicate column names when outputting to Arrow #11160

Tishj · 2024-03-14T16:18:59Z

This PR fixes #11157

This problem was likely introduced in #10532

When outputting to PyArrow, we create batches and then create a table from those batches.
When creating the batches we were deduplicating column names, but we didn't do the same for the schema passed to the table-from-batches function, so the two disagreed.

We now skip deduplication in both places; Arrow has no problem with duplicate column names.


Mytherin · 2024-03-14T18:38:46Z

Thanks!

Merge pull request duckdb/duckdb#11161 from Maxxen/bugfixes Merge pull request duckdb/duckdb#11160 from Tishj/dont_deduplicate_arrow_columns

slight oversight in deduplication PR - we were skipping deduplication in one place, but doing it in another, causing a schema mismatch in the produced arrow table

d6d8c3b

Mytherin merged commit 6218e71 into duckdb:main Mar 14, 2024
14 checks passed
