Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Arrow] Properly use the parent's array.offset in many places in the scan #9661

Merged
merged 25 commits into from
Nov 16, 2023

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented Nov 13, 2023

This PR fixes #5547

It seems like the arrow arrays hit a certain chunk size limit, and are split up over multiple arrays while sharing the same buffer.
These offsets are put onto the struct array and should be forwarded to the child arrays, we did not do this previously.
(Or at least not consistently)

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks great. Could you just look at the failing CI?

@github-actions github-actions bot marked this pull request as draft November 14, 2023 12:44
@Tishj
Copy link
Contributor Author

Tishj commented Nov 14, 2023

Thanks! Looks great. Could you just look at the failing CI?

Weirdly that's one of the tests I added, seems I didn't run it - and it was broken
I've fixed the test and added tests for nulls

Discovered another bug concerning dictionaries, structs and nulls - will likely fix this in a separate PR
^ Also fixed this

@Mytherin Mytherin marked this pull request as ready for review November 14, 2023 13:02
@github-actions github-actions bot marked this pull request as draft November 14, 2023 15:10
@Tishj Tishj marked this pull request as ready for review November 14, 2023 15:28
@github-actions github-actions bot marked this pull request as draft November 14, 2023 16:41
@Tishj Tishj marked this pull request as ready for review November 15, 2023 08:21
@github-actions github-actions bot marked this pull request as draft November 15, 2023 10:23
@Tishj Tishj marked this pull request as ready for review November 15, 2023 12:46
@Tishj
Copy link
Contributor Author

Tishj commented Nov 15, 2023

Had this weird CI failure on my fork:

          for path in extension_paths_found:
              print(path)
              if path.endswith(extension_name + ".duckdb_extension"):
                  conn = duckdb.connect(db_name, config={'allow_unsigned_extensions': 'true'})
  >               conn.execute(f"LOAD '{path}'")
  E               duckdb.duckdb.InvalidInputException: Invalid Input Error: Extension "/project/httpfs/httpfs.duckdb_extension" version (v..1-dev) does not match DuckDB version (v0.0.0)

Probably caused by this being the version of the installed package:

      + pip install /tmp/cibuildwheel/repaired_wheel/duckdb-0.1.dev34218-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  Processing /tmp/cibuildwheel/repaired_wheel/duckdb-0.1.dev34218-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  Installing collected packages: duckdb
  Successfully installed duckdb-0.1.dev34218

@Mytherin Mytherin merged commit ce82d92 into duckdb:main Nov 16, 2023
45 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 11, 2023
Merge pull request duckdb/duckdb#9661 from Tishj/arrow_parent_offsets
Merge pull request duckdb/duckdb#9686 from ywelsch/yw/dbgen-schema-in-other-catalog
Merge pull request duckdb/duckdb#9674 from szarnyasg/replace-old-logos
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect nested data converted in Arrow <-> DuckDB conversion when read through Arrow DataSet (Python)
3 participants