Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Arrow] Fix segfault in appending list data #8042

Merged
merged 8 commits into from
Jun 26, 2023

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented Jun 22, 2023

I came across this bug while investigating some memory usage issues with record batches

import duckdb
import pyarrow as pa

conn = duckdb.connect()

conn.execute("""
    create or replace table tbl as select * from (select {'a': [5,4,3,2,1]}), range(10000000)
""")

query = "SELECT * FROM tbl"
chunk_size = 1_000_000

batch_iter = conn.execute(query).fetch_record_batch(chunk_size)
for batch in batch_iter:
    del batch

The first batch always completes fine, but in subsequent batches we apply the Slice again which is a problem, in ArrowScalarBaseData::Append auto source_idx = format.sel->get_index(i); gives us back random uninitialized memory.

@Tishj Tishj requested a review from pdet June 22, 2023 10:54
@Tishj Tishj force-pushed the arrow_record_batch_segfault branch from 80a28bf to 23e15b0 Compare June 22, 2023 14:46
Copy link
Contributor

@pdet pdet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Mytherin Mytherin marked this pull request as draft June 23, 2023 10:43
@Mytherin Mytherin marked this pull request as ready for review June 23, 2023 10:43
@Mytherin Mytherin marked this pull request as draft June 23, 2023 13:38
@Mytherin Mytherin marked this pull request as ready for review June 23, 2023 13:38
@github-actions github-actions bot marked this pull request as draft June 24, 2023 07:55
@Tishj Tishj marked this pull request as ready for review June 24, 2023 07:56
@Mytherin Mytherin merged commit b4e6f04 into duckdb:feature Jun 26, 2023
52 of 54 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

@Tishj
Copy link
Contributor Author

Tishj commented Jun 26, 2023

@Mytherin
I was experiencing a weird unreproducable crash on Win32 in test_capi_arrow.cpp SECTION "null schema"
I went through the logic and it looked like my changes should have no effect at all on that codepath, but never the less I'm not sure if the crash isn't reproducable on Win32 as it had only gone through one CI run.

@Mytherin
Copy link
Collaborator

Oh whoops - I thought the only CI failure here was the flaky test in #8074. It seems there is also a failure in CodeCov now. Could you have a look and submit a follow-up PR that fixes those?

@Tishj
Copy link
Contributor Author

Tishj commented Jun 26, 2023

Sure, but I touched none of those files
https://github.com/Tishj/duckdb/actions/runs/5363302214/jobs/9730780442

I could "fix" the coverage by removing this feature...Tishj:duckdb:arrow_record_batch_segfault

Because it's not used anywhere

@Mytherin
Copy link
Collaborator

Mytherin commented Jun 26, 2023

Sure, but I touched none of those files

Yes but this PR removes code that uses the validity mask in a certain way, which is likely not used anywhere else. I think that is likely why the coverage dropped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants