
Add parquet read tests which require more than one batch #3339

Closed
@stress-tess

Description

We've hit several cases where new parquet logic intended to avoid single batches appears to work until the data exceeds ~700,000 elements, which seems to be around where a second batch is needed. This mostly crops up when we have some combination of NaNs, empty segs, and empty strings. To verify that any future attempts to improve our batch writes in parquet don't run into the same problem, we should add tests for these "large" reads.
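As a rough starting point, here is a sketch of the kind of test data these "large" reads would need. It uses only the Python standard library and does not touch parquet itself: the `roundtrip` callable is a hypothetical stand-in for whatever write-then-read path the real test exercises. All helper names and the `SIZE` value are assumptions chosen for illustration; the only number taken from this issue is the ~700,000 element threshold.

```python
import math
import random

# Assumption: the issue reports trouble past ~700,000 elements, so overshoot
# that comfortably to make sure more than one batch is needed.
SIZE = 1_000_000


def make_float_column(n, nan_rate=0.1, seed=0):
    """Floats with NaNs sprinkled in at roughly nan_rate."""
    rng = random.Random(seed)
    return [math.nan if rng.random() < nan_rate else rng.random() for _ in range(n)]


def make_string_column(n, empty_rate=0.1, seed=0):
    """Strings with empty strings sprinkled in at roughly empty_rate."""
    rng = random.Random(seed)
    return ["" if rng.random() < empty_rate else f"s{i}" for i in range(n)]


def make_segments(n_values, empty_rate=0.2, max_len=5, seed=0):
    """Return (offsets, values) for a segmented column with some empty segments.

    An empty segment shows up as two consecutive equal offsets.
    """
    rng = random.Random(seed)
    offsets, values = [], []
    while len(values) < n_values:
        offsets.append(len(values))
        if rng.random() < empty_rate:
            continue  # empty segment: no values appended
        for _ in range(rng.randint(1, max_len)):
            values.append(math.nan if rng.random() < 0.1 else rng.random())
    return offsets, values


def same_values(a, b):
    """Element-wise equality that treats NaN as equal to NaN."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        both_nan = (
            isinstance(x, float) and isinstance(y, float)
            and math.isnan(x) and math.isnan(y)
        )
        if not both_nan and x != y:
            return False
    return True


def check_roundtrip(column, roundtrip):
    """roundtrip is a hypothetical callable: write the column, read it back."""
    return same_values(column, roundtrip(column))
```

In a real test, `roundtrip` would write the column to a parquet file and read it back, and the same check would be repeated for floats, strings, and segmented data, both separately and in combination.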
