Add parquet read tests which require more than one batch #3339
Closed
Description
We've run into several cases where new parquet logic to avoid single batches appears to work until the data exceeds ~700,000 elements, which seems to be about where a second batch is needed. The failures mostly crop up when we have some combination of NaNs, empty segments, and empty strings. To verify that any future attempts to improve our batch writes in parquet don't run into the same problem, we should add tests for these "large" reads.
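As a rough illustration of what such a test could look like, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the helper names (`make_float_column`, `make_string_column`, `make_segments`, `check_roundtrip`), the `roundtrip` callable, and the default size just past the ~700,000 mark. It builds columns containing NaNs, empty strings, and empty segments, then checks that a write/read round trip returns them unchanged.

```python
import math

# Assumption: the issue only says problems appear past ~700,000 elements;
# the exact batch boundary is not stated, so the size stays a parameter.
APPROX_BATCH_LIMIT = 700_000


def make_float_column(n, nan_every=7):
    """Return n floats with a NaN at every nan_every-th index."""
    return [math.nan if i % nan_every == 0 else float(i) for i in range(n)]


def make_string_column(n, empty_every=5):
    """Return n strings with an empty string at every empty_every-th index."""
    return ["" if i % empty_every == 0 else f"s{i}" for i in range(n)]


def make_segments(n_values):
    """Return segment start offsets over n_values values.

    A repeated offset marks an empty segment (its start equals the next start).
    """
    segs = []
    pos = 0
    i = 0
    while pos < n_values:
        segs.append(pos)
        if i % 3 == 0:
            segs.append(pos)  # duplicate start offset: an empty segment
        pos += 1 + (i % 4)  # segment lengths cycle through 1..4
        i += 1
    return segs


def columns_equal(expected, actual):
    """Compare two columns element by element, treating NaN as equal to NaN."""
    if len(expected) != len(actual):
        return False
    for e, a in zip(expected, actual):
        if isinstance(e, float) and math.isnan(e):
            if not (isinstance(a, float) and math.isnan(a)):
                return False
        elif e != a:
            return False
    return True


def check_roundtrip(roundtrip, n=APPROX_BATCH_LIMIT + 1):
    """Return True if roundtrip(columns) gives back every column unchanged.

    roundtrip is a hypothetical callable that writes a dict of columns to
    parquet and reads it back as a dict with the same keys.
    """
    columns = {
        "floats": make_float_column(n),
        "strings": make_string_column(n),
        "segments": make_segments(n),
    }
    result = roundtrip(columns)
    return all(columns_equal(columns[k], result.get(k, [])) for k in columns)
```

In an actual test, `roundtrip` would wrap the project's own parquet write and read calls, which are left out here because the issue does not name them. Keeping the size as a parameter lets the same check run at a small size for quick feedback and at a size past the batch boundary for the "large" case.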