[Datasets] Add local and S3 filesystem test coverage for file-based datasources. #17158

Merged


clarkzinzow (Contributor) commented Jul 16, 2021

This PR ensures that explicitly specified local and S3 filesystems are covered by our tests for the file-based datasources.

Related issue number

Closes #17084

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ericl added the @author-action-required label ("The PR author is responsible for the next step. Remove tag to send back to the reviewer.") on Jul 16, 2021
@ericl added this to the Datasets Alpha milestone on Jul 16, 2021
@clarkzinzow force-pushed the datasets/feat/add-filesystem-test-coverage branch from 0791b0a to bac914e on July 17, 2021 20:35
@clarkzinzow force-pushed the datasets/feat/add-filesystem-test-coverage branch from 5ef83de to 66466df on July 19, 2021 18:43
@clarkzinzow force-pushed the datasets/feat/add-filesystem-test-coverage branch from 66466df to 0030e68 on July 22, 2021 23:39
@ericl removed this from the Datasets Beta milestone on Jul 30, 2021

ericl (Contributor) previously requested changes on Aug 2, 2021, leaving a comment:


We need to add Parquet tests here: S3 Parquet read support is currently completely broken, because we don't import pyarrow.fs before deserializing the filesystem implementation.

@clarkzinzow force-pushed the datasets/feat/add-filesystem-test-coverage branch from 0030e68 to c4b633b on August 11, 2021 01:29
@clarkzinzow dismissed ericl’s stale review on August 11, 2021 01:33, noting:

This PR only adds filesystem test coverage for the file-based datasources (JSON and CSV); I'm planning to add Parquet coverage in a separate PR to keep the diff small.

clarkzinzow (Contributor, Author) commented:

Still hitting the infinite recursion here when pickling. ☹️

ericl (Contributor) commented Aug 11, 2021 via email

clarkzinzow (Contributor, Author) commented:

CC @iycheng @suquark; I'm also going to ping y'all in the Slack thread that has more info.

@clarkzinzow removed the @author-action-required label on Aug 12, 2021
@clarkzinzow added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") on Aug 12, 2021
clarkzinzow (Contributor, Author) commented:

Datasets tests are looking good! This should be ready to merge.

@ericl merged commit d6eeb5d into ray-project:master on Aug 12, 2021

ericl (Contributor) commented Aug 12, 2021

🎉

Bam4d pushed a commit to Bam4d/ray that referenced this pull request Aug 13, 2021
hckuo2 (Contributor) commented Nov 30, 2021

This PR adds back opencv-python-headless as a required package, which had been removed in #16929. As a result, installation has problems on Python 3.9.

pcmoritz pushed a commit that referenced this pull request Feb 22, 2022
PR #16929 removed opencv-python-headless.
PR #17158 added it back but did not use it. This was noted by [a reviewer](#17158 (comment)) since it breaks python3.9 (no wheel is available for installation).
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022
PR ray-project#16929 removed opencv-python-headless.
PR ray-project#17158 added it back but did not use it. This was noted by [a reviewer](ray-project#17158 (comment)) since it breaks python3.9 (no wheel is available for installation).
Labels: tests-ok (The tagger certifies test failures are unrelated and assumes personal liability.)
Development: successfully merging this pull request may close these issues:
[Dataset] Add S3-backed tests for Ray Dataset IO layer.
4 participants