[Data] Avoid serializing datasource for Parquet read tasks (ray-project#41712)

ray-project#41118 added an include_paths parameter to ParquetDatasource. As part of that PR, we pass the self._include_paths attribute to Parquet read tasks. As a result, the datasource (self) gets serialized with each read task. Normally this isn't an issue, but if you're working with a large dataset (like in the failing release test), the datasource is slow to serialize. This PR fixes the issue by removing the reference to self from the read tasks.

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
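The following is a minimal sketch of the failure mode and the fix in plain Python. HypotheticalDatasource, read_fragment, and the two task-building methods are hypothetical stand-ins, not Ray's ParquetDatasource code. It uses the standard library's pickle with functools.partial so it runs without Ray; a closure that references self behaves the same way under a closure-capable serializer such as cloudpickle. A task that holds a bound method of the datasource drags the whole object (here, a large _metadata list standing in for dataset metadata) into every serialized task, while copying the attribute into a local variable first keeps each task's payload small.

```python
import pickle
from functools import partial


def read_fragment(fragment, include_paths):
    # Hypothetical module-level read function: it receives only what it needs.
    return {"fragment": fragment, "include_paths": include_paths}


class HypotheticalDatasource:
    """Hypothetical stand-in for a datasource; not Ray's ParquetDatasource."""

    def __init__(self, num_fragments, include_paths):
        # Assumed large per-dataset state that is expensive to serialize.
        self._metadata = list(range(num_fragments))
        self._include_paths = include_paths

    def _read_fragment(self, fragment):
        # Reads self._include_paths at call time, so the task needs `self`.
        return read_fragment(fragment, self._include_paths)

    def make_task_capturing_self(self, fragment):
        # Problem: the bound method references `self`, so pickling the task
        # also pickles the whole datasource, including self._metadata.
        return partial(self._read_fragment, fragment)

    def make_task_without_self(self, fragment):
        # Fix: copy the attribute into a local so the task holds only the value.
        include_paths = self._include_paths
        return partial(read_fragment, fragment, include_paths)


if __name__ == "__main__":
    ds = HypotheticalDatasource(num_fragments=100_000, include_paths=True)
    slow = pickle.dumps(ds.make_task_capturing_self(0))
    fast = pickle.dumps(ds.make_task_without_self(0))
    print(f"task capturing self: {len(slow)} bytes")
    print(f"task without self:   {len(fast)} bytes")
```

Running it prints the two serialized sizes: the task that captures self grows with the size of _metadata, while the task built from a local copy stays the same size regardless of how large the datasource is.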