Avoid updating partitions table when unnecessary #114

Merged
merged 2 commits into from
May 16, 2022

lossyrob
Member

This commit refactors the loader code to avoid unnecessary partition updates. Partition updates should be avoided when unnecessary as they can have a large performance impact for partitions containing many items.
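For readers unfamiliar with the idea, here is a minimal sketch of how a loader might decide whether a partition update is needed at all. It is not the code in this PR: `PartitionRange`, `partitions_needing_update`, and the notion of tracking a datetime range per partition are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class PartitionRange:
    """Hypothetical record of the datetime span a partition already covers."""

    start: datetime
    end: datetime


def partitions_needing_update(
    known: Dict[str, PartitionRange],
    incoming: Iterable[Tuple[str, datetime]],
) -> Dict[str, PartitionRange]:
    """Return only the partitions whose stored range would have to change."""
    # Collapse the batch to a single (min, max) span per partition name.
    spans: Dict[str, PartitionRange] = {}
    for name, dt in incoming:
        cur = spans.get(name)
        if cur is None:
            spans[name] = PartitionRange(dt, dt)
        else:
            spans[name] = PartitionRange(min(cur.start, dt), max(cur.end, dt))

    updates: Dict[str, PartitionRange] = {}
    for name, span in spans.items():
        existing = known.get(name)
        if existing is None:
            # Unknown partition: it has to be created.
            updates[name] = span
        elif span.start < existing.start or span.end > existing.end:
            # The batch extends past the stored range: widen it.
            updates[name] = PartitionRange(
                min(existing.start, span.start), max(existing.end, span.end)
            )
        # Otherwise the batch fits inside the stored range, so no update is issued.
    return updates
```

With a check like this, a re-ingest of items that already fall inside a partition's known range would produce an empty result and skip the update.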

@@ -137,15 +148,18 @@ def read_json(file: Union[Path, str, Iterator[Any]] = "stdin") -> Iterable:
yield orjson.loads(line)


@dataclass
Member Author

Remove the dataclass decorator; this is more appropriate as a normal Python object. IMO dataclasses should carry only light functionality and be kept to data containers with public members (sort of like a struct). Because dataclasses do not allow mutable defaults, the private attribute had to be made optional, which is what prompted the change from a dataclass to a standard object.
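To illustrate the constraint, here is a small sketch. The class and attribute names (`LoaderAsDataclass`, `LoaderAsClass`, `_partition_cache`, `dsn`) are hypothetical and not taken from this PR; only the `dataclasses` behavior is standard Python.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def mutable_default_is_rejected() -> bool:
    """Return True if @dataclass refuses a bare mutable default."""
    try:

        @dataclass
        class Broken:
            _cache: Dict[str, int] = {}  # dataclasses raise ValueError here

    except ValueError:
        return True
    return False


@dataclass
class LoaderAsDataclass:
    """Hypothetical dataclass version: the private state ends up Optional."""

    dsn: str
    _partition_cache: Optional[Dict[str, int]] = None


class LoaderAsClass:
    """Hypothetical plain-class version: the private state is always a dict."""

    def __init__(self, dsn: str) -> None:
        self.dsn = dsn
        # Each instance gets its own dict, so no Optional is needed.
        self._partition_cache: Dict[str, int] = {}
```

In the plain-class version, callers never have to handle a `None` cache, and the private attribute does not leak into the constructor signature.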

@lossyrob lossyrob force-pushed the feature/rde/partition-insert branch from 787fc09 to ebd4b6d Compare May 14, 2022 19:15
geom = Geometry.from_geojson(geojson)
if geom is None:
    raise Exception(f"Invalid geometry encountered: {geojson}")
geometry = str(geom.wkb)
Member Author

Check for None to appease type linter
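As a sketch of why the check helps the type checker: when a function is annotated as returning an `Optional`, an explicit `is None` guard narrows the type for the code that follows. The `parse_wkb` function below is a hypothetical stand-in, not the real `Geometry.from_geojson`, and its byte encoding is a placeholder rather than actual WKB.

```python
from typing import Optional


def parse_wkb(geojson: dict) -> Optional[bytes]:
    """Hypothetical stand-in for a parser that returns None on bad input."""
    if "type" not in geojson or "coordinates" not in geojson:
        return None
    # Placeholder encoding for illustration only; this is not real WKB.
    return repr(geojson["coordinates"]).encode()


def geometry_column(geojson: dict) -> str:
    geom = parse_wkb(geojson)
    if geom is None:
        raise ValueError(f"Invalid geometry encountered: {geojson}")
    # Past the guard, a type checker treats geom as bytes, not Optional[bytes],
    # so calling a bytes method no longer triggers a warning.
    return geom.hex()
```

Without the guard, a checker such as mypy would typically flag the attribute access on a possibly-`None` value.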

@lossyrob lossyrob force-pushed the feature/rde/partition-insert branch from ebd4b6d to f4f1e82 Compare May 14, 2022 19:55
@lossyrob
Member Author

Tested this on a re-ingest of Sentinel 1 GRD and saw roughly a 10x improvement in ingest time.

@lossyrob lossyrob requested a review from bitner May 16, 2022 14:12
@lossyrob lossyrob mentioned this pull request May 16, 2022
Collaborator

@bitner bitner left a comment

👍

@lossyrob lossyrob merged commit 465f62a into main May 16, 2022
@lossyrob lossyrob deleted the feature/rde/partition-insert branch May 16, 2022 14:36