Support for spatial joins #701

Open
bflammers opened this issue Jan 16, 2019 · 2 comments
Comments

@bflammers

Hi there,

I have searched the docs for how to do simple spatial operations between geometries, such as checking whether a point falls within a polygon. For my use case, I want to do this for a large collection of points and a large collection of polygons. In other geo libraries, this is sometimes referred to as a spatial join.
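To make the request concrete, here is a minimal sketch of the kind of operation meant, in plain Python. It is not GeoPySpark or GeoTrellis API; the names `point_in_polygon` and `spatial_join` are hypothetical and exist only for illustration. It assumes simple polygons given as a list of vertices and uses a standard ray-casting test.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]  # vertices in order; the ring is closed implicitly


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: cast a ray to the right of the point and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(
    points: Dict[str, Point], polygons: Dict[str, Polygon]
) -> List[Tuple[str, str]]:
    """Naive join: pair every point id with the id of each polygon that contains it."""
    return [
        (point_id, polygon_id)
        for point_id, point in points.items()
        for polygon_id, polygon in polygons.items()
        if point_in_polygon(point, polygon)
    ]


# Hypothetical sample data (illustration only)
points = {"a": (1.0, 1.0), "b": (5.0, 5.0), "c": (9.0, 9.0)}
polygons = {
    "square": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    "triangle": [(4.0, 4.0), (8.0, 4.0), (6.0, 8.0)],
}
matches = spatial_join(points, polygons)
print(matches)
```

The nested loop compares every point with every polygon, so the cost grows with the product of the two collection sizes. That is why a distributed or spatially indexed join is of interest for large inputs.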

Unfortunately, I have not been able to find anything on either the simple operations or the spatial joins. Based on a quick read of the GeoTrellis documentation, it seems that these things are supported in the Scala library.

I believe this implies one of the following:

  1. GeoPySpark is a limited interface to GeoTrellis
  2. The GeoPySpark docs are not complete
  3. I have missed the relevant sections in the docs completely

In case of 1): Will this functionality be added in the future?
In case of 2): Will the documentation be updated in the future?
In case of 3): Could you please point me in the right direction?

Thanks

@jbouffard
Collaborator

@bflammers I am very sorry for only responding to your issue now. I somehow missed the notification about it.

Case 1 is correct. Vector operations are supported in GeoTrellis but not in GeoPySpark. This is because, while GeoPySpark is a Python binding of GeoTrellis, it fills a slightly different niche in the Python ecosystem than GeoTrellis does in Scala. The Python community already has various vector libraries (shapely, fiona, etc.), so the focus of GeoPySpark is mainly on processing, formatting, and analyzing large amounts of raster data at scale.

So to answer your question: operations like spatial joins for vectors will probably not be supported in GeoPySpark. However, if there is a need for vector processing at scale in Python, then that's something we may end up implementing.

@bflammers
Author

@jbouffard Thank you for your answer.

I think there is a need for this. I have been searching for some time for a library that can perform spatial joins on vectors using PySpark, but as far as I can tell there is no such thing at the moment. Please correct me if I am wrong! And I am not the only one looking for this: link. It would be great if this were implemented.
