Import implementation from OSMesa project #60

Merged: 41 commits merged on Mar 22, 2019

@jpolchlo jpolchlo commented Mar 1, 2019

The existing vectorpipe implementation is a bit of a Frankenstein creation of two projects (OSMesa and the original, RDD-based VP implementation). The last importation of OSMesa code was somewhat shoehorned into the structure of the old VP, and it doesn't necessarily suit the philosophy of OSMesa. We'd like to more forcefully adopt the DataFrame-centric approach of OSMesa, and bring in the extra facilities provided by the current OSMesa implementation (streaming data sources and the like).

This PR essentially wipes out VP and imports code from osmesa.common into the vectorpipe package.

This code is in a usable state now, but the intention is also to improve the documentation of this package as much as is practicable to make this codebase more accessible than it has been.

  • Document core usage patterns (ProcessOSM)
  • Document DataSources

At a later date, we will improve the vectortile export functionality of this library.

```
which will produce a frame consisting of "top-level" entities, which is to say
nodes that don't participate in a way, ways that don't participate in
relations, and a subset of the relations from the OSM data. The resulting
```

jpolchlo (Contributor Author) commented:

It would be great to get an explicit list of the entities that will be produced by constructGeometries. @mojodna ?

mojodna (Collaborator) commented:

  • points (from tagged nodes, including tags that really ought to be dropped, e.g. source=*)
  • polygons (from ways with tags that cause them to be considered areas)
  • lines (from ways without area tags)
  • multipolygons (from multipolygon or boundary relations)
  • multilinestrings (from route relations)

This currently does include ways that participate in relations (as long as they have tags? I would need to check that; a way with tagging that's distinct from the relation's suggests that it fulfills a separate, distinct role) and tagged nodes that participate in ways and relations (same deal as above; a bollard may contribute a vertex to a path).
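As a rough illustration of the list above, the mapping from OSM elements to geometry kinds could be sketched as follows. This is not vectorpipe's code: the `Node`, `Way`, and `Relation` case classes, the `closed` flag, and the tiny `areaKeys` set are all hypothetical stand-ins, and real area detection uses far more tag rules than shown here.

```scala
object GeometryKindSketch {
  // Hypothetical element model, for illustration only (not vectorpipe's schema).
  sealed trait Element { def tags: Map[String, String] }
  final case class Node(tags: Map[String, String]) extends Element
  final case class Way(tags: Map[String, String], closed: Boolean) extends Element
  final case class Relation(tags: Map[String, String]) extends Element

  // Assumption: a deliberately tiny stand-in for real area-tag rules.
  private val areaKeys = Set("building", "landuse")

  private def isArea(w: Way): Boolean =
    w.closed && (w.tags.get("area").contains("yes") || w.tags.keys.exists(areaKeys))

  // Returns the geometry kind an element would yield, or None if it yields nothing.
  def kindOf(e: Element): Option[String] = e match {
    case Node(tags) if tags.nonEmpty => Some("point") // tagged nodes only
    case Node(_)                     => None          // untagged nodes are just vertices
    case w: Way if isArea(w)         => Some("polygon")
    case _: Way                      => Some("line")  // simplification: any non-area way
    case Relation(tags) =>
      tags.get("type") match {
        case Some("multipolygon") | Some("boundary") => Some("multipolygon")
        case Some("route")                           => Some("multilinestring")
        case _                                       => None // other relations are skipped
      }
  }
}
```

Note that the sketch classifies each element in isolation, so it says nothing about whether members of ways or relations are also emitted, which is the open question raised above.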

mojodna (Collaborator) commented:

(I want to revisit the relation assembly process (for non-multipolygons) so that it stops creating new geometries and instead just propagates relation tags onto the constituent lines / polygons, letting them fulfill the same role without duplication. It does complicate things a bit, resulting in more minor versions of ways, as changes to relations that they're part of will change their metadata (similar to how changes to nodes change their geometry).)
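To make the proposal concrete, here is a minimal sketch of what propagating relation tags onto member ways might look like. Everything in it is an assumption for illustration: the `Way` and `Relation` shapes, the `minorVersion` counter, the choice to drop the relation's `type` tag, and the rule that a way's own tags win on key conflicts. None of this is taken from the OSMesa or vectorpipe code.

```scala
object TagPropagationSketch {
  // Hypothetical shapes, for illustration only.
  final case class Way(id: Long, tags: Map[String, String], minorVersion: Int)
  final case class Relation(id: Long, tags: Map[String, String], memberWayIds: Set[Long])

  // Copies a relation's tags onto its member ways instead of building a new geometry.
  def propagate(rel: Relation, ways: Seq[Way]): Seq[Way] =
    ways.map { w =>
      if (rel.memberWayIds.contains(w.id)) {
        // Assumption: drop the structural "type" tag; the way's own tags win on conflicts.
        val merged = (rel.tags - "type") ++ w.tags
        // A metadata change produces a new minor version of the way.
        if (merged == w.tags) w
        else w.copy(tags = merged, minorVersion = w.minorVersion + 1)
      } else w
    }
}
```

The minor version bump in the sketch mirrors the cost mentioned above: editing a relation would touch every member way, much as moving a node changes the geometry of the ways that use it.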

mojodna commented Mar 14, 2019

Tutorial bits (in src/main/tut/) should be updated or removed.

mojodna commented Mar 14, 2019

data/ should also be cleaned up. (Though maybe these are for subsequent PRs.)

jpolchlo commented Mar 14, 2019

I wanted to leave the tut armature in place so that I can go and fill it in a bit later. Thanks for the reminder that it's there. Going to create an issue on it. (See #61)

jpolchlo and others added 26 commits March 22, 2019 10:40
Provides a path to developing within IntelliJ while specifying spark-sql
as a provided dependency (to avoid including Spark in assemblies where
they will conflict with the Spark runtime).
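
For readers unfamiliar with the pattern, a provided-scope Spark dependency in sbt generally looks like the fragment below. This is a sketch, not the build definition from this PR: the version number is an assumption, and how the PR makes provided dependencies available when running from IntelliJ is not shown in this conversation.

```scala
// build.sbt (sketch; the Spark version is an assumption, not taken from this PR)
// "Provided" keeps spark-sql on the compile classpath but out of the assembled jar,
// so the jar does not clash with the Spark runtime it is submitted to.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0" % Provided
```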

Also switches to the native ORC data source to facilitate dropping
spark-hive as a dependency.

The main change here is the removal of a pathological piece of geometry
from the tests. Previous iterations of this test may have "gotten
lucky" in passing at any point in the past; the logic in the relation
reconstruction code was not up to the challenge.
@jpolchlo jpolchlo force-pushed the refactor/process-osm branch from 8115f08 to 12e2da2 March 22, 2019 14:41
@jpolchlo jpolchlo merged commit cd4b6e7 into geotrellis:master Mar 22, 2019