Re-add Support for Accumulo and HBase #690

Open
jbouffard opened this issue Oct 18, 2018 · 1 comment

Comments

@jbouffard
Collaborator

Overview

Although the docs claim otherwise, GPS does not currently support Accumulo or HBase, because we removed those dependencies from the backend.

Background

Originally, the GPS backend depended on both geotrellis-accumulo and geotrellis-hbase in order to support their respective backends. However, at some point we removed those dependencies, as we thought they weren't actually needed to interact with the given backend. We now know that this is not the case, as anyone trying to access Accumulo or HBase will receive the following error message:

Py4JJavaError: An error occurred while calling None.geopyspark.geotrellis.io.AttributeStoreWrapper.
: java.lang.RuntimeException: Unable to find AttributeStoreProvider for accumulo://user:password@zoo-keeper:2181/instance
	at geotrellis.spark.io.AttributeStore$$anonfun$apply$3.apply(AttributeStore.scala:102)
	at geotrellis.spark.io.AttributeStore$$anonfun$apply$3.apply(AttributeStore.scala:102)
	at scala.Option.getOrElse(Option.scala:121)
	at geotrellis.spark.io.AttributeStore$.apply(AttributeStore.scala:102)
	at geotrellis.spark.io.AttributeStore$.apply(AttributeStore.scala:106)
	at geopyspark.geotrellis.io.AttributeStoreWrapper.<init>(AttributeStoreWrapper.scala:25)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
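
To illustrate why the lookup fails, here is a minimal sketch in Python. It is only an analogy for what the trace suggests (a provider is chosen from whatever is bundled in the jar, based on the URI), not the GeoTrellis implementation. The `MODULE_SCHEMES` table and `find_provider` function are hypothetical names made up for this example.

```python
from urllib.parse import urlparse

# Hypothetical table: which bundled module can serve which URI scheme.
# This is an assumption for illustration, not a list taken from GeoTrellis.
MODULE_SCHEMES = {
    "geotrellis-spark": {"file", "hdfs"},
    "geotrellis-s3": {"s3"},
    "geotrellis-accumulo": {"accumulo"},
    "geotrellis-hbase": {"hbase"},
}


def find_provider(uri, bundled_modules):
    """Return the first bundled module that can handle the URI's scheme."""
    scheme = urlparse(uri).scheme
    for module in bundled_modules:
        if scheme in MODULE_SCHEMES.get(module, set()):
            return module
    # No bundled module claims the scheme, which mirrors the error above.
    raise RuntimeError("Unable to find AttributeStoreProvider for " + uri)


uri = "accumulo://user:password@zoo-keeper:2181/instance"

# With the Accumulo module bundled, the lookup succeeds.
print(find_provider(uri, ["geotrellis-spark", "geotrellis-accumulo"]))

# Without it, the lookup fails even though the URI itself is valid.
try:
    find_provider(uri, ["geotrellis-spark"])
except RuntimeError as err:
    print(err)
```

The point of the sketch is that the URI is fine; the failure comes entirely from what is (or is not) packaged in the backend jar.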

Solutions

There are a few ways we can resolve this issue.

Solution 1: Re-Add the Dependencies

The most straightforward fix would be to re-add the geotrellis-accumulo and geotrellis-hbase dependencies to the backend. While this is the easiest and most reliable way to get support back, it also produces a needlessly large fat jar, which is cumbersome and bundles features many users don't need or want. This should not be our first choice.

Solution 2: Add Additional Jars that Contain These Dependencies

Another solution would be to create a set of jars for the user to pick from. These jars will have different levels of support: Base GPS, Base GPS + Accumulo, Base GPS + HBase, and Base GPS + Accumulo + HBase. These jars could be downloaded via the GPS CLI:

geopyspark install-base-jar // GPS with no Accumulo/HBase support
geopyspark install-jar // GPS + Accumulo/HBase support
geopyspark install-accumulo-jar // GPS + Accumulo support
geopyspark install-hbase-jar // GPS + HBase support

We can also add new make commands for building the jars:

make build-base // GPS with no Accumulo/HBase support
make build // GPS with Accumulo/HBase support
make build-base-with-accumulo // GPS with Accumulo support
make build-base-with-hbase // GPS with HBase support

The only issue with this solution would be maintaining the separate jars. However, it may be worth the cost, as users get to choose what they want in a straightforward way.
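
As a rough sketch of how the four support levels could be modeled, the snippet below picks the smallest jar variant that covers the backends a user asks for. The variant names (`base`, `accumulo`, `hbase`, `full`) and the `variant_for` helper are hypothetical and exist only for this example; they are not part of the current GPS CLI or build.

```python
# Hypothetical variant names mapped to the extra dependencies each jar bundles.
# The four levels mirror the ones proposed above.
VARIANTS = {
    "base": frozenset(),
    "accumulo": frozenset({"geotrellis-accumulo"}),
    "hbase": frozenset({"geotrellis-hbase"}),
    "full": frozenset({"geotrellis-accumulo", "geotrellis-hbase"}),
}


def variant_for(backends):
    """Return the smallest variant whose dependencies cover `backends`."""
    needed = frozenset("geotrellis-" + b.lower() for b in backends)
    candidates = [name for name, deps in VARIANTS.items() if needed <= deps]
    if not candidates:
        raise ValueError("No jar variant supports: " + ", ".join(sorted(needed)))
    # Prefer the variant that bundles the fewest extra dependencies.
    return min(candidates, key=lambda name: len(VARIANTS[name]))


print(variant_for([]))                     # no extra backends requested
print(variant_for(["Accumulo"]))           # Accumulo only
print(variant_for(["Accumulo", "HBase"]))  # both backends
```

Keeping the variants in a single table like this would also make the maintenance cost visible: every new backend doubles the number of combinations unless only some of them are published.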

Other Solutions

The two methods above are just a couple of the ways we can resolve this issue. We should take the time to discuss other possible solutions and their pros/cons here.

@javyxu

javyxu commented Apr 12, 2019

Hello. The current version of geopyspark is 0.4.3, and this error still occurs after running geopyspark install-jar. Is there a better solution?
