This module adds Kotlin language support to Apache Zeppelin. This guide describes its implementation and how it can be tested and improved.
For interactive Kotlin execution, an instance of KotlinRepl is created. To set REPL properties (such as the classpath, the output directory for generated classes, the maximum result count, etc.), pass KotlinReplProperties to its constructor. For example:
```java
KotlinReplProperties replProperties = new KotlinReplProperties()
    .maxResult(1000)
    .shortenTypes(true);
KotlinRepl repl = new KotlinRepl(replProperties);
```
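Once created, the REPL evaluates code snippets with its eval method (also used in the receiver example below); a minimal usage sketch:

```java
// Evaluate a snippet; eval returns an InterpreterResult with the snippet's output.
repl.eval("40 + 2");
```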
You can also bind variables and functions on REPL creation using Kotlin's implicit receiver language feature. This means that all code run in the REPL is executed inside Kotlin's with block with that receiver, making the receiver's fields and methods accessible. To add your own variables/functions, extend the KotlinReceiver class (in a separate file), declare your fields and methods, and pass an instance of it to KotlinReplProperties. Example:
```java
// In a separate file:
public class CustomReceiver extends KotlinReceiver {
    public int myValue = 1; // becomes Kotlin "var myValue: Int"
    public final String messageTemplate = "Value = %VALUE%"; // "val messageTemplate: String"

    public String getMessage() {
        return messageTemplate.replace("%VALUE%", String.valueOf(myValue));
    }
}

// On interpreter creation:
replProperties.receiver(new CustomReceiver());
KotlinRepl repl = new KotlinRepl(replProperties);
repl.eval("getMessage()"); // returns an InterpreterResult with the string "Value = 1"
```
In KotlinInterpreter, the REPL properties are created on construction, are accessible via the getKotlinReplProperties method, and are used to create the REPL in open().
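For illustration, a minimal sketch of this flow; it assumes the standard Zeppelin interpreter constructor taking java.util.Properties, and uses only the methods mentioned above:

```java
import java.util.Properties;

Properties zeppelinProps = new Properties(); // assumed standard constructor argument
KotlinInterpreter interpreter = new KotlinInterpreter(zeppelinProps);

// Adjust the REPL properties before open() builds the REPL from them.
interpreter.getKotlinReplProperties()
    .maxResult(100)
    .shortenTypes(true);

interpreter.open(); // the KotlinRepl instance is created here
```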
Each code snippet run in the REPL is registered as a separate class and saved in the location specified by the outputDir REPL property. Anonymous classes and lambdas are also saved there under specific names. This is needed for Spark to send classes to remote executors; in the Spark Kotlin interpreter this directory is the same as the one set by the sparkContext option spark.repl.class.outputDir.
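A minimal sketch of that wiring, assuming a JavaSparkContext named sc is in scope (outputDir is the REPL property mentioned above):

```java
// Point the REPL's class output at Spark's REPL class directory so that
// executors can fetch the generated classes.
String classOutputDir = sc.getConf().get("spark.repl.class.outputDir");
replProperties.outputDir(classOutputDir);
```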
The Kotlin interpreter in the Spark interpreter group takes SparkSession, JavaSparkContext, SQLContext, and ZeppelinContext from the SparkInterpreter in the same session and binds them in its scope.
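This binding can be pictured with the KotlinReceiver mechanism described earlier. The sketch below is illustrative only; the class and field names are assumptions, not necessarily the actual implementation:

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;
// import for ZeppelinContext omitted; its package varies across Zeppelin versions

// Hypothetical receiver: its fields become top-level names in the Kotlin REPL.
public class SparkReceiver extends KotlinReceiver {
    public final SparkSession spark;
    public final JavaSparkContext sc;
    public final SQLContext sqlContext;
    public final ZeppelinContext z;

    public SparkReceiver(SparkSession spark, JavaSparkContext sc,
                         SQLContext sqlContext, ZeppelinContext z) {
        this.spark = spark;
        this.sc = sc;
        this.sqlContext = sqlContext;
        this.z = z;
    }
}
```

With such a receiver passed to replProperties.receiver(...), paragraphs can refer to spark, sc, sqlContext, and z directly.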
Kotlin Interpreter and Spark Kotlin Interpreter come with unit tests. Run

```
mvn clean test
```

in $ZEPPELIN_HOME/kotlin for the base Kotlin Interpreter, and

```
mvn -Dtest=KotlinSparkInterpreterTest test
```

in $ZEPPELIN_HOME/spark/interpreter for the Spark Kotlin Interpreter.
To test manually, build Zeppelin with

```
mvn clean package -DskipTests
```

and create a note with the kotlin interpreter for the base version, or spark for Spark. In the Spark interpreter, add %spark.kotlin at the start of a paragraph to use the Kotlin Spark Interpreter.
Example:
```
%spark.kotlin
val df = spark.range(10)
df.show()
```
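As a further check, the bound ZeppelinContext can be exercised as well; this sketch assumes the names spark and z match the bindings described above:

```
%spark.kotlin
val df = spark.range(10)
z.show(df) // render the Dataset through ZeppelinContext
```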
- It would be great to bind ZeppelinContext to the base Kotlin interpreter, but for now I had trouble instantiating it inside KotlinInterpreter.
- When Kotlin has its own Spark API, it would be good to move to it. Currently, with the Java Spark API, Kotlin cannot use methods like forEach because of the ambiguity between Iterable<?>.forEach and Map<?, ?>.forEach (foreach from Spark's API does work, though).
- The scoped mode for the Kotlin Spark Interpreter currently has issues with different interpreters sharing the same class output directory, which leads to classes being overwritten. Adding prefixes to generated classes or putting them in separate directories leads to ClassNotFoundException on Spark executors.