Add Scala 2.13 builds for Spark/v3.2 #4009
Conversation
@fqaiser94, you can edit your personal access token and, under scopes, check the "workflow" box. Then you can update the GitHub workflow.
@wypoon thanks!
I am not a committer and have no privileges.
@fqaiser94 I'm not able to turn them on either, but if you …
Great to see someone working on this. Thank you, @fqaiser94!
.../scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala
Map$.MODULE$.<String, String>newBuilder();
for (Map.Entry<String, String> entry : partitionFilter.entrySet()) {
  builder.$plus$eq(Tuple2.apply(entry.getKey(), entry.getValue()));
}
Using the Scala API in Java is really ugly. I think that's why we prefer to use the Java API and convert unless it is performance critical. Is this change needed for Scala 2.13 or can we continue to convert?
I fully agree with you that "using the Scala API in Java is really ugly", but in this case, I think it is the best choice, for the reasons described below.
I thought it would be really simple; all we want to do is convert a Java Map into an immutable Scala map.
So, we start off by converting a Java Map into a mutable Scala map:
scala.collection.mutable.Map<String, String> myMutableMap = JavaConverters.mapAsScalaMapConverter(myJavaMap).asScala();
And now we can convert a mutable Scala map into an immutable Scala map using the handy-dandy .toMap method.
But wait, the .toMap method takes an implicit argument ev:
def toMap[K, V](implicit ev: A <:< (K, V)): immutable.Map[K, V]
This is basically a compiler trick to make sure we're calling .toMap on a collection of tuples (otherwise the method doesn't make sense).
Normally, in Scala, this evidence is implicitly supplied.
Since we’re calling this method from Java, we don’t have the normal implicit mechanisms available so we have to construct and supply this evidence by hand.
In the past we were able to supply this evidence by using scala.Predef.conforms(), but this API was removed entirely in 2.13 (it was already deprecated in 2.12).
In Scala 2.13, this implicit ev value of type <:< is provided by scala.Predef.refl(). However, scala.Predef.refl() is not available in Scala 2.12.
Therefore, in order to continue to use the .toMap conversion method, we would have to start maintaining 2.12- and 2.13-specific source directories. IMO, this isn't worth the complexity it introduces.
Hope that makes sense. Open to thoughts.
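For contrast, here is a minimal Scala sketch (with made-up data) of the evidence the compiler supplies for free in Scala but that a Java caller must construct by hand:

```scala
// A mutable collection of tuples, standing in for the converted Java map:
val pairs = scala.collection.mutable.ListBuffer(("a", "1"), ("b", "2"))

// Normally written simply as pairs.toMap; spelled out, the compiler is
// passing the <:< evidence as an explicit argument, which is exactly the
// value the Java caller has to conjure up manually:
val m: Map[String, String] = pairs.toMap(implicitly[((String, String)) <:< (String, String)])
```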
I've changed the code to use the forEach syntax (rather than for () { ... }) so at least it looks a little less gnarly now :)
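In plain-Java terms, the shape of that change looks like the sketch below (a self-contained stand-in using hypothetical data and a StringBuilder, since the real code feeds Scala's Map builder):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ForEachExample {
    public static void main(String[] args) {
        // Hypothetical stand-in for the real partitionFilter map:
        Map<String, String> partitionFilter = new LinkedHashMap<>();
        partitionFilter.put("dt", "2021-01-01");

        // Before: for (Map.Entry<String, String> entry : partitionFilter.entrySet()) { ... }
        // After: forEach hands each key/value pair straight to a lambda:
        StringBuilder builder = new StringBuilder();
        partitionFilter.forEach((key, value) -> builder.append(key).append('=').append(value));
        System.out.println(builder); // prints dt=2021-01-01
    }
}
```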
Okay, so the problem is that we can't convert directly to an immutable Scala map, and the Scala API's use of implicits makes converting mutable to immutable really difficult. That seems reasonable. I'm not sure what we were doing before... maybe we didn't need an immutable map?
This looks really close to being ready to commit to me. I'd like to separate the 2.13 tests out into a separate CI run so that we don't cause the 3.1 and 3.2 tests to take twice as long.
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java
Looks great to me! I'll wait for tests to pass and then merge.
Looks like some odd test failures. Somehow Scala can't compile when all Spark versions are enabled? Maybe this is a reuse problem from having multiple Scala versions in the same build? I wouldn't have a problem with adding a flag for Scala versions and running the build checks and Javadoc on just 2.12 for now, as long as we have full tests for 2.13 in a separate workflow.
Yes, I suspect this is exactly what it is. Incidentally, this approach is similar to what they do over in the apache/kafka repo.
Just a couple minor things, but otherwise this looks ready to go once tests are passing.
@rdblue all comments addressed, can you give the CI a bump once more please?
huzzah, all green
Thanks, @fqaiser94!
Resolves #3372
What changes were proposed in this pull request?
./gradlew -DscalaVersion=2.13 projects
:iceberg-spark:iceberg-spark-3.2_2.13
:iceberg-spark:iceberg-spark-extensions-3.2_2.13
:iceberg-spark:iceberg-spark-runtime-3.2_2.13
What approach was taken?
Iceberg spark libraries will be built for Scala 2.12 and 2.13 using the same source files.
This is simpler to maintain than having 2 sets of nearly identical source files.
The only downside is you need to ensure your code works across both versions, which is where the majority of the relatively simple changes in this PR come from.
Of particular note: a new dependency, scala-collection-compat, was introduced, which makes some Scala 2.13 APIs available on 2.12.
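As a hypothetical illustration of what scala-collection-compat enables (assuming the library is on the 2.12 classpath): it backports 2.13 names such as scala.jdk.CollectionConverters, so a single source file can use the 2.13-style import under both versions:

```scala
// On 2.13 this import is in the standard library; on 2.12 it is supplied
// by scala-collection-compat, so the same line compiles on both versions:
import scala.jdk.CollectionConverters._

val javaList = java.util.Arrays.asList("a", "b")
val scalaList: List[String] = javaList.asScala.toList
```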
What other approaches (if any) were considered?
I did look into existing Gradle plugins for cross-building Scala (gradle-crossbuild-scala, gradle-scala-multiversion-plugin), but neither is particularly popular, and I had issues setting them up.
Also, note that I'm pretty new to Gradle, so feel free to suggest other approaches if you know of better ones.
How was this patch tested?
Existing unit tests must pass for both the Scala 2.12 and 2.13 builds.