
Add Scala 2.13 builds for Spark/v3.2 #4009

Merged: 5 commits into apache:master on Feb 8, 2022

Conversation

fqaiser94 (Contributor) commented Jan 30, 2022:

Resolves #3372

What changes were proposed in this pull request?

  • Added a build flag for specifying scalaVersion: ./gradlew -DscalaVersion=2.13 projects
  • Added new projects for Scala 2.13:
    • :iceberg-spark:iceberg-spark-3.2_2.13
    • :iceberg-spark:iceberg-spark-extensions-3.2_2.13
    • :iceberg-spark:iceberg-spark-runtime-3.2_2.13

What approach was taken?

Iceberg Spark libraries will be built for Scala 2.12 and 2.13 from the same source files.
This is simpler to maintain than keeping two sets of nearly identical source files.
The only downside is that the code must compile and behave the same across both versions, which is where the majority of the relatively simple changes in this PR come from.
Of particular note: a new dependency, scala-collection-compat, was introduced; it makes some Scala 2.13 collection APIs available on 2.12.

What other approaches (if any) were considered?

I did look into existing Gradle plugins for cross-building Scala (gradle-crossbuild-scala, gradle-scala-multiversion-plugin), but neither is particularly popular, and I had issues setting them up.
Also, note that I'm pretty new to Gradle, so feel free to suggest other approaches if you know better.

How was this patch tested?

Existing unit tests must pass for both the Scala 2.12 and 2.13 builds.

fqaiser94 marked this pull request as ready for review on January 30, 2022, 16:29.
wypoon (Contributor) commented Jan 31, 2022:

@fqaiser94, you can edit your personal access token and, under scopes, check the "workflow" box. Then you can update the GitHub workflow.

fqaiser94 (Contributor, Author) commented:

@wypoon thanks!
Also, are you able to approve running workflows on this branch?
I'd like to see if everything passes on CI at least :)

wypoon (Contributor) commented Feb 1, 2022:

> @wypoon thanks! Also, are you able to approve running workflows on this branch? I'd like to see if everything passes on CI at least :)

I am not a committer and have no privileges.

kbendick (Contributor) commented Feb 2, 2022:

@fqaiser94 I'm not able to turn them on either, but if you run ./gradlew clean check -DflinkVersions=1.12,1.13,1.14 -DsparkVersions=2.4,3.0,3.1,3.2 -DhiveVersions=2,3 locally on Java 8, that's more or less every test. Some of those don't support Java 11, though.

rdblue (Contributor) commented Feb 3, 2022:

Great to see someone working on this. Thank you, @fqaiser94!

Review thread on spark/v3.2/build.gradle (outdated, resolved).
From a separate review thread on the diff (the builder declaration is completed here for readability, inferred from the usage below):

// Build an immutable Scala map entry by entry via the Scala builder API.
Builder<Tuple2<String, String>, scala.collection.immutable.Map<String, String>> builder =
    Map$.MODULE$.<String, String>newBuilder();
for (Map.Entry<String, String> entry : partitionFilter.entrySet()) {
  builder.$plus$eq(Tuple2.apply(entry.getKey(), entry.getValue()));
}
Reviewer (Contributor) commented:
Using the Scala API in Java is really ugly. I think that's why we prefer to use the Java API and convert unless it is performance critical. Is this change needed for Scala 2.13 or can we continue to convert?

fqaiser94 (Contributor, Author) commented Feb 4, 2022:

I fully agree with you that "using the Scala API in Java is really ugly", but in this case, I think it is the best choice, for the reasons described below.

I thought it would be really simple; all we want to do is convert a Java Map into an immutable Scala map.
So, we start off by converting the Java Map into a mutable Scala map:

scala.collection.mutable.Map<String, String> myMutableMap = JavaConverters.mapAsScalaMapConverter(myJavaMap).asScala();

And now we can convert the mutable Scala map into an immutable Scala map using the handy-dandy .toMap method.
But wait, the .toMap method takes an implicit argument ev:

def toMap[K, V](implicit ev: A <:< (K, V)): immutable.Map[K, V]

This is basically a compiler trick to make sure we're calling .toMap on a collection of tuples (otherwise the method doesn't make sense).
Normally, in Scala, this evidence is supplied implicitly.
Since we're calling this method from Java, we don't have the normal implicit mechanisms available, so we have to construct and supply this evidence by hand.
In the past, we were able to supply this evidence via scala.Predef.conforms(), but this API was removed entirely in 2.13 (it was already deprecated in 2.12).
In Scala 2.13, this implicit ev value of type <:< is provided by scala.Predef.refl().
However, scala.Predef.refl() is not available in Scala 2.12.
Therefore, in order to continue to use the .toMap conversion method, we would have to start maintaining 2.12- and 2.13-specific source directories. IMO, this isn't worth the complexity it introduces.
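For reference, here is a minimal self-contained sketch of the cross-version approach taken instead: build the immutable map directly through Scala's builder API, so no .toMap call (and no hand-constructed evidence) is needed. The class and helper names are hypothetical; the builder calls mirror the snippet above:

import java.util.Map;
import scala.Tuple2;
import scala.collection.immutable.Map$;
import scala.collection.mutable.Builder;

class ScalaMapConversions {
  // Hypothetical helper: converts a java.util.Map into an immutable Scala Map.
  // Building the immutable map up front sidesteps .toMap, whose implicit
  // evidence argument is supplied differently on Scala 2.12 and 2.13.
  static scala.collection.immutable.Map<String, String> toImmutableScalaMap(
      Map<String, String> javaMap) {
    Builder<Tuple2<String, String>, scala.collection.immutable.Map<String, String>> builder =
        Map$.MODULE$.<String, String>newBuilder();
    for (Map.Entry<String, String> entry : javaMap.entrySet()) {
      builder.$plus$eq(Tuple2.apply(entry.getKey(), entry.getValue())); // += on the builder
    }
    return builder.result();
  }
}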

Hope that makes sense. Open to thoughts.

fqaiser94 (Contributor, Author) commented Feb 4, 2022:

I've changed the code to use the forEach syntax (rather than for () { ... }) so at least it looks a little less gnarly now :)
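For illustration, the forEach form might look roughly like this (a sketch, not the exact diff from the PR):

// Same conversion as the for loop above, written with java.util.Map.forEach.
partitionFilter.forEach((key, value) -> builder.$plus$eq(Tuple2.apply(key, value)));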

Reviewer (Contributor) commented:

Okay, so the problem is that we can't convert directly to an immutable Scala map, and the Scala API's use of implicits makes converting mutable to immutable really difficult. That seems reasonable. I'm not sure what we were doing before... maybe we didn't need an immutable map?

rdblue (Contributor) commented Feb 3, 2022:

This looks really close to being ready to commit to me. I'd like to separate the 2.13 tests out into a separate CI run so that we don't cause 3.1 and 3.2 tests to take twice as long.

rdblue (Contributor) commented Feb 6, 2022:

Looks great to me! I'll wait for tests to pass and then merge.

rdblue (Contributor) commented Feb 6, 2022:

Looks like some odd test failures. Somehow Scala can't compile when all Spark versions are enabled? Maybe this is a reuse problem from having multiple Scala versions in the same build? I wouldn't have a problem with adding a flag for Scala versions and running the build checks and Javadoc on just 2.12 for now, as long as we have full tests for 2.13 in a separate workflow.

fqaiser94 (Contributor, Author) commented:

> Maybe this is a reuse problem from having multiple Scala versions in the same build?

Yes, I suspect this is exactly what it is.
In the latest commit, I’ve added a build flag to specify the scalaVersion.
By default, we will use Scala 2.12, but you can switch over to 2.13 as needed, e.g. ./gradlew -DscalaVersion=2.13 projects.

Incidentally, this approach is similar to what they do over in the apache/kafka repo.
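For intuition, a -D flag like this surfaces as a JVM system property that the build scripts can read. Below is a minimal sketch of the idea in plain Java; the property name matches the PR, but the surrounding logic is illustrative, not the actual Gradle code:

// Read -DscalaVersion, defaulting to 2.12 when the flag is not supplied.
String scalaVersion = System.getProperty("scalaVersion", "2.12");
// Hypothetical use: pick the artifact suffix for Scala 2.13 builds,
// e.g. iceberg-spark-3.2_2.13.
String suffix = scalaVersion.equals("2.13") ? "_2.13" : "";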

Further review threads on settings.gradle and spark/v3.2/build.gradle (resolved).
rdblue (Contributor) left a comment:

Just a couple minor things, but otherwise this looks ready to go once tests are passing.

fqaiser94 (Contributor, Author) commented Feb 8, 2022:

@rdblue all comments addressed, can you give the CI a bump once more please?

fqaiser94 (Contributor, Author) commented:
huzzah, all green

rdblue merged commit 4d78370 into apache:master on Feb 8, 2022.
rdblue (Contributor) commented Feb 8, 2022:

Thanks, @fqaiser94!

Linked issue (closed by this PR): #3372, "spark/v3.2 code does not build against Scala 2.13".