Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: Align the module name among spark2.4, spark3.0, spark3.1 and spark3.2 #4158

Merged
merged 2 commits into from
Mar 8, 2022

Conversation

openinx
Copy link
Member

@openinx openinx commented Feb 18, 2022

This PR is trying to address the comment from here.

The current spark2.4, spark3.0 have the following unaligned module name for me (spark3.1 & spark3.2 are more clear).

# Spark 2.4
iceberg-spark-runtime-0.13.1.jar
# Spark 3.0
iceberg-spark3-runtime-0.13.1.jar
# Spark 3.1
iceberg-spark-runtime-3.1_2.12-0.13.1.jar
# Spark 3.2
iceberg-spark-runtime-3.2_2.12-0.13.1.jar

After applied this PR, the current spark runtime jar are named as:

./gradlew clean build -x test -DsparkVersions=2.4,3.0,3.1,3.2 -DscalaVersion=2.12 -DknownScalaVersions=2.12 -x javadoc -Pquick -DflinkVersions= -DhiveVersions=

➜  iceberg git:(align-spark-module-name) find spark | grep runtime | grep SNAPSHOT.jar
spark/v3.2/spark-runtime/build/libs/iceberg-spark-runtime-3.2_2.12-0.13.0-SNAPSHOT.jar
spark/v2.4/spark-runtime/build/libs/iceberg-spark-runtime-2.4_2.12-0.13.0-SNAPSHOT.jar
spark/v3.0/spark-runtime/build/libs/iceberg-spark-runtime-3.0_2.12-0.13.0-SNAPSHOT.jar
spark/v3.1/spark-runtime/build/libs/iceberg-spark-runtime-3.1_2.12-0.13.0-SNAPSHOT.jar

@openinx openinx force-pushed the align-spark-module-name branch from cb5c27e to 6199163 Compare February 18, 2022 04:39
@openinx openinx added this to the Iceberg 0.14.0 Release milestone Feb 18, 2022
Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea.

The older spark-runtime and spark3-runtime are confusing for end users. It’s less confusing with 3.2 and 3.2, but the naming of the 3.0 and 2.4 jars make it confusing for people still.

This is a breaking change, but I think that’s a good thing. I’ve talked to a handful of users who have upgraded to Spark 3.2 but are still using the spark3-runtime. If their build failed because it couldn’t find the artifact at the next release (say 0.14.0), then they’d have a better experience vs silently allowing a combination that we don’t test and expect to work.

I believe we kept spark-runtime and spark3-runtime for compatibility reasons, but I somewhat think this was a mistake as users will upgrade Spark versions but not necessarily move to the correct runtime jar because some end users continue to use spark3-runtime.

@pan3793
Copy link
Member

pan3793 commented Feb 18, 2022

For Spark 2.4, I think it shoud be scala free?

@kbendick
Copy link
Contributor

kbendick commented Feb 18, 2022

For Spark 2.4, I think it shoud be scala free?

Can you elaborate? All of the sparkl-core 2.4 jars in Maven have a both _2.11 and _2.12 version

EDIT: I realize you likely mean that we don't compile Spark 2's iceberg-runtime artifact with Scala. In fact, all of the references to Scala I see in the main build file actually uses a 2.11 dependency for Spark 2's module.

Example (as well as the compileOnly dependency of spark-hive_2.11):

force 'com.fasterxml.jackson.module:jackson-module-scala_2.11:2.11.4'

@pan3793
Copy link
Member

pan3793 commented Feb 19, 2022

@kbendick the 2.4 runtime module does not use Scala jars as runtime dependencies, and itself is written in pure Java code.

./gradlew -DdefaultSparkVersions=2.4 :iceberg-spark:iceberg-spark-runtime:dependencies --configuration runtimeClasspath
> Task :iceberg-spark:iceberg-spark-runtime:dependencies

------------------------------------------------------------
Project ':iceberg-spark:iceberg-spark-runtime'
------------------------------------------------------------

runtimeClasspath - Runtime classpath of source set 'main'.
+--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
+--- project :iceberg-spark:iceberg-spark2
|    +--- project :iceberg-api
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    \--- project :iceberg-bundled-guava
|    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-common
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    \--- project :iceberg-bundled-guava
|    +--- project :iceberg-core
|    |    +--- project :iceberg-api (*)
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    +--- project :iceberg-common (*)
|    |    +--- project :iceberg-bundled-guava
|    |    +--- org.apache.avro:avro -> 1.10.1
|    |    |    +--- com.fasterxml.jackson.core:jackson-core:2.11.3 -> 2.11.4
|    |    |    \--- com.fasterxml.jackson.core:jackson-databind:2.11.3 -> 2.11.4
|    |    |         +--- com.fasterxml.jackson.core:jackson-annotations:2.11.4
|    |    |         \--- com.fasterxml.jackson.core:jackson-core:2.11.4
|    |    +--- com.fasterxml.jackson.core:jackson-databind -> 2.11.4 (*)
|    |    +--- com.fasterxml.jackson.core:jackson-core -> 2.11.4
|    |    +--- com.github.ben-manes.caffeine:caffeine -> 2.8.4
|    |    |    +--- org.checkerframework:checker-qual:3.4.0
|    |    |    \--- com.google.errorprone:error_prone_annotations:2.3.4
|    |    \--- org.roaringbitmap:RoaringBitmap -> 0.9.22
|    |         \--- org.roaringbitmap:shims:0.9.22
|    +--- project :iceberg-data
|    |    +--- project :iceberg-api (*)
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-core (*)
|    |    +--- org.apache.orc:orc-core -> 1.7.3
|    |    |    +--- org.apache.orc:orc-shims:1.7.3
|    |    |    +--- io.airlift:aircompressor:0.21
|    |    |    +--- org.jetbrains:annotations:17.0.0
|    |    |    \--- org.threeten:threeten-extra:1.5.0
|    |    \--- org.apache.parquet:parquet-avro -> 1.12.2
|    |         +--- org.apache.parquet:parquet-column:1.12.2
|    |         |    +--- org.apache.parquet:parquet-common:1.12.2
|    |         |    |    +--- org.apache.parquet:parquet-format-structures:1.12.2
|    |         |    |    \--- org.apache.yetus:audience-annotations:0.12.0
|    |         |    \--- org.apache.parquet:parquet-encoding:1.12.2
|    |         |         \--- org.apache.parquet:parquet-common:1.12.2 (*)
|    |         +--- org.apache.parquet:parquet-hadoop:1.12.2
|    |         |    +--- org.apache.parquet:parquet-column:1.12.2 (*)
|    |         |    +--- org.apache.parquet:parquet-format-structures:1.12.2
|    |         |    +--- org.apache.parquet:parquet-jackson:1.12.2
|    |         |    \--- com.github.luben:zstd-jni:1.4.9-1
|    |         \--- org.apache.parquet:parquet-format-structures:1.12.2
|    +--- project :iceberg-orc
|    |    +--- project :iceberg-api (*)
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-core (*)
|    |    +--- org.apache.avro:avro -> 1.10.1 (*)
|    |    \--- org.apache.orc:orc-core -> 1.7.3 (*)
|    +--- project :iceberg-parquet
|    |    +--- project :iceberg-api (*)
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-core (*)
|    |    +--- project :iceberg-common (*)
|    |    \--- org.apache.parquet:parquet-avro -> 1.12.2 (*)
|    +--- project :iceberg-arrow
|    |    +--- project :iceberg-api (*)
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-core (*)
|    |    +--- project :iceberg-parquet (*)
|    |    +--- org.apache.arrow:arrow-vector -> 7.0.0
|    |    |    +--- org.apache.arrow:arrow-format:7.0.0
|    |    |    |    \--- com.google.flatbuffers:flatbuffers-java:1.12.0
|    |    |    +--- org.apache.arrow:arrow-memory-core:7.0.0
|    |    |    +--- com.fasterxml.jackson.core:jackson-core:2.11.4
|    |    |    +--- com.fasterxml.jackson.core:jackson-annotations:2.11.4
|    |    |    +--- com.fasterxml.jackson.core:jackson-databind:2.11.4 (*)
|    |    |    \--- com.google.flatbuffers:flatbuffers-java:1.12.0
|    |    +--- org.apache.arrow:arrow-memory-netty -> 7.0.0
|    |    |    \--- org.apache.arrow:arrow-memory-core:7.0.0
|    |    +--- org.apache.parquet:parquet-avro -> 1.12.2 (*)
|    |    \--- io.netty:netty-buffer -> 4.1.68.Final
|    |         \--- io.netty:netty-common:4.1.68.Final
|    +--- com.github.ben-manes.caffeine:caffeine -> 2.8.4 (*)
|    +--- org.apache.orc:orc-core -> 1.7.3 (*)
|    \--- org.apache.arrow:arrow-vector -> 7.0.0 (*)
+--- project :iceberg-aws
|    +--- project :iceberg-api (*)
|    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-common (*)
|    \--- project :iceberg-core (*)
+--- project :iceberg-aliyun
|    +--- project :iceberg-api (*)
|    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-core (*)
|    \--- project :iceberg-common (*)
+--- project :iceberg-hive-metastore
|    +--- project :iceberg-api (*)
|    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-core (*)
|    +--- project :iceberg-common (*)
|    \--- com.github.ben-manes.caffeine:caffeine -> 2.8.4 (*)
\--- project :iceberg-nessie
     +--- project :iceberg-api (*)
     +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
     +--- project :iceberg-common (*)
     +--- project :iceberg-core (*)
     +--- project :iceberg-bundled-guava
     \--- org.projectnessie:nessie-client -> 0.20.0
          \--- org.projectnessie:nessie-model:0.20.0

(*) - dependencies omitted (listed previously)

@rdblue
Copy link
Contributor

rdblue commented Feb 20, 2022

@openinx can you raise this as a discussion thread on the mailing list? Originally, we wanted to avoid changing the modules that users depend on and opted not to rename. If we want to revisit that decision I'm fine with it, but we should make sure there's broader consensus.

@openinx
Copy link
Member Author

openinx commented Feb 21, 2022

@openinx can you raise this as a discussion thread on the mailing list? Originally, we wanted to avoid changing the modules that users depend on and opted not to rename. If we want to revisit that decision I'm fine with it, but we should make sure there's broader consensus.

Yes, I've just posted a discussion to iceberg-dev mail list. https://lists.apache.org/thread/cwhqktpw0f89gg7hx6fol4dgfkb1t6ny

@openinx
Copy link
Member Author

openinx commented Feb 21, 2022

@pan3793 I think you are right, we don't need to attach the scala suffix for spark 2.4. Thanks.

@github-actions github-actions bot added the INFRA label Feb 21, 2022
Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, thanks for raising the discussion.

@@ -26,11 +26,11 @@ def scalaVersion = System.getProperty("scalaVersion") != null ? System.getProper
def jmhProjects = []

if (jdkVersion == '8' && sparkVersions.contains("2.4")) {
jmhProjects.add(project(":iceberg-spark:iceberg-spark2"))
jmhProjects.add(project(":iceberg-spark:iceberg-spark-2.4"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add the scala version here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we've discussed in this conversation #4158 (comment), spark 2.4 should be scala-free, which means it won't add any scala code or pull in any transitive scala dependency for downstream iceberg-spark-2.4 users. So in theory we should not add the scala suffix version to the iceberg-spark-2.4 artifacts because the same artifact should be work for both spark with scala 2.12 and spark with scala 2.11.

@openinx
Copy link
Member Author

openinx commented Mar 8, 2022

Since we had an apache iceberg PMC (@jackye1995 ) approved this PR, I plan to get this merged now. Thanks all for reviewing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants