Build: Align the module name among spark2.4, spark3.0, spark3.1 and spark3.2 #4158
Conversation
The branch was force-pushed from 6854d85 to cb5c27e, and again from cb5c27e to 6199163.
I think this is a good idea.
The older spark-runtime and spark3-runtime names are confusing for end users. The 3.1 and 3.2 runtimes are less confusing, but the naming of the 3.0 and 2.4 jars still makes things confusing for people.
This is a breaking change, but I think that's a good thing. I've talked to a handful of users who have upgraded to Spark 3.2 but are still using the spark3-runtime. If their build failed at the next release (say 0.14.0) because it couldn't find the artifact, they'd have a better experience than being silently allowed to use a combination that we don't test and don't expect to work.
I believe we kept spark-runtime and spark3-runtime for compatibility reasons, but I somewhat think that was a mistake: users upgrade their Spark version but don't necessarily move to the correct runtime jar, and some end users continue to use spark3-runtime.
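For illustration, the mismatch described above would look roughly like this in a downstream Gradle build (the coordinates below follow the naming discussed in this thread and use an arbitrary version; they are illustrative, not taken from the PR):

```groovy
dependencies {
    // The application has moved to Spark 3.2...
    implementation "org.apache.spark:spark-sql_2.12:3.2.1"

    // ...but still pulls the old, unversioned runtime jar built for Spark 3.0/3.1,
    // a combination the project does not test.
    implementation "org.apache.iceberg:iceberg-spark3-runtime:0.13.1"

    // With version-aligned names the intent is explicit, and a stale name would
    // simply fail to resolve once the old artifact stops being published:
    // implementation "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1"
}
```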
For Spark 2.4, I think it should be Scala-free?
Can you elaborate? EDIT: I realize you likely mean that we don't compile Spark 2's iceberg-runtime artifact with Scala. In fact, all of the references to Scala I see in the main build file actually use a 2.11 dependency for Spark 2's module (for example, iceberg/spark/v2.4/build.gradle, line 33 in 9cc2ac8).
@kbendick the 2.4 runtime module does not use Scala jars as runtime dependencies, and it is itself written in pure Java code.
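As a quick sanity check, a minimal Gradle sketch along these lines could assert that claim at build time (the project path, task name, and matching rule here are assumptions for illustration, not part of this PR):

```groovy
// Hypothetical check: fail if the Spark 2.4 module resolves any Scala library
// or Scala-suffixed artifact on its runtime classpath.
project(':iceberg-spark:iceberg-spark-2.4') {
    tasks.register('verifyScalaFree') {
        doLast {
            def leaked = configurations.runtimeClasspath
                    .resolvedConfiguration.resolvedArtifacts
                    .findAll { a ->
                        a.moduleVersion.id.group == 'org.scala-lang' || a.name =~ /_2\.1[12]$/
                    }
            if (!leaked.isEmpty()) {
                throw new GradleException("Scala leaked into the runtime classpath: ${leaked*.name}")
            }
        }
    }
}
```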
@openinx can you raise this as a discussion thread on the mailing list? Originally, we wanted to avoid changing the modules that users depend on and opted not to rename. If we want to revisit that decision I'm fine with it, but we should make sure there's broader consensus.
Yes, I've just posted a discussion to the iceberg-dev mailing list: https://lists.apache.org/thread/cwhqktpw0f89gg7hx6fol4dgfkb1t6ny
@pan3793 I think you are right, we don't need to attach the scala suffix for spark 2.4. Thanks.
looks good to me, thanks for raising the discussion.
```diff
@@ -26,11 +26,11 @@ def scalaVersion = System.getProperty("scalaVersion") != null ? System.getProper
 def jmhProjects = []

 if (jdkVersion == '8' && sparkVersions.contains("2.4")) {
-  jmhProjects.add(project(":iceberg-spark:iceberg-spark2"))
+  jmhProjects.add(project(":iceberg-spark:iceberg-spark-2.4"))
```
Should we also add the scala version here?
As we've discussed in this conversation (#4158 (comment)), Spark 2.4 should be Scala-free, which means it won't add any Scala code or pull in any transitive Scala dependency for downstream iceberg-spark-2.4 users. So in theory we should not add the Scala version suffix to the iceberg-spark-2.4 artifacts, because the same artifact should work both for Spark built with Scala 2.12 and Spark built with Scala 2.11.
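To make that concrete, the convention discussed here means a downstream dependency declaration would look roughly like this (artifact names and version are inferred from this thread, not quoted from the PR):

```groovy
dependencies {
    // Spark 3.x runtimes contain Scala code, so they carry the Scala binary
    // version they were built against:
    implementation "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1"

    // The Spark 2.4 runtime is pure Java, so it has no Scala suffix; the same
    // jar is meant to work whether Spark 2.4 was built with Scala 2.11 or 2.12:
    // implementation "org.apache.iceberg:iceberg-spark-runtime-2.4:0.13.1"
}
```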
Since an Apache Iceberg PMC member (@jackye1995) has approved this PR, I plan to get this merged now. Thanks all for reviewing.
This PR is trying to address the comment from here.
The current spark2.4 and spark3.0 modules have unaligned module names compared to spark3.1 & spark3.2, which are clearer.
After applying this PR, the spark runtime jars are named as follows:
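A sketch of the aligned naming, inferred from the discussion above rather than quoted from the PR description:

- iceberg-spark-runtime-2.4 (pure Java, no Scala suffix)
- iceberg-spark-runtime-3.0_2.12
- iceberg-spark-runtime-3.1_2.12
- iceberg-spark-runtime-3.2_2.12

Each runtime jar carries its Spark version, and a Scala suffix only where the module actually contains Scala code.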