
SPARK-4159 [CORE] Maven build doesn't run JUnit test suites #3651

Closed · wants to merge 3 commits

Conversation

@srowen (Member) commented Dec 9, 2014

This PR:

  • Reenables surefire, and copies config from scalatest (which is itself an old fork of surefire, so similar)
  • Tells surefire to test only Java tests
  • Enables surefire and scalatest for all children, and in turn eliminates some duplication.

For me, this seems to cause the Scala and Java tests each to run once, as desired. It doesn't affect the SBT build, but it works for Maven. I still need to verify that all of the Scala and Java tests are being run.
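For illustration only, a minimal sketch of the kind of root-POM surefire configuration described above; the include patterns are assumptions, not the exact patch:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Restrict surefire to Java tests; scalatest-maven-plugin runs the Scala suites -->
    <includes>
      <include>**/Test*.java</include>
      <include>**/*Test.java</include>
      <include>**/*Suite.java</include>
    </includes>
  </configuration>
</plugin>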

@SparkQA commented Dec 9, 2014

Test build #24269 has started for PR 3651 at commit f165f58.

  • This patch does not merge cleanly.

</configuration>
</plugin>
<!-- Scalatest runs all Scala tests -->
Contributor commented:

N00b question: Is it not possible to have Surefire run the Scala tests as well? Or are you just saving that change for a future PR?

Member Author commented:

It probably is; I had not tried it just yet. I do think this is a precursor to more change, potentially using surefire for everything.
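As an illustrative aside, one way surefire could pick up Scala suites is ScalaTest's JUnit runner; a minimal sketch with a hypothetical suite name, not something this PR does:

import org.junit.runner.RunWith
import org.scalatest.FunSuite
import org.scalatest.junit.JUnitRunner

// With this annotation the suite is visible to JUnit, and therefore to surefire,
// provided the class name matches surefire's include patterns.
@RunWith(classOf[JUnitRunner])
class ExampleSuite extends FunSuite {
  test("addition") {
    assert(1 + 1 === 2)
  }
}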

@SparkQA commented Dec 10, 2014

Test build #24269 has finished for PR 3651 at commit f165f58.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins commented:

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24269/

@SparkQA commented Dec 10, 2014

Test build #24299 has started for PR 3651 at commit 125b0b6.

  • This patch merges cleanly.

@SparkQA commented Dec 10, 2014

Test build #24299 has finished for PR 3651 at commit 125b0b6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins commented:

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24299/

@SparkQA
Copy link

SparkQA commented Dec 10, 2014

Test build #24303 has started for PR 3651 at commit 11bd041.

  • This patch merges cleanly.

@srowen changed the title from "SPARK-4159 [CORE] [WIP] Maven build doesn't run JUnit test suites" to "SPARK-4159 [CORE] Maven build doesn't run JUnit test suites" on Dec 10, 2014
@srowen (Member Author) commented Dec 10, 2014

I'm pretty convinced this works now. I'm diffing the test-run output between master and this branch, and the Scala tests are the same. The only visible differences are that scalatest now turns up in every module and, of course, there is output from surefire.

Note that I did not enable assertions in the SBT build now, as I mentioned in a related conversation; there's a separate issue with that tracked in http://issues.apache.org/jira/browse/SPARK-4814

I also think this is a predecessor to https://issues.apache.org/jira/browse/SPARK-3431

Let's see what Jenkins says. I'm calling this no longer a WIP.

@SparkQA commented Dec 10, 2014

Test build #24303 has finished for PR 3651 at commit 11bd041.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins commented:

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24303/

@srowen (Member Author) commented Dec 10, 2014

Jenkins, retest this.

<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<environmentVariables>
<SPARK_HOME>${basedir}/../..</SPARK_HOME>
Contributor commented:

This is ok, but a little verbose. What I've seen other projects do is define a "pathToRoot" variable that is referenced in the common configuration in the root pom. Then modules that rely on this env variable being set can override "pathToRoot" to have the right relative path.
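A hedged sketch of that pattern (the property name comes from the comment above; the paths are illustrative, not from this PR):

<!-- In the root pom: a shared property plus one common surefire configuration -->
<properties>
  <pathToRoot>${basedir}</pathToRoot>
</properties>
...
<environmentVariables>
  <SPARK_HOME>${pathToRoot}</SPARK_HOME>
</environmentVariables>

<!-- In a module two levels deep: override only the relative path -->
<properties>
  <pathToRoot>${basedir}/../..</pathToRoot>
</properties>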

Member Author commented:

OK, in the name of keeping it simple, I might not touch this for now. Since this occurs in only two places, it doesn't save much.

@vanzin (Contributor) commented Dec 10, 2014

So, one problem I can think of is that the test log4j.properties has this:

log4j.appender.file.append=false

Meaning that the second test plugin will overwrite the log file generated by the first one. (In this case, I think scalatest will always run last, so you'll never see the log file for the surefire run.)

Two ways I see to fix it:

  • set append to "true", and clean up that file before running the tests
  • parameterize the log file name so that each test run writes to a different file

Also, bonus point if you place the test log4j.properties in a shared location and nuke all the duplicate ones in each module. :-) (Although that might be trying to squeeze too much cleanup into this patch.)
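To make the second option above concrete, a minimal sketch (the test.appender.suffix name is hypothetical, and this assumes log4j 1.x's system-property substitution in config values):

# log4j.properties: keep append=false, but point each runner at its own file
log4j.appender.file.File=target/unit-tests-${test.appender.suffix}.log

<!-- pom.xml, surefire configuration: tag the Java test run -->
<systemPropertyVariables>
  <test.appender.suffix>java</test.appender.suffix>
</systemPropertyVariables>

The scalatest plugin would set a different suffix via its own system properties, so the two runs never write to the same file.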

@srowen (Member Author) commented Dec 11, 2014

Good point about log4j.appender.file.append=false; it does look like the Scala tests overwrite the log. Why not set append to true, indeed? The file is in target/, so it gets deleted by clean.

@SparkQA commented Dec 11, 2014

Test build #24375 has started for PR 3651 at commit 48cb3f0.

  • This patch merges cleanly.

@SparkQA commented Dec 11, 2014

Test build #24375 has finished for PR 3651 at commit 48cb3f0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins commented:

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24375/

@vanzin (Contributor) commented Dec 12, 2014

The only issue with using append = true is that multiple test invocations will just keep appending to the file, potentially making debugging a little more confusing. Not a big deal, I think. Maybe adding a task in the root pom that runs before surefire/scalatest and just deletes that file?

@srowen (Member Author) commented Dec 12, 2014

Jenkins, retest this please.

@SparkQA commented Dec 12, 2014

Test build #24408 has started for PR 3651 at commit 48cb3f0.

  • This patch merges cleanly.

@SparkQA commented Dec 12, 2014

Test build #24408 has finished for PR 3651 at commit 48cb3f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins commented:

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24408/

<artifactId>scalatest-maven-plugin</artifactId>
<executions>
<execution>
<id>test</id>
Contributor commented:

It looks like this particular <execution> configuration is specific to the java8-tests submodule; do we need to preserve this logic?

Member Author commented:

It serves to disable scalatest, but scalatest will not run *.java tests anyway. I just gave it a spin: surefire runs Java8APISuite.java, while scalatest does nothing except look for tests and find none.

@JoshRosen (Contributor) commented:

Overall, this PR looks pretty good. To quickly summarize the changes (mostly for my own benefit):

  • Remove the scalatest and scalatest-maven-plugin dependencies from the individual subproject POMs and add a scalatest dependency in the root POM (since every subproject depends on it).
  • Change every log4j.properties file to enable file appending instead of overwriting.
  • Use Surefire to run the Java tests.

Review comments:

  • We should see whether we need to preserve the special configuration of the ScalaTest Maven plugin in the java8-tests subproject.
  • I like @vanzin's suggestion of adding a task that clears out the test logs between runs.
  • I also like the idea of using a common log4j.properties file across all of the tests, but I'm fine with deferring that to a separate PR.

@srowen (Member Author) commented Dec 24, 2014

@JoshRosen I found that removing the SPARK_HOME config doesn't seem to matter; the REPL and YARN tests still pass. OK to remove that config in this PR, do you think?

@JoshRosen (Contributor) commented:

I did a quick git grep through the codebase to find uses of SPARK_HOME, and it looks like there are only a few places where it's read:

SparkContext, which is a fallback if spark.home is not set:

core/src/main/scala/org/apache/spark/SparkContext.scala-   * Get Spark's home location from either a value set through the constructor,
core/src/main/scala/org/apache/spark/SparkContext.scala-   * or the spark.home Java property, or the SPARK_HOME environment variable
core/src/main/scala/org/apache/spark/SparkContext.scala-   * (in that order of preference). If neither of these is set, return None.
core/src/main/scala/org/apache/spark/SparkContext.scala-   */
core/src/main/scala/org/apache/spark/SparkContext.scala-  private[spark] def getSparkHome(): Option[String] = {
core/src/main/scala/org/apache/spark/SparkContext.scala:    conf.getOption("spark.home").orElse(Option(System.getenv("SPARK_HOME")))
core/src/main/scala/org/apache/spark/SparkContext.scala-  }
core/src/main/scala/org/apache/spark/SparkContext.scala-
core/src/main/scala/org/apache/spark/SparkContext.scala-  /**
core/src/main/scala/org/apache/spark/SparkContext.scala-   * Set the thread-local property for overriding the call sites
core/src/main/scala/org/apache/spark/SparkContext.scala-   * of actions and RDDs.

PythonUtils, with no fallback:

core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-private[spark] object PythonUtils {
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-  /** Get the PYTHONPATH for PySpark, either from SPARK_HOME, if it is set, or from our JAR */
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-  def sparkPythonPath: String = {
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-    val pythonPath = new ArrayBuffer[String]
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala:    for (sparkHome <- sys.env.get("SPARK_HOME")) {
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-      pythonPath += Seq(sparkHome, "python").mkString(File.separator)
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-      pythonPath += Seq(sparkHome, "python", "lib", "py4j-0.8.2.1-src.zip").mkString(File.separator)
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-    }
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-    pythonPath ++= SparkContext.jarOfObject(this)
core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala-    pythonPath.mkString(File.pathSeparator)

FaultToleranceTest, which isn't actually run in our tests (since it needs a bunch of manual Docker setup to work):

core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  val zk =  SparkCuratorUtil.newClient(conf)
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  var numPassed = 0
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  var numFailed = 0
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala:  val sparkHome = System.getenv("SPARK_HOME")
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  assertTrue(sparkHome != null, "Run with a valid SPARK_HOME")
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  val containerSparkHome = "/opt/spark"
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-  val dockerMountDir = "%s:%s".format(sparkHome, containerSparkHome)
core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala-

SparkSubmitArguments, which uses this without a fallback:

core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-   */
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-  private def mergeSparkProperties(): Unit = {
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-    // Use common defaults file, if not specified by user
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-    if (propertiesFile == null) {
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-      val sep = File.separator
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala:      val sparkHomeConfig = env.get("SPARK_HOME").map(sparkHome => s"${sparkHome}${sep}conf")
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-      val confDir = env.get("SPARK_CONF_DIR").orElse(sparkHomeConfig)
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-      confDir.foreach { sparkConfDir =>
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-        val defaultPath = s"${sparkConfDir}${sep}spark-defaults.conf"
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala-        val file = new File(defaultPath)

Worker.scala, where this is overridden by spark.test.home:

core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala-  val sparkHome =
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala-    if (testing) {
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala-      assert(sys.props.contains("spark.test.home"), "spark.test.home is not set!")
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala-      new File(sys.props("spark.test.home"))
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala-    } else {
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala:      new File(sys.env.get("SPARK_HOME").getOrElse("."))
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala-    }
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala-  var workDir: File = null
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala-  val executors = new HashMap[String, ExecutorRunner]
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala-  val finishedExecutors = new HashMap[String, ExecutorRunner]
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala-  val drivers = new HashMap[String, DriverRunner]

There are also a few PySpark tests that use it, but the ./bin/pyspark script takes care of setting up the proper SPARK_HOME variable.

So, it looks like there are a couple of usages that are potentially hazardous (PythonUtils and SparkSubmitArguments), but they're not in the YARN or REPL packages.

@srowen (Member Author) commented Dec 24, 2014

@JoshRosen Yes, sounds about right to me. I rebased and pushed one more commit to remove special SPARK_HOME setting in these modules too.

@SparkQA commented Dec 24, 2014

Test build #24788 has started for PR 3651 at commit 2e8a0af.

  • This patch merges cleanly.

@SparkQA commented Dec 24, 2014

Test build #24788 has finished for PR 3651 at commit 2e8a0af.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins commented:

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24788/

@JoshRosen (Contributor) commented:

Quoting @vanzin: "The only issue with using append = true is that multiple test invocations will just keep appending to the file, potentially making debugging a little more confusing. Not a big deal, I think. Maybe adding a task in the root pom that runs before surefire/scalatest and just deletes that file?"

I suppose we still have this issue, right? It might not be a big deal if people run mvn clean between builds, but I could imagine it being annoying if you're doing incremental re-builds during an iterative debugging session. Is this hard to fix? I don't think it's the end of the world if we don't get to this now, though.

Also, we should probably backport this change into the maintenance branches so that we can detect whether their Java tests are broken. We should verify that SPARK_HOME is obsolete in yarn/repl for those versions, too.

@srowen (Member Author) commented Dec 28, 2014

@JoshRosen I think it's a small pain to add to the POM: you can bind an extra execution of the Maven clean plugin to test or test-compile. I don't think it's hard, just more stuff in the POM. I had punted on it, but if anyone votes for adding it, I will do so.
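For reference, a hedged sketch of what that extra execution might look like; the phase choice and log-file name are assumptions based on the discussion, not the actual patch:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <executions>
    <execution>
      <id>clean-test-log</id>
      <!-- test-compile runs before the test phase used by surefire and scalatest -->
      <phase>test-compile</phase>
      <goals>
        <goal>clean</goal>
      </goals>
      <configuration>
        <!-- Delete only the shared log, not the whole target directory -->
        <excludeDefaultDirectories>true</excludeDefaultDirectories>
        <filesets>
          <fileset>
            <directory>${project.build.directory}</directory>
            <includes>
              <include>unit-tests.log</include>
            </includes>
          </fileset>
        </filesets>
      </configuration>
    </execution>
  </executions>
</plugin>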

@srowen (Member Author) commented Jan 6, 2015

@JoshRosen et al., I'd love to get this in so that the Java tests run again. Say the word and I'll add the extra clean configuration if that's desired; I think it's also OK as-is.

@JoshRosen (Contributor) commented:

Meh, let's just merge this in for now. The appending behavior won't be confusing in Jenkins (since we clean), and I can only imagine it being annoying for developers who do incremental development in Maven on a local machine (probably a small minority of Spark developers). Having the tests actually run is a higher priority than a minor developer-only usability issue affecting a handful of users; we can address the append behavior in a separate follow-up PR if anyone complains.

@JoshRosen (Contributor) commented:

Merged to master (1.3.0). I'll handle the backport cherry-picks in a little bit.

@asfgit asfgit closed this in 4cba6eb Jan 6, 2015
@srowen srowen deleted the SPARK-4159 branch January 7, 2015 22:39