SD-232 Recycle classloaders to be anti-hostile to JIT #2754
Conversation
The compiler interface subclasses `scala.tools.nsc.Global`, and loading this new subclass before each `compile` task forces the HotSpot JIT to deoptimize large swathes of compiled code. It's a bit like SBT has rigged the dice to always descend the longest ladder in a game of Snakes and Ladders.

The slowdown seems to be larger with Scala 2.12. There are a number of variables at play, but I think the main factor here is that we now rely on the JIT to devirtualize calls to final methods in traits, whereas we used to emit static calls. The JIT does a good job at this, so long as classloading doesn't undo that good work.

This commit extends the existing `ClassLoaderCache` to encompass the classloader that includes the compiler interface JAR. I've resorted to adding a var to `AnalyzingCompiler` to inject the dependency, to get the cache to the spot I need it without binary-incompatible changes to the intervening method signatures.
Note that the 2.12 numbers are based on a branch that includes scala/scala#5417, a performance regression in Scala 2.12's code generation.
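The cache itself isn't shown in this thread, but the idea can be sketched roughly as follows (a minimal sketch with hypothetical names, not sbt's actual `ClassLoaderCache`): key the cache on the classpath entries and their timestamps, and hold the constructed loader softly so that repeated `compile` invocations reuse the same loader, and with it the code the JIT has already compiled for the classes it defines.

```scala
import java.io.File
import java.lang.ref.SoftReference
import java.net.URLClassLoader

// Minimal sketch of a classloader cache along the lines described above
// (hypothetical names; the real sbt implementation differs). A changed JAR
// (e.g. a different compiler-interface) yields a fresh loader, while repeated
// `compile` tasks against the same JARs get the previously created loader back.
final class InterfaceLoaderCache(parent: ClassLoader) {
  private type Key = Seq[(File, Long)]
  private val cache = new java.util.HashMap[Key, SoftReference[ClassLoader]]

  def cachedLoader(classpath: Seq[File]): ClassLoader = synchronized {
    val key: Key = classpath.map(f => f -> f.lastModified())
    val cached = Option(cache.get(key)).flatMap(ref => Option(ref.get))
    cached.getOrElse {
      val loader = new URLClassLoader(classpath.map(_.toURI.toURL).toArray, parent)
      cache.put(key, new SoftReference[ClassLoader](loader))
      loader
    }
  }
}
```

The actual change also leaves an escape hatch, the `sbt.disable.interface.classloader.cache` system property mentioned later in the thread.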
I'm not sure what game of Snakes and Ladders you played, because you're meant to ascend with the ladders and descend with the snakes.
I'm a little bit on the fence here about whether we should do the necessary song-and-dance to burrow `State`'s `classLoaderCache` through to `AnalyzingCompiler`.
So I'll LGTM, and see what @eed3si9n says.
We need some notes here, both to highlight how much better it performs and to document `sbt.disable.interface.classloader.cache`.
Must have been Co-Snakes-and-Ladders.
I just tried this out on SBT itself. Steady state performance of
@retronym I removed the var and setter in dwijnand@b6a4789. Have a look and see if you want to cherry-pick it.
Interesting, do you mean that it's deoptimizing code unrelated to
@smarter Take a look at the diagnostic flags and output in scala/scala-dev#232 for the trail of clues. In general, loading classes can deoptimize speculative optimizations based on Class Hierarchy Analysis. Similarly, loading and using a new

Here's the JDK code that prints out the deoptimization messages: http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/runtime/deoptimization.cpp
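To make that mechanism a little more concrete, here is a minimal, self-contained sketch (not from the PR; all names are made up) of the kind of speculation Class Hierarchy Analysis enables and how classloading undoes it. Running something like this with `-XX:+PrintCompilation` shows the "made not entrant" lines when the compiled code is invalidated.

```scala
// Minimal sketch of CHA-based speculation being undone (made-up names, not sbt code).
// While Impl is the only loaded subclass of Base, the call in `hot` is effectively
// monomorphic and HotSpot can devirtualize and inline it; loading another subclass
// later (for example through a fresh classloader, which is what happened with the
// compiler interface) invalidates that assumption and deoptimizes the call sites.
abstract class Base { def work(i: Int): Int }
final class Impl extends Base { def work(i: Int): Int = i + 1 }

object ChaDemo {
  def hot(b: Base, n: Int): Int = {
    var sum = 0
    var i = 0
    while (i < n) { sum += b.work(i); i += 1 }
    sum
  }

  def main(args: Array[String]): Unit = {
    val b = new Impl
    // Warm-up: after enough iterations `hot` is JIT-compiled with `work` inlined.
    println(hot(b, 10000000))
    // Defining and instantiating a second Base subclass here (for instance one
    // loaded via a new URLClassLoader) would invalidate the single-implementer
    // assumption and force HotSpot to deoptimize the compiled version of `hot`.
    println(hot(b, 10000000))
  }
}
```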
A theory for why the improvement in peak performance in SBT's own build was only modest is that when multiple sub-projects are compiled in parallel, the single copy of
Overall the change looks good, but I'm a bit concerned about how this `SoftReference` behaves against several other uses of `SoftReference`, like the Analysis cache in a many-subproject build. If we're putting this into 0.13.13, then the RC should be bumped back.
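The concern can be illustrated with a small sketch (hypothetical, not sbt code): every `SoftReference`-backed cache answers to the same heap-pressure heuristics, so the classloader cache, the Analysis cache, and anything else held softly can all be cleared together in a large build and then have to be rebuilt.

```scala
import java.lang.ref.SoftReference

// Hypothetical illustration of the concern above: under heap pressure the JVM
// may clear soft references, so anything cached that way (a classloader, an
// Analysis) can silently vanish and must be rebuilt. Run with a small -Xmx to
// see it happen quickly.
object SoftCacheDemo {
  def main(args: Array[String]): Unit = {
    val cached = new SoftReference(new Array[Byte](32 * 1024 * 1024)) // stand-in for a cached value
    val hoard = scala.collection.mutable.ArrayBuffer.empty[Array[Byte]]
    try {
      // Allocate until the soft reference is cleared (or we run out of memory;
      // the JVM clears all soft references before throwing OutOfMemoryError).
      while (cached.get != null) hoard += new Array[Byte](8 * 1024 * 1024)
    } catch { case _: OutOfMemoryError => hoard.clear() }
    println(s"cached value still present: ${cached.get != null}")
  }
}
```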
@eed3si9n I've run a test under this patch in which I constrained the heap and metaspace, and then compiled a project with a sequence of different Scala versions within a single SBT session. https://gist.github.com/retronym/d621926d6f9ea9d3c3c992ac44fa87a7
@retronym Thanks for the additional testing.
I've just verified this again on the scala/scala 2.12.x branch now that we've upgraded to SBT 0.13.13. Below are the times to perform a clean compile of our sources as the JVM warms up, with the classloader caching in this PR disabled and then enabled.
I've repeated the experiment on our 2.11.x branch (with one local change to bump from SBT 0.13.12 to 0.13.13). Similar numbers:
That's good news; I wasn't sure whether the gain would be the same using 2.11 or 2.12 as the compiler.
Nice
@eed3si9n Do you know if/how this problem relates to Maven/Gradle/IntelliJ using zinc without SBT?
Given that this setting is passed in via Defaults.scala, I would assume that none of the other build tools would be aware of this change unless we publish typesafehub/zinc and let them know about it specifically.
@eed3si9n Okay, tracking that task as typesafehub/zinc#105.
https://github.com/pantsbuild/pants will happily pick this up assuming it is available in https://github.com/sbt/zinc, which we consume directly. Thanks!
Some interesting results from one of the Scala projects at x.ai: the code base has 209 files and 70k LoC (excluding blanks and comments). There are 174 traits, 193 classes (93 of them implicit classes), 514 case classes, and 1029 objects. There is a small abuse of Shapeless that produces coproducts for most of our domain objects; when fully compiled it adds a significant slowdown. The same can be said of our centralized serialization source.

On sbt 0.13.12: first try on a fresh sbt session, after
Same SBT session, after another

On SBT 0.13.13: first try on a fresh sbt session, after
Same SBT session, after another
Thanks for sharing those numbers, @pedrofurla!
As SBT itself does: sbt/sbt#2754
As SBT itself does: sbt/sbt#2754 Fixes typesafehub#105
Related PR in dotty's compiler bridge: scala/scala3#2262
@adriaanm I don't think there's anything that needs to be done in the scalac bridge. The dotty bridge needs to create its own classloader (because the sbt phases are part of the compiler and not part of the bridge, and the classloader constructed by sbt doesn't support that), but the scalac bridge just uses the classloader that should already be cached by sbt thanks to this PR.
For the record, see https://github.com/lampepfl/dotty/blob/9e45ad16d012e6a2ff3be411c2fe101b1c74b831/sbt-bridge/src/xsbt/CompilerClassLoader.scala#L45-L72 for the gory details.
Ok, thanks for explaining!
So far, Mill was caching ScalaInstance, which contains a classloader, but this is not enough: Zinc creates its own classloader by combining the ScalaInstance classloader with the path to the compiler-bridge. Zinc takes care of caching this classloader in each instance of ScalaCompiler. We can take advantage of this by caching an instance of Compilers, since it contains everything we want to cache (ScalaCompiler and ScalaInstance).

This significantly reduces the amount of classloading that happens at each compilation step, as measured by running:

export _JAVA_OPTIONS="-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+TraceClassUnloading"
mill -i foo.compile

and then looking at the output of `out/mill-worker-1/logs` while adding a new line in a source file in `foo` and running `mill -i foo.compile` again.

The speedup is going to depend on the project, but I measured a ~2x improvement when running `time(core.compile())` and `rm -rf out/core/compile/` in a loop on the Mill build until results stabilized. See also the results that were obtained when a very similar issue was fixed in sbt itself: sbt/sbt#2754
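As a rough sketch of the caching pattern this commit describes (hypothetical types, not Mill's or Zinc's actual API), the idea is to key a long-lived cache on everything that determines the compiler setup, so the setup, and with it the combined ScalaInstance-plus-bridge classloader inside it, is reused across compilations:

```scala
import java.io.File

// Hypothetical sketch of the pattern: build the compiler setup once per
// (Scala version, bridge jar, Scala jars) key and hand back the same instance
// on subsequent compilations, so its internal classloaders are reused as well.
final case class SetupKey(scalaVersion: String, compilerBridge: File, scalaJars: Seq[File])

final class SetupCache[Setup](mkSetup: SetupKey => Setup) {
  private val cache = scala.collection.mutable.Map.empty[SetupKey, Setup]

  def getOrCreate(key: SetupKey): Setup = synchronized {
    cache.getOrElseUpdate(key, mkSetup(key))
  }
}
```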