Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SD-232 Recycle classloaders to be anti-hostile to JIT #2754

Merged
merged 2 commits into from
Sep 26, 2016

Conversation

retronym
Copy link
Member

@retronym retronym commented Sep 26, 2016

The compiler interface subclasses scala.tools.nsc.Global,
and loading this new subclass before each compile task forces
HotSpot JIT to deoptimize larges swathes of compiled code. It's
a bit like SBT has rigged the dice to always descend the longest
ladder in a game of Snakes and Ladders.

The slowdown seems to be larger with Scala 2.12. There are a number
of variables at play, but I think the main factor here is that
we now rely on JIT to devirtualize calls to final methods in traits
whereas we used to emit static calls. JIT does a good job at this,
so long as classloading doesn't undo that good work.

This commit extends the existing ClassLoaderCache to encompass
the classloader that includes the compiler interface JAR. I've
resorted to adding a var to AnalyzingCompiler to inject the
dependency to get the cache to the spot I need it without binary
incompatible changes to the intervening method signatures.

Here are a few numbers showing the speedup of the peak (warmed up) performance:

Codebase Scala Version Direct SBT 0.13.12 This PR
src/compiler 2.12.0-SNAPSHOT 19s 32s (1.68x) 24s (1.26x)
scalap 2.11.8 1.19 s 1.72s (1.45x) 1.39s (1.17x)
scalap 2.12.0-SNAPSHOT 1.02 s 1.95s (1.91x) 1.20s (1.17x)

The compiler interface subclasses `scala.tools.nsc.Global`,
and loading this new subclass before each `compile` task forces
HotSpot JIT to deoptimize larges swathes of compiled code. It's
a bit like SBT has rigged the dice to always descend the longest
ladder in a game of Snakes and Ladders.

The slowdown seems to be larger with Scala 2.12. There are a number
of variables at play, but I think the main factor here is that
we now rely on JIT to devirtualize calls to final methods in traits
whereas we used to emit static calls. JIT does a good job at this,
so long as classloading doesn't undo that good work.

This commit extends the existing `ClassLoaderCache` to encompass
the classloader that includes the compiler interface JAR. I've
resorted to adding a var to `AnalyzingCompiler` to inject the
dependency to get the cache to the spot I need it without binary
incompatible changes to the intervening method signatures.
@retronym retronym changed the title SD-232 Recycle classloaders to be anti-hostile to JIT. SD-232 Recycle classloaders to be anti-hostile to JIT Sep 26, 2016
@retronym
Copy link
Member Author

Note that the 2.12 numbers are based on a branch that includes scala/scala#5417, a performance regression in Scala 2.12s code generation.

Copy link
Member

@dwijnand dwijnand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what game of Snakes and Ladders you played, because you're meant to ascend with the ladders and descend with the snakes.

I'm a little bit on the fence here about whether we should do the necessary song-and-dance to burrow State's classLoaderCache to AnalzyingCompiler.

So I'll LGTM, and see what @eed3si9n says.

We need some notes here, both to flash how much better it performs and to document sbt.disable.interface.classloader.cache.

@retronym
Copy link
Member Author

Must have been Co-Snakes-and-ladders.

@retronym
Copy link
Member Author

I just tried this out on SBT itself. Steady state performance of compile (measured by running ;clean;update;, then compile repeatedly in the shell) improved a little 34s -> 31s.

@dwijnand
Copy link
Member

@retronym I removed the var and setter in dwijnand@b6a4789.

Have a look and see if you want to cherry-pick it.

@smarter
Copy link
Contributor

smarter commented Sep 26, 2016

The compiler interface subclasses scala.tools.nsc.Global,
and loading this new subclass before each compile task forces
HotSpot JIT to deoptimize larges swathes of compiled code

Interesting, do you mean that it's deoptimizing code unrelated to scala.tools.nsc.Global? Do you know why?
Also note to self: I should probably imitate that caching in https://github.com/lampepfl/dotty/blob/master/bridge/src/main/scala/xsbt/CompilerClassLoader.scala until I can port this to sbt proper.

@retronym
Copy link
Member Author

retronym commented Sep 26, 2016

@smarter Take a look at the diagnostic flags and output in scala/scala-dev#232 for the trail of clues.

In general, loading classes can deoptimize speculative optimizations based on Call Hierarachy Analysis. Similarly, loading and using a new Global subclass can invalidate devirtualized calls sites (a new reciever class appears which invalidates the assumption that this call site was effectively mono- or bi-morphic). I think that checkcast or instanceof checks could also also be affected.

Here's the JDK code that prints out the deoptimization messages: http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/runtime/deoptimization.cpp

@retronym
Copy link
Member Author

A theory for why the improvement in peak performance in the SBT's own build was only modest is that when multiple sub projects compiled in parallel the single copy of Global (loaded in a shared classloader) immediately was seen to have numerous active subclasses, so call sites didn't "overfit" to an assumption that would be soon invalidated. One could test this hypothesis by disabling parallel compilation of subprojects.

Copy link
Member

@eed3si9n eed3si9n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the change looks good, but I'm a bit concerned about how SoftReference behaves against several other uses of SoftReference, like in Analysis cache with many-subproject build.
If we're putting this in to 0.13.13, then RC should be bumped back.

@retronym
Copy link
Member Author

@eed3si9n I've run a test under this patch in which I constrained the heap and metaspace, and then compiled a project with a sequence of different Scala versions within a single SBT session.

https://gist.github.com/retronym/d621926d6f9ea9d3c3c992ac44fa87a7

@eed3si9n
Copy link
Member

@retronym Thanks for the additional testing.

@retronym
Copy link
Member Author

retronym commented Nov 8, 2016

I've just verified this again on scala/scala 2.12.x branch now that we've upgraded to SBT 0.13.13.

Below are the times to perform a clean compile of our sources as the JVM warms up, with the classloader caching in this PR disabled then enabled.

⚡ for b in true false; do echo "disabled=$b"; (((for i in {1..5}; do printf ';clean;update\ncompile\n'; done) | sbt -Dsbt.disable.interface.classloader.cache=$b) 2>&1) | egrep 'Total time: \d\d' ; done
disabled=true
[success] Total time: 131 s, completed 08/11/2016 11:47:50 AM
[success] Total time: 107 s, completed 08/11/2016 11:49:39 AM
[success] Total time: 105 s, completed 08/11/2016 11:51:25 AM
[success] Total time: 105 s, completed 08/11/2016 11:53:11 AM
[success] Total time: 105 s, completed 08/11/2016 11:54:57 AM
disabled=false
[success] Total time: 118 s, completed 08/11/2016 11:57:04 AM
[success] Total time: 88 s, completed 08/11/2016 11:58:32 AM
[success] Total time: 85 s, completed 08/11/2016 11:59:59 AM
[success] Total time: 86 s, completed 08/11/2016 12:01:26 PM
[success] Total time: 83 s, completed 08/11/2016 12:02:50 PM

@retronym
Copy link
Member Author

retronym commented Nov 8, 2016

I've repeated the experiment on our 2.11.x branch (with one local change to bump from SBT 0.13.12 to 0.13.12). Similar numbers:

⚡ for b in true false; do echo "disabled=$b"; (((for i in {1..4}; do printf ';clean;update\ncompile\n'; done) | sbt -Dsbt.disable.interface.classloader.cache=$b) 2>&1) | egrep 'Total time: \d\d' ; done
disabled=true
[success] Total time: 138 s, completed 08/11/2016 12:08:32 PM
[success] Total time: 106 s, completed 08/11/2016 12:10:20 PM
[success] Total time: 102 s, completed 08/11/2016 12:12:04 PM
[success] Total time: 102 s, completed 08/11/2016 12:13:48 PM
disabled=false
[success] Total time: 118 s, completed 08/11/2016 12:15:55 PM
[success] Total time: 86 s, completed 08/11/2016 12:17:23 PM
[success] Total time: 84 s, completed 08/11/2016 12:18:48 PM
[success] Total time: 82 s, completed 08/11/2016 12:20:12 PM

That's good news, I wasn't sure whether the gain would be the same using 2.11 or 2.12 as a compiler.

@eed3si9n
Copy link
Member

eed3si9n commented Nov 8, 2016

Nice

@retronym
Copy link
Member Author

retronym commented Nov 8, 2016

@eed3si9n Do you know if/how this problem relates to Maven/Gradle/IntelliJ using zinc without SBT?

@eed3si9n
Copy link
Member

eed3si9n commented Nov 8, 2016

Given that this setting is passed in via Defaults.scala, I would assume that none of the other build tools would be aware of this change unless we publish typesafehub/zinc and let them them know specifically for it.

@retronym
Copy link
Member Author

retronym commented Nov 8, 2016

@eed3si9n Okay, tracking that task as typesafehub/zinc#105

@stuhood
Copy link

stuhood commented Nov 8, 2016

https://github.com/pantsbuild/pants will happily pick this up assuming it is available in https://github.com/sbt/zinc, which we consume directly.

Thanks!

@pedrofurla
Copy link

Some interesting results from my one of the scala project at x.ai:

The code base has 209 files, 70k LoC (excluding blanks and comments).

There are 174 traits, 193 classes (93 are implicit classes), 514 case classes, 1029 objects.

There is a small abuse of Shapeless, that produces coproducts for most of our domain objects, when fully compiled it adds a signficant slowdown. The same can be said of our centralized serialization source.

On sbt 0.13.12

First try on a fresh sbt session, after ;clean; test:clean; update

sbt> test:compile
...
[success] Total time: 255 s, completed Nov 10, 2016 4:22:17 PM

Same SBT session, after another ;clean; test:clean; update

sbt> test:compile
...
[success] Total time: 245 s, completed Nov 10, 2016 7:50:02 PM

On SBT 0.13.13

First try on a fresh sbt session, after ;clean; test:clean; update

sbt> test:compile
...
[success] Total time: 228 s, completed Nov 10, 2016 4:01:28 PM

Same SBT session, after another ;clean; test:clean; update

sbt> test:compile
...
[success] Total time: 185 s, completed Nov 10, 2016 4:04:39 PM

@retronym
Copy link
Member Author

Thanks for sharing those numbers, @pedrofurla!

@adriaanm
Copy link
Contributor

Related PR in dotty's compiler bridge: scala/scala3#2262
Is this something to adopt in the Scala 2.x bridges as well?

@smarter
Copy link
Contributor

smarter commented Apr 14, 2017

@adriaanm I don't think there's anything that needs to be done in the scalac bridge, the dotty bridge needs to create its own classloader (because the sbt phases are part of the compiler and not part of the bridge, and the classloader constructed by sbt doesn't support that), but the scalac bridge just uses the classloader that should already be cached by sbt thanks to this PR.

@smarter
Copy link
Contributor

smarter commented Apr 14, 2017

(because the sbt phases are part of the compiler and not part of the bridge, and the classloader constructed by sbt doesn't support that)

For the record, see https://github.com/lampepfl/dotty/blob/9e45ad16d012e6a2ff3be411c2fe101b1c74b831/sbt-bridge/src/xsbt/CompilerClassLoader.scala#L45-L72 for the gory details.

@adriaanm
Copy link
Contributor

Ok, thanks for explaining!

smarter added a commit to smarter/mill that referenced this pull request Jul 27, 2018
So far, Mill was caching ScalaInstance which contains a classloader but
this is not enough: Zinc creates its own classloader by combining the
ScalaInstance classloader with the path to the compiler-bridge. Zinc
takes care of caching this classloader in each instance of
ScalaCompiler.

We can take advantage of this by caching an instance of Compilers since
it contains everything we want to cache (ScalaCompiler and
ScalaInstance).

This significantly reduces the amount of classloading that happens
at each compilation step, as measured by running:

export _JAVA_OPTIONS="-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+TraceClassUnloading"
mill -i foo.compile

Then looking at the output of `out/mill-worker-1/logs` while adding a
new line in a source file in `foo` and running `mill -i foo.compile` again.

The speedup is going to depend on the project, but I measured a ~2x
improvement when running on the Mill build `time(core.compile())` and
`rm -rf out/core/compile/` in a loop until results stabilized. See also
the results that were obtained when a very similar issue was fixed in
sbt itself: sbt/sbt#2754
lihaoyi pushed a commit to com-lihaoyi/mill that referenced this pull request Jul 27, 2018
So far, Mill was caching ScalaInstance which contains a classloader but
this is not enough: Zinc creates its own classloader by combining the
ScalaInstance classloader with the path to the compiler-bridge. Zinc
takes care of caching this classloader in each instance of
ScalaCompiler.

We can take advantage of this by caching an instance of Compilers since
it contains everything we want to cache (ScalaCompiler and
ScalaInstance).

This significantly reduces the amount of classloading that happens
at each compilation step, as measured by running:

export _JAVA_OPTIONS="-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+TraceClassUnloading"
mill -i foo.compile

Then looking at the output of `out/mill-worker-1/logs` while adding a
new line in a source file in `foo` and running `mill -i foo.compile` again.

The speedup is going to depend on the project, but I measured a ~2x
improvement when running on the Mill build `time(core.compile())` and
`rm -rf out/core/compile/` in a loop until results stabilized. See also
the results that were obtained when a very similar issue was fixed in
sbt itself: sbt/sbt#2754
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants