NOTE: This blog post no longer contains the most up-to-date documentation on the optimizer. Please visit the optimizer page on the Scala documentation site.
Enable method-local optimizations with -opt:l:method. This option is safe for binary compatibility, but typically doesn’t improve performance on its own.
Enable inlining in addition to method-local optimizations with -opt:l:inline and -opt-inline-from:[PATTERN].
When publishing a library, use -opt-inline-from:my.package.** to only inline from packages within your library.
When compiling an application with global inlining (-opt-inline-from:**), ensure that the run-time classpath is exactly the same as the compile-time classpath.
The @inline annotation only has an effect if the inliner is enabled. It tells the inliner to always try to inline the annotated method or callsite.
Without the @inline annotation, the inliner generally inlines higher-order methods and forwarder methods. The main goal is to eliminate megamorphic callsites due to functions passed as arguments, and to eliminate value boxing. Other optimizations are delegated to the JVM.
To learn more, read on.
The Scala compiler has included an inliner since version 2.0. Closure elimination and dead code elimination were added in 2.1. That was the first Scala optimizer, written and maintained by Iulian Dragos. He continued to improve these features over time and consolidated them under the -optimise flag (later Americanized to -optimize), which remained available through Scala 2.11.
The optimizer was re-written for Scala 2.12 to become more reliable and powerful, and to side-step the spelling issue by calling the new flag -opt. This post describes how to use the optimizer in Scala 2.12 and 2.13: what it does, how it works, and what its limitations are.
Why does the Scala compiler even have a JVM bytecode optimizer? After all, the JVM is a highly optimized runtime with a just-in-time (JIT) compiler that has seen 19 years of tuning. The reason is that there are certain well-known code patterns that the JVM fails to optimize properly. These patterns are common in functional languages such as Scala. (Increasingly, Java code with lambdas is catching up and showing the same performance issues at run-time.)
The two most important such patterns are “megamorphic dispatch” (also called “the inlining problem”) and value boxing. If you’d like to learn more about these problems in the context of Scala, you could watch the part of my Scala Days 2015 talk (starting at 26:13).
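To make the term “megamorphic dispatch” concrete, here is a minimal sketch. The names Megamorphic, foreachInt and demo are hypothetical and chosen only for illustration. The call f(...) inside the loop is a single callsite in bytecode, but it sees three different function classes at run-time, which is the kind of callsite a JIT compiler typically struggles to inline.

```scala
object Megamorphic {
  // A small higher-order method. The invocation of f below is one callsite
  // in bytecode, shared by every caller of foreachInt.
  def foreachInt(xs: Array[Int])(f: Int => Unit): Unit = {
    var i = 0
    while (i < xs.length) { f(xs(i)); i += 1 }
  }

  // Three different function literals flow into the same callsite, so the
  // receiver of f.apply has three different run-time classes.
  def demo(xs: Array[Int]): (Int, Int, Int) = {
    var sum = 0
    var max = Int.MinValue
    var count = 0
    foreachInt(xs)(x => sum += x)
    foreachInt(xs)(x => if (x > max) max = x)
    foreachInt(xs)(_ => count += 1)
    (sum, max, count)
  }
}
```

If foreachInt is inlined into demo at compile-time, each copy of the loop gets its own callsite with exactly one function class, which is much easier for the JVM to optimize.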
The goal of the Scala optimizer is to produce bytecode that the JVM can execute fast. It is also a goal to avoid performing any optimizations that the JVM can already do well.
This means that the Scala optimizer may become obsolete in the future, if the JIT compiler is improved to handle these patterns better. In fact, with the arrival of GraalVM, that future might be nearer than you think! We take a closer look at Graal in a follow-up post. But for now, we dive into some details about the Scala optimizer.
The Scala optimizer has to make its improvements within fairly narrow constraints:
objects, and methods where the receiver’s type is precisely known (for example, in (new A).f, the receiver is known to be exactly A, not a subtype of A).
However, even when staying within these constraints, some changes performed by the optimizer can be observed at run-time:
Inlined methods disappear from call stacks.
This can lead to unexpected behaviors when using a debugger.
Inlining a method can delay class loading of the class where the method is defined.
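The optimizer assumes that modules (singletons like object O) are never null.
This assumption can be violated when a module is accessed while its superclass constructor is still running. The following example throws a NullPointerException when compiled normally, but prints 0 when compiled with the optimizer enabled: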
class A {
  println(Test.f)
}
object Test extends A {
  @inline def f = 0
  def main(args: Array[String]): Unit = ()
}
This assumption can be disabled with -opt:-assume-modules-non-null, which results in additional null checks in optimized code.
The optimizer removes unnecessary loads of certain built-in modules, for example scala.Predef and scala.runtime.ScalaRunTime. This means that initialization (construction) of these modules can be skipped or delayed.
For example, in def f = 1 -> "", the method Predef.-> is inlined and the access to Predef is eliminated. The resulting code is def f = new Tuple2(1, "").
This can be disabled with -opt:-allow-skip-core-module-init.
The optimizer eliminates unused C.getClass calls, which may delay class loading. This can be disabled with -opt:-allow-skip-class-loading.
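The first point in the list above, inlined methods disappearing from call stacks, can be observed with a small experiment. This is only a sketch: StackDemo, where and caller are hypothetical names, and whether the call is actually inlined depends on the optimizer flags and the inliner's decisions.

```scala
object StackDemo {
  // Reports the name of the method that appears on top of the call stack
  // at the point where the Throwable is created.
  @inline def where(): String =
    new Throwable().getStackTrace.head.getMethodName

  def caller(): String = where()

  def main(args: Array[String]): Unit =
    println(caller())
}
```

Compiled without the optimizer, the top stack frame belongs to where. If the inliner copies the body of where into caller, the Throwable is created inside caller, so the reported method name would be expected to change accordingly.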
Scala minor releases are binary compatible with each other, for example 2.12.6 and 2.12.7. The same is true for many libraries in the Scala ecosystem. These binary compatibility promises are the main reason for the Scala optimizer not to be enabled everywhere.
The reason is that inlining a method from one class into another changes the (binary) interface that is accessed:
class C {
  private[this] var x = 0
  @inline final def inc(): Int = { x += 1; x }
}
When inlining a callsite c.inc(), the resulting code no longer calls inc, but instead accesses the field x directly. Since that field is private (also in bytecode), inlining inc is only allowed within the class C itself. Trying to access x from any other class would cause an IllegalAccessError at run-time.
However, there are many cases where implementation details in Scala source code become public in bytecode:
class C {
  private def x = 0
  @inline final def m: Int = x
}
object C {
  def t(c: C) = c.x
}
Scala allows accessing the private method x in the companion object C. In bytecode, however, the classfile for the companion C$ is not allowed to access a private method of C. For that reason, the Scala compiler “mangles” the name of x to C$$x and makes the method public.
This means that m can be inlined into classes other than C, since the resulting code invokes C.C$$x instead of C.m. Unfortunately this breaks Scala’s binary compatibility promise: the fact that the public method m calls a private method x is considered to be an implementation detail that can change in a minor release of the library defining C.
Even more trivially, assume that method m was buggy and is changed to def m = if (fullMoon) 1 else x in a minor release. Normally, it would be enough for a user to put the new version on the classpath. However, if the old version of c.m was inlined at compile-time, having the new version of C on the run-time classpath would not fix the bug.
In order to safely use the Scala optimizer, users need to make sure that the compile-time and run-time classpaths are identical. This has a far-reaching consequence for library developers: libraries that are published to be consumed by other projects should not inline code from the classpath. The inliner can be configured to inline code from the library itself using -opt-inline-from:my.package.**.
The reason for this restriction is that dependency management tools like sbt will often pick newer versions of transitive dependencies. For example, if library A depends on core-1.1.1, B depends on core-1.1.2 and the application depends on both A and B, the build tool will put core-1.1.2 on the classpath. If code from core-1.1.1 was inlined into A at compile-time, it might break at run-time due to a binary incompatibility.
The compiler flag for enabling the optimizer is -opt. Running scalac -opt:help shows how to use the flag.
By default (without any compiler flags, or with -opt:l:default), the Scala compiler eliminates unreachable code, but does not run any other optimizations.
-opt:l:method enables all method-local optimizations, for example:
Elimination of isInstanceOf checks whose result is known at compile-time
Elimination of boxes such as java.lang.Integer or scala.runtime.DoubleRef that are created within a method and don’t escape it
Individual optimizations can be disabled. For example, -opt:l:method,-nullness-tracking disables nullness optimizations.
Method-local optimizations alone typically don’t have any positive effect on performance, because source code usually doesn’t have unnecessary boxing or null checks. However, local optimizations can often be applied after inlining, so it’s really the combination of inlining and local optimizations that can improve program performance.
-opt:l:inline enables inlining in addition to method-local optimizations. However, to avoid unexpected binary compatibility issues, we also need to tell the compiler which code it is allowed to inline. This is done with the -opt-inline-from compiler flag. Examples:
-opt-inline-from:my.library.** enables inlining from any class defined in package my.library, or in any of its sub-packages. Inlining within a library is safe for binary compatibility, so the resulting binary can be published. It will still work correctly even if one of its dependencies is updated to a newer minor version in the run-time classpath.
-opt-inline-from:<sources> enables inlining from the set of source files being compiled in the current compiler invocation. This option can also be used for compiling libraries. If the source files of a library are split up across multiple sbt projects, inlining is only done within each project. Note that in an incremental compilation, inlining would only happen within the sources being re-compiled – but in any case, it is recommended to only enable the optimizer in CI and release builds (and to run clean before building).
-opt-inline-from:** allows inlining from every class, including the JDK. This option enables full optimization when compiling an application. To avoid binary incompatibilities, it is mandatory to ensure that the run-time classpath is identical to the compile-time classpath, including the Java standard library.
Running scalac -opt-inline-from:help explains how to use the compiler flag.
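For a library built with sbt, the recommendations above could be wired into the build roughly as follows. This is a sketch under stated assumptions, not a drop-in configuration: it assumes that the CI server sets an environment variable named CI, and my.library stands in for the library's own root package.

```scala
// build.sbt (sketch)

// Assumption: CI and release builds run with the environment variable CI set.
// Local development builds leave the optimizer off, which keeps incremental
// compilation working as usual.
val optimize = sys.env.contains("CI")

scalacOptions ++= {
  if (optimize) Seq(
    "-opt:l:inline",                   // inlining plus method-local optimizations
    "-opt-inline-from:my.library.**",  // only inline code from the library itself
    "-opt-warnings"                    // report inliner warnings individually
  ) else Seq.empty
}
```

An application that controls its whole run-time classpath could use -opt-inline-from:** instead of the package pattern.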
@inline
When the inliner is enabled, it automatically selects callsites for inlining according to a heuristic.
As mentioned in the introduction, the main goal of the Scala optimizer is to eliminate megamorphic dispatch and value boxing. In order to keep this post from growing too long, a follow-up post will include the analysis of concrete examples that motivate which callsites are selected by the inliner heuristic.
Nevertheless, it is useful to have an intuition of how the heuristic works, so here is an overview:
Methods or callsites annotated @noinline are not inlined.
Methods or callsites annotated @inline are inlined.
Methods with an IntRef / DoubleRef / … parameter are inlined. When nested methods update variables of the outer method, those variables are boxed into XRef objects. These boxes can often be eliminated after inlining the nested method.
Forwarder methods are inlined, for example simple closure bodies like _ + 1 and synthetic methods (potentially with boxing / unboxing adaptations) such as bridges.
To prevent methods from exceeding the JVM’s method size limit, the inliner has size limits. Inlining into a method stops when the number of instructions exceeds a certain threshold.
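The IntRef case in the list above is easiest to see in source form. In the following sketch (LocalBoxes, sum and add are hypothetical names), the local variable total is updated by a nested method, so the compiler stores it in an IntRef box and passes that box to the nested method.

```scala
object LocalBoxes {
  def sum(xs: Array[Int]): Int = {
    // Captured and updated by the nested method below, so this variable is
    // compiled to a scala.runtime.IntRef box rather than a plain local.
    var total = 0

    // After compilation, add becomes a separate method that receives the
    // IntRef box as an extra parameter.
    def add(x: Int): Unit = total += x

    var i = 0
    while (i < xs.length) { add(xs(i)); i += 1 }
    total
  }
}
```

Once add is inlined into sum, the box no longer escapes the method, so the method-local optimizations have a chance to replace it with an ordinary local variable.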
As you can see in the list above, the @inline and @noinline annotations are the only way for programmers to influence inlining decisions. In general, our recommendation is to avoid using these annotations. If you observe issues with the inliner heuristic that can be fixed by annotating methods, we are very keen to hear about them, for example in the form of a bug report.
A related anecdote: in the Scala compiler and standard library (which are built with the optimizer enabled), there are roughly 330 @inline-annotated methods. Removing all of these annotations and re-building the project has no effect on the compiler’s performance. So the annotations are well-intended and benign, but in reality unnecessary.
For expert users, @inline annotations can be used to hand-tune performance-critical code without reducing abstraction. If you have a project that falls into this category, please let us know; we’re interested to learn more!
Finally, note that the @inline annotation only has an effect when the inliner is enabled, which is not the case by default. The reason is to avoid introducing accidental binary incompatibilities, as explained above.
The inliner can issue warnings when callsites cannot be inlined. By default, these warnings are not issued individually, but only as a summary at the end of compilation (similar to deprecation warnings).
$> scalac Test.scala -opt:l:inline '-opt-inline-from:**'
warning: there was one inliner warning; re-run enabling -opt-warnings for details, or try -help
one warning found
$> scalac Test.scala -opt:l:inline '-opt-inline-from:**' -opt-warnings
Test.scala:3: warning: C::f()I is annotated @inline but could not be inlined:
The method is not final and may be overridden.
def t = f
^
one warning found
By default, the inliner issues warnings for invocations of methods annotated @inline that cannot be inlined. Here is the source code that was compiled in the commands above:
class C {
  @inline def f = 1
  def t = f           // cannot inline: C.f is not final
}
object T extends C {
  override def t = f  // can inline: T.f is final
}
The -opt-warnings flag has more configurations. With -opt-warnings:_, a warning is issued for every callsite that is selected by the heuristic but cannot be inlined. See also -opt-warnings:help.
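One way to address the warning shown above, assuming that subclasses never need to override f, is to mark the method final so that the callee is known at compile-time. The names FinalC and FinalT are hypothetical variants of the example classes.

```scala
class FinalC {
  // final: the method cannot be overridden, so the callsite in t always
  // resolves to this definition.
  @inline final def f = 1
  def t = f
}

object FinalT extends FinalC {
  override def t = f
}
```

With this change, the reason given in the warning (“The method is not final and may be overridden”) no longer applies to the callsite in FinalC.t.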
If you’re curious (or maybe even skeptical) about what the inliner is doing to your code, you can use the -Yopt-log-inline flag to produce a trace of the inliner’s work:
package my.project
class C {
  def f(a: Array[Int]) = a.map(_ + 1)
}
$> scalac Test.scala -opt:l:inline '-opt-inline-from:**' -Yopt-log-inline my/project/C.f
Inlining into my/project/C.f
inlined scala/Predef$.intArrayOps (the callee is annotated `@inline`). Before: 15 ins, after: 30 ins.
inlined scala/collection/ArrayOps$.map$extension (the callee is a higher-order method, the argument for parameter (evidence$6: Function1) is a function literal). Before: 30 ins, after: 94 ins.
inlined scala/runtime/ScalaRunTime$.array_length (the callee is annotated `@inline`). Before: 94 ins, after: 110 ins.
[...]
rewrote invocations of closure allocated in my/project/C.f with body $anonfun$f$1: INVOKEINTERFACE scala/Function1.apply (Ljava/lang/Object;)Ljava/lang/Object; (itf)
inlined my/project/C.$anonfun$f$1 (the callee is a synthetic forwarder method). Before: 654 ins, after: 666 ins.
inlined scala/runtime/BoxesRunTime.boxToInteger (the callee is a forwarder method with boxing adaptation). Before: 666 ins, after: 674 ins.
Explaining the details here is out of scope for this post. We defer this discussion to a follow-up post that will explain the internals of the Scala optimizer in more detail.
The goal of this article was to explain why the Scala optimizer exists and to give a rough explanation of what it can and cannot do. It also showed how to configure and use the optimizer in your project.
In the next post, we will go into detail about how the optimizer works, what transformations are applied, and how they work together. We will also measure performance improvements that the optimizer can bring. Finally, we will look at related projects, dive a little more into the history of the optimizer, and discuss ideas for the future.