How to profile JVM applications
Hi everyone. One of the Tooling team focus lately has been improvement of the contribution process to sbt. Another thing we’ve been thinking is the performance of sbt. Combining these themes together, I’ve been looking into how guys like Jason Zaugg and Johannes Rudolph profile JVM applications, and this post is my findings.
The techniques described here should be applicable to both Java and Scala, and mostly indepent of build tools you use.
Flame graphs (async-profiler)
There are several ways to profile JVM apps, but the new hotness in profiling is Flame graphs invented by Brendan Gregg, Senior Performance Architect at Netflix. You first collect stack trace samples, and then it is processed into an intereactive svg graph. For an introduction to Flame graphs, see:
- Using FlameGraphs To Illuminate The JVM by Nitsan Wakart
- USENIX ATC ’17: Visualizing Performance with Flame Graphs
The first tool I recommend is async-profiler by Andrei Pangin. This is available for macOS and Linux, and it’s easier to get started.
- Download the installer from async-profiler 1.2.
- Make symbolic link to
$HOME/bin, assuming you have PATH to
ln -s ~/App/async-profiler/profiler.sh $HOME/bin/profiler.sh ln -s ~/App/async-profiler/build $HOME/bin/build
Next, close all Java appliations and anything that may affect the profiling, like Slack, and run your app in a terminal. In my case, I am trying to profile
sbt’s initial loading:
$ sbt exit
In another terminal, run:
$ jps 92746 sbt-launch.jar 92780 Jps
This tells you the process ID of the app. In this case, it’s
92746. While it’s running, run
$ profiler.sh -d 60 <process id> Started [cpu] profiling --- Execution profile --- Total samples: 31602 Non-Java: 3239 (10.25%) GC active: 46 (0.15%) Unknown (native): 14667 (46.41%) Not walkable (native): 3 (0.01%) Unknown (Java): 433 (1.37%) Not walkable (Java): 8 (0.03%) Thread exit: 1 (0.00%) Deopt: 9 (0.03%) Frame buffer usage: 55.658% Total: 1932000000 (6.11%) samples: 1932 [ 0] java.lang.ClassLoader$NativeLibrary.load [ 1] java.lang.ClassLoader.loadLibrary0 [ 2] java.lang.ClassLoader.loadLibrary [ 3] java.lang.Runtime.loadLibrary0 [ 4] java.lang.System.loadLibrary ....
This should show a bunch of stacktraces that are useful. To visualize this as a flamegraph, run:
$ profiler.sh -d 60 -f /tmp/flamegraph.svg <process id>
This should produce
/tmp/flamegraph.svg at the end.
See flamegraph.svg to try the output yourself.
Flame graphs (perf-map-agent)
Even though async-profiler is easier to get started, the fun part of Flame graph is mixing the JVM stack trace with the native code’s stack trace, allowing you see what your program is actually spending its CPU on. It turns out that Lightbend’s Johannes Rudolph wrote a tool for this called perf-map-agent. This uses
dtrace on macOS and
perf on Linux. This is particularly useful if you are trying to find if the bottle neck is on the native code.
You first have to compile perf-map-agent. For macOS, here to how to export
JAVA_HOME before running
$ cd work $ git clone https://github.com/brendangregg/FlameGraph.git $ git clone https://github.com/jvm-profiling-tools/perf-map-agent.git $ cd perf-map-agent $ export JAVA_HOME=$(/usr/libexec/java_home) $ cmake . -- The C compiler identification is AppleClang 184.108.40.20600039 -- The CXX compiler identification is AppleClang 220.127.116.1100039 ... $ make
In a fresh termimal, run sbt with
$ sbt -J-Dsbt.launcher.standby=20s -J-XX:+PreserveFramePointer exit
In the terminal that you will run the perf-map:
$ cd quicktest/ $ export JAVA_HOME=$(/usr/libexec/java_home) $ export FLAMEGRAPH_DIR=$HOME/work/FlameGraph $ jps 94592 Jps 94549 sbt-launch.jar $ $HOME/work/perf-map-agent/bin/dtrace-java-flames 94549 dtrace: system integrity protection is on, some features will not be available dtrace: description 'profile-99 ' matched 2 probes Flame graph SVG written to DTRACE_FLAME_OUTPUT='/Users/xxx/work/quicktest/flamegraph-94549.svg'.
This would produce better flamegraph in theory, but the output looks too messy for
sbt exit case.
This might work better if the operations are already JITed, or if the operation is more specific. To get a better Flame graph, one thing we can do now is to repeat the same operation multiple times.
$ sbt -J-Dsbt.launcher.standby=20s -J-XX:+PreserveFramePointer reload reload reload reload exit
This enhances the signal, and some plateaus become more pronouced that we can zoom in and find out the hot paths.
Netflix recently released a new visualization tool called Flamescope that can filter the Flame graph to a specific range of time.
This was developed by Martin Spier and Brendan Gregg to study perturbations and other time-based issues. This makes sense because the normal Flame graphs aggregates all stack trace samples, so if there’s a short-lived glitch, it could be burried in other traces.
Due to JIT warmup etc, benchmarking is difficult. JMH runs the same tests multiple times to remove these effects and comes closer to measuring the performance of your code.
For sbt users, sbt-jmh that Lightbend’s Konrad Malawski wrote makes JMH testing easier. It apparently adds an integration with async-profiler too.
I’d also mention traditional JVM profiling tool. Since VisualVM is opensource, I’ll mention that one.
- First open VisualVM.
- Start sbt from a terminal.
- You should see
- Open it, and select either sampler or profiler, and hit CPU button at the point when you want to start.
If you are familiar with YourKit, it also works similarly.
Flame graph avisualizes stack trace samples, which helps in identifying the hot paths in your application. It also helps to confirm if the changes you made has actually improved the performance or not.