Technology | April 12, 2021

Tools for analyzing Java performance when you can’t modify your app

Matt Fleming, Performance Engineer

As developers, we’re used to modifying the code we’re writing to make debugging and troubleshooting easier. But sometimes that’s simply not an option. Whether we’re working with a customer cluster that has limited internet access or debugging an issue in production, it’s often impossible to make major changes to the software, such as deploying a new version with debug code.

In those situations, it’s essential to have a grab bag of tools that you can run against an existing version of your code. It’s rare for one tool to cover all the bases, so in this article we’ll look at a mixture of performance tracing and profiling tools.

To illustrate how all of these tools work, we’ll need an example application. Here’s a small Java app named Spinner that simply creates a new list and adds one string to it in a loop:

Java
import java.util.List;
import java.util.ArrayList;

public class Spinner {
    public static void main(String[] args) {
        while (true) {
            List<String> strings = new ArrayList<>();
            strings.add("Foobar");
        }
    }
}

This is clearly a toy example, but it serves as a stand-in for a CPU-bound app, i.e. one that mostly consumes CPU and does little I/O.
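
If you want to follow along, you can compile and launch Spinner like any other Java program and then grab its process ID with the JDK’s jps tool; all of the tools below attach to a running JVM by PID. Here’s a quick sketch (the PID shown is just an example):

Bash
$ javac Spinner.java
$ java Spinner &

# jps ships with the JDK and lists running JVMs along with their PIDs
$ jps -l
12345 Spinner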

BTrace

BTrace is an often-overlooked tool for understanding how Java applications behave. It works by compiling and dynamically inserting Java code, known as tracing code, into your running application. BTrace enables you to hook into execution paths of your app as well as any part of the Java runtime. This means, for example, that you can run your tracing code on entry to a specific method of a class, no matter which package the class lives in.

The documentation for BTrace gives a solid overview of how to write your own tracing scripts, but to start with let’s see how we can write our own tracing code to print a message every time the add method is called on a java.util.List from the Java Development Kit (JDK).

Java
import org.openjdk.btrace.core.annotations.BTrace;
import org.openjdk.btrace.core.annotations.OnMethod;

import static org.openjdk.btrace.core.BTraceUtils.*;

@BTrace
public class SpinnerTrace {
    @OnMethod(clazz="java.util.List", method="add")
    public static void onAdd() {
        println("Adding item");
    }
}

All of the annotations are described more clearly in the BTrace documentation, but it’s worth highlighting the main one we’re using here: OnMethod. The OnMethod annotation allows us to inject our own code (in this case, the onAdd method) into the app we’re tracing.
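
With the script saved as SpinnerTrace.java, attaching it to the running app is a single command. As a sketch, assuming the btrace launcher is on your PATH and 12345 is Spinner’s PID:

Bash
# Compile SpinnerTrace.java and inject it into the JVM with PID 12345;
# "Adding item" is printed every time the probe fires
$ btrace 12345 SpinnerTrace.java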

If List.add() is called frequently, this simple tracing script might emit too much information to be valuable. Fortunately, we can batch our messages and print aggregated data using the OnTimer annotation, which executes a method every N milliseconds:

Java
import org.openjdk.btrace.core.annotations.BTrace;
import org.openjdk.btrace.core.annotations.OnMethod;
import org.openjdk.btrace.core.annotations.OnTimer;

import static org.openjdk.btrace.core.BTraceUtils.*;

import java.util.concurrent.atomic.AtomicInteger;

@BTrace
public class SpinnerTrace {
    private static AtomicInteger count = Atomic.newAtomicInteger(0);

    @OnMethod(clazz="java.util.List", method="add")
    public static void onAdd() {
        Atomic.incrementAndGet(count);
    }

    // Print stats every 5 seconds
    @OnTimer(5000)
    public static void timer() {
        println("Called List.add " + Atomic.getAndSet(count, 0) + " times");
    }
}

async-profiler

Sometimes it’s not enough to grab a snapshot of the state of your Java application; what you really need is an aggregate view of its behaviour over a period of time. For this scenario, a profiler is the tool of choice, and async-profiler is a popular open-source option. async-profiler is designed to be low-overhead and suitable for running in places where you don’t want to perturb running apps and services, such as production systems.

Unlike some other Java profilers, async-profiler touts that it doesn’t suffer from the safepoint bias problem: an issue where portions of Java code simply cannot be profiled and effectively become blind spots. Under the hood, async-profiler uses a mixture of performance events and HotSpot-specific APIs to measure things like CPU cycles, cache hits/misses, and Java heap memory allocations.
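
The event to sample is selected with the -e flag. As a rough sketch (the flag names come from the async-profiler documentation, and 12345 is an example PID):

Bash
# Sample CPU time (the default event) for 30 seconds
$ ./profiler.sh -d 30 -e cpu 12345

# Sample Java heap allocations instead of CPU cycles
$ ./profiler.sh -d 30 -e alloc 12345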

For example, if you’ve got a Java application like our Spinner app that is consuming a lot of CPU but you’re not exactly sure why that is and what code paths are being executed, you can run async-profiler like so:

Bash
$ ./profiler.sh -d 30 <pid>

...
          ns  percent  samples  top
  ----------  -------  -------  ---
 22590762415   75.22%     2247  Spinner.main
  7030293850   23.41%      700  java.util.Arrays.copyOf
   373412742    1.24%       32  /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
    19999296    0.07%        2  _new_array_Java
    10056193    0.03%        1  _IO_default_xsputn
    10020742    0.03%        1  __pthread_cond_signal
      

This runs the profiler for a 30-second duration and displays a breakdown of where time is being spent in your app (I’ve omitted the callstacks for brevity). Apart from spinning inside the main method of the Spinner app, we can also see that 23.41% of the time is being spent in java.util.Arrays.copyOf, which, if you refer to the code at the start of this article, you’ll see corresponds to the line strings.add("Foobar");

One of the added benefits of async-profiler is that it can generate flamegraphs, a visualization based on stack trace data that helps you quickly identify where an app is spending its time. Here’s how to generate a flamegraph for our Spinner app:

Bash
$ ./profiler.sh -d 30 -f flamegraph.html <pid>

And the output looks like this:

[Flamegraph for the Spinner app, showing most samples in Spinner.main and java.util.Arrays.copyOf]

Quite often, a flamegraph lets you figure out what your app is doing extremely quickly because the visualization is so intuitive. The image above shows that the vast majority of the time is being spent copying elements to an array, just as we suspected. The flamegraph uses exactly the same data as the command-line output from the previous step but provides a great way to understand the details at a glance.

jstack

jstack might seem like an odd choice for this list because it’s a standard tool that ships with the JDK. Still, it’s this ubiquity that makes it such an awesome tool: frequently, when you’re working on a customer system with restricted internet access, jstack is one of the only tools that you know for sure is going to be available.

jstack prints Java stack traces for a running Java app. You can even run it against core files and remote servers. Since jstack is a command-line tool, it’s easy to post-process the stack traces using standard Linux commands.
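
For example, a one-liner along these lines (a sketch; replace <pid> with your Java process ID) counts how many threads are in each state, which is a quick way to spot a pile-up of BLOCKED threads:

Bash
# Count threads per state (RUNNABLE, BLOCKED, WAITING, ...)
$ jstack <pid> | grep "java.lang.Thread.State" | sort | uniq -c | sort -rn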

Let’s take our Spinner app and see what the output of jstack looks like:

Bash
$ jstack <pid>

"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f39180b9000 nid=0x3830 waiting on condition [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f39180b6000 nid=0x382f waiting on condition
Bash
[0x0000000000000000]

   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f39180b3000 nid=0x382e runnable [0x0000000000000000]

   java.lang.Thread.State: RUNNABLE

"main" #1 prio=5 os_prio=0 tid=0x00007f391800a800 nid=0x3828 runnable [0x00007f3921a1a000]

   java.lang.Thread.State: RUNNABLE

at Spinner.main(Spinner.java:9)

"VM Thread" os_prio=0 tid=0x00007f3918074800 nid=0x382b runnable 

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f3918020000 nid=0x3829 runnable 

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f3918021800 nid=0x382a runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00007f39180be8

Again, I’ve omitted a lot of the details in the name of saving some space, but we can clearly see that the jstack output provides a wealth of information. We get a full callstack for each thread, along with details of threads that are internal to the JVM, such as garbage collection threads (GC task thread#N) and compiler threads (C1 CompilerThreadN).

This holistic perspective often serves as a great jumping-off point for further investigations. For example, we can take the nid (native identifier) info, which is the hexadecimal representation of the OS thread ID, and use it with OS-level tools such as pidstat to get even more detail at a lower level.
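
As a sketch (using the thread IDs from the example output above), converting the main thread’s nid of 0x3828 and feeding the process to pidstat might look like this:

Bash
# Convert the hexadecimal nid reported by jstack into a decimal OS thread ID
$ printf "%d\n" 0x3828
14376

# Per-thread CPU statistics sampled every second; match the TID column
# against converted nid values (replace <pid> with the Java process ID)
$ pidstat -t -p <pid> 1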

Conclusion

Restrictions and limitations help to bring out the creative side of every developer, and when you’re troubleshooting software on a locked-down system you need to bring a full bag of tricks to solve those hard problems. The tools covered in this article give you a wide range of options for tackling even the most difficult tasks when it’s simply not possible to drop a debug statement into your code.

async-profiler allows you to quickly and easily understand where your app is spending its time by looking at flamegraph visualizations. BTrace enables you to dynamically insert code and analyze the flow of execution through your app while it’s running, which is invaluable for large, complex software, or even smaller software that you’re unfamiliar with.


And finally, jstack is a tried and tested staple of any tool chest. It’s available pretty much anywhere the JRE is and can be invaluable for getting to grips with what your application is doing in a pinch. It also includes some valuable details that let you bridge the Java runtime with the view of your operating system: things like native identifier (nid) values, which help you use standard OS-level tools to further inspect your Java application.
