In this post, we are going to discuss a non-intrusive approach (i.e., an approach that doesn’t add any noticeable overhead to the application) to diagnosing CPU spikes. You can therefore use this approach in your production environment to troubleshoot CPU spikes.

Works on all JVM languages:
This approach can be used to troubleshoot CPU spikes in all programming languages that run on the Java Virtual Machine (JVM), such as Java, Scala, Kotlin, JRuby, and Jython.

Step 1: Capture 360° data
You can use the open source yCrash data script to capture 360° data from your application stack. This script captures 16 different artifacts from your application stack (GC log, thread dump, heap substitute, netstat, iostat, …) and runs for less than 30 seconds, so it doesn’t add any measurable overhead to your application. You can trigger this script on any platform (all Linux flavors, Windows, …) and in any environment (bare metal, cloud, containers, Kubernetes, …).



Fig: 360-degree data

Here are the steps to run this script:

1. Download the latest yc-data-script from this location

2. Unzip the downloaded yc-agent-latest.zip file (say, into the ‘/opt/workspace/yc-agent-latest’ folder).

3. In the unzipped folder you will find the yc-data-script for each operating system:

a) linux/yc – If you are running on Unix/Linux, then use this script.

b) windows/yc.exe – If you are running on Windows, then use this script.

c) mac/yc – If you are running on macOS, then use this script.

4. You can execute the yc script by issuing the following command:

./yc -j {JAVA_HOME} -onlyCapture -p {PID}
Where,

JAVA_HOME is the home directory where the JDK is installed

PID is the target JVM’s process ID

Example:

./yc -j /usr/java/jdk1.8.0_141 -onlyCapture -p 15326
When you pass the above arguments, yc-data-script will capture all the application-level and system-level artifacts/logs from your application stack for analysis. The captured artifacts will be compressed into a zip file and stored in the directory from which the above command was executed. The zip file name has the format ‘yc-YYYY-MM-DDTHH-mm-ss.zip’. Example: ‘yc-2021-03-06T14-02-42.zip’.
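If you capture data repeatedly (for example, once per CPU alert), it can help to script the step. The following is a minimal Python sketch, not part of yCrash itself: it assembles the capture command using only the flags shown above, and reads the capture time back out of the archive name using the documented ‘yc-YYYY-MM-DDTHH-mm-ss.zip’ format. The function names are hypothetical, and the sketch assumes the timestamp fields are always zero-padded as in the example.

```python
from datetime import datetime


def build_capture_command(java_home, pid):
    """Assemble the argv for the capture command (it is not executed here)."""
    return ["./yc", "-j", java_home, "-onlyCapture", "-p", str(pid)]


def capture_time(zip_name):
    """Parse the timestamp embedded in a 'yc-YYYY-MM-DDTHH-mm-ss.zip' name."""
    return datetime.strptime(zip_name, "yc-%Y-%m-%dT%H-%M-%S.zip")


def latest_capture(zip_names):
    """Return the most recent capture archive, or None if none match."""
    dated = []
    for name in zip_names:
        try:
            dated.append((capture_time(name), name))
        except ValueError:
            continue  # not a yc capture archive, skip it
    return max(dated)[1] if dated else None


print(build_capture_command("/usr/java/jdk1.8.0_141", 15326))
print(capture_time("yc-2021-03-06T14-02-42.zip"))
```

The argv list could be handed to `subprocess.run` from the unzipped folder. It is left unexecuted here so the sketch runs without the script installed.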

Step 2: Analyze captured data
Once you have captured the data, you can upload the captured zip file to the yCrash server for analysis. The yCrash server analyzes all the captured data and generates one unified root cause analysis report instantly. Note: yCrash has a free tier that you can use for CPU diagnosis. In the yCrash incident report you will see a ‘CPU consumption by thread’ section under the ‘Thread’ report (as shown below):



Fig: CPU consumption by threads reported by yCrash

This section shows all the CPU-consuming threads and the exact lines of code they are executing. Equipped with this information, you can spot the ‘black sheep’ lines of code that are causing the CPU to spike.

How does it work? 
‘Thread dump’ and ‘top -H -p {PROCESS_ID}’ are two of the artifacts that the yCrash data script captures. The ‘top -H -p {PROCESS_ID}’ command lists the IDs of the threads in the specified process, along with the amount of CPU and memory each thread consumes. The thread dump shows the code path each thread is executing. The yCrash tool marries these two data sets to produce the above report. For more details refer to this post.
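To make the correlation concrete, here is a minimal Python sketch of the idea. It is not yCrash's actual implementation. It assumes Linux ‘top -H’ output in the default column layout (thread ID in the first column, %CPU in the ninth) and HotSpot-style thread dump headers, where ‘nid=0x…’ carries the same native thread ID in hexadecimal. The function names, thread names and sample data are all hypothetical.

```python
import re

# Matches a HotSpot-style thread header line, e.g.
# "worker" #21 prio=5 os_prio=0 tid=0x... nid=0x3be6 runnable [0x...]
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)')


def parse_top_threads(top_output):
    """Return {native thread id (int): %CPU} from `top -H` style rows."""
    cpu_by_tid = {}
    for line in top_output.splitlines():
        fields = line.split()
        # Assumed default layout: thread id is field 0, %CPU is field 8.
        if len(fields) >= 12 and fields[0].isdigit():
            cpu_by_tid[int(fields[0])] = float(fields[8])
    return cpu_by_tid


def parse_thread_dump(dump):
    """Return {native thread id (int): (thread name, [stack frames])}."""
    threads = {}
    current = None
    for line in dump.splitlines():
        text = line.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group("nid"), 16)  # hex nid -> decimal id
            threads[current] = (header.group("name"), [])
        elif current is not None and text.startswith("at "):
            threads[current][1].append(text[3:])
        elif not text:
            current = None  # a blank line ends the current thread's stack
    return threads


def cpu_by_thread(top_output, dump):
    """Join both artifacts on the thread id, hottest thread first."""
    cpu = parse_top_threads(top_output)
    rows = []
    for tid, (name, frames) in parse_thread_dump(dump).items():
        if tid in cpu:
            top_frame = frames[0] if frames else "(no Java frames)"
            rows.append((cpu[tid], name, top_frame))
    return sorted(rows, key=lambda row: row[0], reverse=True)


# Hypothetical sample data: thread 15334 in top is nid=0x3be6 in the dump.
SAMPLE_TOP = """\
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
15334 app       20   0 4621212 312344  17232 R 97.3  3.9   5:12.40 java
15340 app       20   0 4621212 312344  17232 S  0.3  3.9   0:00.52 java
"""

SAMPLE_DUMP = """\
"order-worker-1" #21 prio=5 os_prio=0 tid=0x00007f4c2c0a1000 nid=0x3be6 runnable [0x00007f4c1a5f4000]
   java.lang.Thread.State: RUNNABLE
    at com.example.OrderService.recalculate(OrderService.java:42)
    at com.example.OrderService.run(OrderService.java:17)

"idle-worker-2" #22 prio=5 os_prio=0 tid=0x00007f4c2c0a3000 nid=0x3bec waiting on condition [0x00007f4c1a4f3000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
"""

for cpu, name, frame in cpu_by_thread(SAMPLE_TOP, SAMPLE_DUMP):
    print(f"{cpu:5.1f}%  {name}  {frame}")
```

The join key is the only subtle part: ‘top’ reports thread IDs in decimal, while the thread dump reports them in hexadecimal, so one side has to be converted before the two artifacts can be matched.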

I hope this approach helps you isolate CPU-consuming lines of code effectively. Happy Troubleshooting!!