String deduplication is an important feature aimed at optimizing memory usage by eliminating duplicate strings from heap memory. However, recent observations suggest a concerning trend – a degradation in string deduplication performance across newer Java versions. Thus, we embarked on a comparative analysis to assess the string deduplication performance behavior in Java versions 11, 17, and 21. This post intends to share our observations and insights gleaned from this analysis.

WebCrawler Spring Boot Application
In order to experiment with string deduplication, we used an open source Web Crawler application developed in Spring Boot. The web crawler is a REST based application, which will crawl any given website and archive that site’s information into the H2 Database. When a HTTP POST request is made, WebCrawler will start the crawling job in the site and return a jobId. This jobId can be used later to query the status of the crawling task.

To crawl the wikipedia website, you need to pass the seed as https://en.wikipedia.org/wiki with a depth. The depth will decide how deep it should crawl the websites and extract the information. For example, the http://localhost:8003/start?depth=10...pedia.org/wiki
URL, will crawl the wikipedia site with depth as 100 in the background.

Once you make a POST request, it will return the jobId as a response and the crawling will start in the background. Once the crawling task reaches the depth level, automatically will stop that particular crawling task.

We are going to load test this web crawler with a POST request in Java 11, 17 and 21 and study the String deduplication performance.

Enabling String Deduplication
To study the behavior of String Deduplication, it must first be enabled in the JVM. This can be achieved by passing the following JVM argument:

-XX:+UseStringDeduplication

String Deduplication events performance characteristics (like how many string examined, how many of them were deduplicated, how long it took to complete, …) can be printed in to the log file by passing following JVM arguments to the application:

-Xlog:stringdedup*=debug:file=string-dup-logfile.log  -Xloggc:string-dup-logfile.log

Based on the above JVM arguments, we are directing the both string deduplication debug events and the GC events log messages into a one single ‘string-dup-logfile.log’ file

For detailed information about the JVM arguments, refer to this article for String deduplication


JMeter Load Test
We conducted load testing on the WebCrawler application using JMeter, simulating a load of 50 users submitting crawling tasks with identical seed URLs and depths for approximately 1 hour. We are submitting the same URL, so that a lot of duplicate strings will be simulated in the application.

Note this test was conducted in my local laptop, whose configuration is:

Operating System: Windows 11
System type: 64-bit operating system, x64-based processor
RAM: 12 GB
Processor: Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz 1.19 GHz
Java Heap Size (i.e. -Xm512mb)

Below image shows the JMeter configuration in the above case:


Fig: POST method configuration in JMeter

Garbage Collection Analysis Study
After completing the tests, we uploaded the generated Garbage Collection log file to the online GCeasy tool for analysis. The tool promptly generated reports showcasing String Deduplication performance metrics for each Java version tested. Here are the reports generated by the tool:

Java 11 GC report: https://rb.gy/8o440d

Java 17 GC report: https://rb.gy/2hoz1p

Java 21 GC report: https://rb.gy/8n74lu

Comparison Study of String Deduplication Key Metrics Across Java Versions

Below table summarizes String deduplication key metrics across the Java versions:




The analytics reveal that Java 11 exhibits the best String Deduplication performance, eliminating 34.3% of strings. Conversely, Java 17 and 21 only managed to eliminate 9.3% and 3.4%, respectively. Moreover, the time taken to deduplicate strings increased in modern Java versions, with Java 11 completing the process in only 1,264.442 ms, compared to 2,323.492 ms for Java 17 and 3,439.466 ms for Java 21.

In essence, modern JVMs are spending more time inspecting a higher number of strings while eliminating fewer duplicates from memory. This underscores a clear degradation in JVM performance.

Conclusion
The investigation into string deduplication performance across Java versions 11, 17, and 21 yields significant insights into the evolution of JVM behavior. While string deduplication serves as a vital mechanism for optimizing memory utilization, our analysis reveals a concerning trend of degradation in performance across newer Java releases. Java 11 emerges as the standout performer, efficiently eliminating a substantial portion of duplicate strings within a shorter time frame. In contrast, Java 17 and 21 exhibit diminished effectiveness, both in terms of the percentage of strings deduplicated and the time taken to execute the process.