ZGC is a garbage collector designed for large heaps and short pauses in Java applications. It targets memory-intensive workloads where consistent response times matter, and it does most of its work concurrently with the application. In this post, we will explore how to tune ZGC for better performance. If you want to learn the basics of garbage collection tuning first, you may watch this JAX London conference talk.

ZGC Tuning Parameters
ZGC takes a different approach to tuning by exposing very few JVM parameters. Unlike traditional garbage collectors, which often require fine-grained adjustments, ZGC is designed to manage large heaps efficiently with minimal configuration. In most cases, developers can focus on one key JVM parameter: the heap size.

1. Heap Size (-Xmx<size>)
The ‘Heap Size’ parameter is a crucial tuning option for ZGC. It determines the maximum amount of memory allocated for the Java heap, which is where objects are stored in memory during the execution of your Java application.

When configuring the heap size for ZGC, there are a few factors to consider. First, the heap must be able to accommodate the live-set of your application, which includes all the objects actively used during runtime, with enough headroom left over to serve new allocations while a collection is in progress. If the heap is too small, ZGC has to run more frequently to reclaim memory, and when it cannot reclaim memory as fast as the application allocates it, application threads may stall while they wait for free memory.

On the other hand, a heap that is much larger than the application needs wastes memory. It’s important to strike a balance between memory usage and the frequency of garbage collection. The optimal heap size depends on the allocation rate of your application, the size of the live-set, and the memory available on your system.

To specify the heap size, use the ‘-Xmx<size>’ flag when launching your Java application, where ‘<size>’ represents the desired heap size. For example, ‘-Xmx32g’ sets the maximum heap size to 32 gigabytes.
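To confirm that the flag took effect, you can ask the running JVM what it considers its heap ceiling. The following is a minimal sketch; HeapCheck is a hypothetical class name used only for illustration, and the reported value should be read as approximately the value passed to ‘-Xmx’.

```java
// Minimal sketch: report the maximum heap the running JVM will use.
// HeapCheck is a hypothetical name chosen for this example.
class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    /** Converts a byte count to whole mebibytes, rounding down. */
    static long toMiB(long bytes) {
        return bytes / MIB;
    }

    /** Returns the heap ceiling as seen by the JVM, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + toMiB(maxHeapBytes()) + " MiB");
    }
}
```

On JDK 15 or later, running it as ‘java -XX:+UseZGC -Xmx32g HeapCheck.java’ should print a value close to 32768 MiB.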

2. Concurrent GC Threads (-XX:ConcGCThreads=<number>)
Another tuning option to consider is the number of concurrent garbage collection (GC) threads in ZGC, which can be configured using the ‘-XX:ConcGCThreads=<number>’ flag. ZGC has built-in heuristics that select the number of threads automatically, and the default usually works well. However, depending on the behavior and requirements of your application, you may need to adjust it. This parameter determines how much CPU time is given to the garbage collector. With too many threads, the GC consumes CPU time that your application could have used. With too few, the GC may not reclaim memory as fast as the application allocates it.

Starting from JDK 17, ZGC scales the number of concurrent GC threads dynamically. This means that ZGC can adjust the number of threads to the workload automatically, making it less likely that you will need to tune this parameter manually.
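Before changing this setting, it can help to see what value the JVM is actually using and where it came from. Here is a minimal sketch, assuming a HotSpot-based JDK where the com.sun.management API is available; GcThreadsCheck is a hypothetical class name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Minimal sketch: read the ConcGCThreads flag from the running JVM.
// Assumes a HotSpot JVM; GcThreadsCheck is a hypothetical name.
class GcThreadsCheck {
    /** Looks up a HotSpot flag by name; throws IllegalArgumentException if it does not exist. */
    static VMOption flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name);
    }

    public static void main(String[] args) {
        VMOption conc = flag("ConcGCThreads");
        System.out.println("Available processors: "
                + Runtime.getRuntime().availableProcessors());
        // The origin tells you whether the value was set on the command line
        // or chosen by the JVM itself.
        System.out.println("ConcGCThreads = " + conc.getValue()
                + " (origin: " + conc.getOrigin() + ")");
    }
}
```

Comparing the reported value with the number of available processors gives a rough idea of how much CPU the collector may use at most.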

3. Enabling Large Pages (-XX:+UseLargePages)
Large pages, also known as huge pages, are memory pages that are larger than the standard page size. On Linux/x86 systems they have a size of 2MB. Because fewer pages are needed to map the same amount of memory, they reduce memory management overhead and improve memory access efficiency. Configuring ZGC to use large pages can enhance throughput, reduce latency, and improve startup time.

To enable large pages in ZGC, pass the ‘-XX:+UseLargePages’ option to the JVM.

Note: Enabling large pages requires certain configurations to be done at the operating system level. These configurations, such as assigning memory to the pool of large pages and setting up hugetlbfs filesystem, are outside the scope of this post.
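One number you will need for that operating system setup is how many large pages it takes to back the heap. The arithmetic can be sketched as follows, assuming the 2MB page size mentioned above; LargePageMath is a hypothetical class name, and the result is only a lower bound, since you may want to reserve some headroom on top of it.

```java
// Minimal sketch: how many 2MB large pages are needed to back a heap.
// LargePageMath is a hypothetical name; the page size is an assumption
// that matches the Linux/x86 value used in this post.
class LargePageMath {
    static final long PAGE_BYTES = 2L * 1024 * 1024;

    /** Number of large pages needed to cover heapBytes, rounding up. */
    static long pagesFor(long heapBytes) {
        if (heapBytes < 0) {
            throw new IllegalArgumentException("heapBytes must not be negative");
        }
        return (heapBytes + PAGE_BYTES - 1) / PAGE_BYTES;
    }

    public static void main(String[] args) {
        long heap = 32L * 1024 * 1024 * 1024; // a 32g heap, as in -Xmx32g
        System.out.println("Large pages needed: " + pagesFor(heap)); // prints 16384
    }
}
```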

4. Enabling Transparent Huge Pages (-XX:+UseTransparentHugePages)
An alternative to using explicit large pages (as described above) is to use Transparent Huge Pages (THP). THP is a feature in the Linux kernel that automatically aggregates standard memory pages into larger, more efficient huge pages. THP aims to improve memory management by reducing the overhead associated with managing individual pages. By grouping multiple standard pages into a single huge page (typically 2MB in size), THP can potentially enhance performance.

To enable Transparent Huge Pages in the JVM, use the ‘-XX:+UseTransparentHugePages’ option. This allows the Java application to take advantage of the large, aggregated memory pages managed by the operating system. Note that THP may introduce latency spikes in certain scenarios, which makes it less suitable for latency-sensitive applications. Before enabling THP, evaluate its impact on your specific workload and performance requirements.

Note: Configuring and managing Transparent Huge Pages at the kernel level may require additional steps, and the specifics are beyond the scope of this post.
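As a quick sanity check, you can read the current THP mode from the kernel before relying on the JVM option. The sketch below assumes a Linux system that exposes the setting at /sys/kernel/mm/transparent_hugepage/enabled, where the active mode is shown in square brackets; ThpCheck is a hypothetical class name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch: report the kernel's current THP mode on Linux.
// ThpCheck is a hypothetical name; the sysfs path is an assumption
// about the host and may be absent on other systems.
class ThpCheck {
    static final Path THP_ENABLED =
            Path.of("/sys/kernel/mm/transparent_hugepage/enabled");

    /** Extracts the bracketed entry, e.g. "madvise" from "always [madvise] never". */
    static String activeMode(String line) {
        int open = line.indexOf('[');
        int close = line.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            return "unknown";
        }
        return line.substring(open + 1, close);
    }

    public static void main(String[] args) throws IOException {
        if (!Files.isReadable(THP_ENABLED)) {
            System.out.println("THP setting not readable at " + THP_ENABLED);
            return;
        }
        String line = Files.readString(THP_ENABLED).trim();
        System.out.println("THP mode: " + activeMode(line));
    }
}
```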

5. Enabling NUMA Support (-XX:+UseNUMA)
ZGC has NUMA support, which means it will try its best to direct Java heap allocations to NUMA-local memory. NUMA stands for Non-Uniform Memory Access and refers to the architecture design used in multi-socket systems. In NUMA systems, memory is divided into multiple memory nodes, with each node associated with a specific processor or socket. Each processor has faster access to its own local memory node compared to accessing remote memory nodes.

By default, ZGC enables NUMA support, allowing it to leverage the benefits of NUMA architectures. It automatically detects and utilizes local memory nodes to optimize memory access and improve performance. However, if the JVM detects that it’s bound to use memory on a single NUMA node, NUMA support will be disabled.

In most cases, you don’t need to explicitly configure NUMA support. However, if you want to override the JVM’s decision, you can use the following options:

To explicitly enable NUMA support: -XX:+UseNUMA

To explicitly disable NUMA support: -XX:-UseNUMA

Note: NUMA support is particularly relevant in multi-socket x86 machines or other systems with NUMA architecture. It may not have a significant impact on performance in single-socket or non-NUMA systems.

6. Returning Unused Memory to the Operating System (-XX:+ZUncommit)
ZGC is designed to manage large heaps efficiently, but a large heap that the application does not fully use ties up memory that other processes could have used. By default, ZGC uncommits unused memory, returning it to the operating system. This feature can be disabled using -XX:-ZUncommit.

ZGC ensures that memory is not uncommitted to the extent that the heap size falls below the specified minimum heap size (-Xms). Consequently, if the minimum heap size is set to match the maximum heap size (-Xmx), the uncommit feature will be implicitly disabled.

To provide flexibility in managing uncommitted memory, ZGC allows you to configure an uncommit delay using the -XX:ZUncommitDelay=<seconds> option, with a default delay of 300 seconds. This delay specifies the duration for which memory should remain unused before it becomes eligible for uncommit.

NOTE: Allowing ZGC to commit and uncommit memory while the application is running can affect the response time of the application. If extremely low latency is a primary objective, it is recommended to set the same value for both the maximum heap size (-Xmx) and the minimum heap size (-Xms). Additionally, the -XX:+AlwaysPreTouch option can be beneficial, as it pre-pages the heap before the application starts, so that the cost of paging memory in is not paid while the application is serving requests.
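To see how these settings play out in a running JVM, you can compare the initial, committed, and maximum heap sizes. The following is a minimal sketch; HeapCommitCheck is a hypothetical class name, and it uses the initial heap size reported by the JVM as a stand-in for the minimum, which holds when the heap is configured with -Xms.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Minimal sketch: show whether the heap has room to shrink.
// HeapCommitCheck is a hypothetical name; treating "init" as the
// minimum heap size is an assumption that holds when -Xms is used.
class HeapCommitCheck {
    /** True when the initial size is known and lies below the maximum. */
    static boolean canUncommit(long initBytes, long maxBytes) {
        return initBytes >= 0 && maxBytes >= 0 && initBytes < maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("init      = " + heap.getInit());
        System.out.println("committed = " + heap.getCommitted());
        System.out.println("max       = " + heap.getMax());
        System.out.println("room to uncommit: "
                + canUncommit(heap.getInit(), heap.getMax()));
    }
}
```

When -Xms and -Xmx are set to the same value, the last line should report false, which matches the implicit disabling of uncommit described above.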

Tuning ZGC behavior
The best way to study the performance characteristics of ZGC is to analyze the GC log. The GC log contains detailed information about garbage collection events, memory usage, and other relevant metrics. Several tools can assist in analyzing the GC log, such as GCeasy, IBM GC & Memory Visualizer, HP JMeter, and Google Garbagecat. With these tools, you can visualize memory allocation patterns, identify potential bottlenecks, and assess the efficiency of garbage collection, which lets you make informed decisions when fine-tuning ZGC.
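The GC log itself is produced by the JVM's unified logging, for example with -Xlog:gc*:file=gc.log. As a lightweight complement to the log, the JVM also exposes basic collector counters through its management API. Here is a minimal sketch of reading them; GcStats is a hypothetical class name, and the collector names it prints vary by JDK version and by the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: print collection counts and accumulated times
// for each collector bean the JVM exposes. GcStats is a hypothetical name.
class GcStats {
    /** Builds one summary line per garbage collector bean. */
    static List<String> summarize() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        summarize().forEach(System.out::println);
    }
}
```

These counters are much coarser than the GC log, so they are better suited to dashboards and quick checks than to detailed analysis.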

Conclusion
In this post, we discussed the JVM parameters available for tuning ZGC: the heap size, the number of concurrent GC threads, large pages, Transparent Huge Pages, NUMA support, and returning unused memory to the operating system. By experimenting with these parameters and closely analyzing the GC log, developers can tune ZGC to their specific requirements and ensure efficient garbage collection in their Java applications.