Tuning ParallelGCThreads / ConcGCThreads in CMS on OpenJDK

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux (RHEL)
    • 7.x
  • OpenJDK/OracleJDK
    • 8/7 (CMS is deprecated as of JDK 9 and removed in JDK 14)
  • Red Hat JBoss Enterprise Application Platform (EAP)
    • 7.x

Issue

  • How are ConcGCThreads and ParallelGCThreads calculated?
  • How to tune ConcGCThreads and ParallelGCThreads in CMS?
  • Monitoring and benchmarks indicate excessive CPU activity and long garbage collection pauses while running multiple JBoss EAP instances. What values should I set for ParallelGCThreads / ConcGCThreads to solve it?

Resolution

Depending on the resources available and the application requirements, CMS (which is very flexible and has more than 72 flags) can be tuned away from its default configuration when that configuration creates bottlenecks such as a high load average, longer garbage collection pauses, and reduced application performance.

Meaning

The flags for tuning CMS are explained below:

Flag | Meaning
ParallelGCThreads | Sets the number of threads used for parallel garbage collection in the young and old generations. The default value depends on the number of CPUs available to the JVM.
ConcGCThreads | Sets the number of threads used for concurrent GC. The default value depends on the number of CPUs available to the JVM.

The flag ParallelCMSThreads is the previous name of ConcGCThreads.

Default values for ParallelGCThreads

The values described below apply to the CMS collector. The default value of ParallelGCThreads is given by the following formula (integer division):

ParallelGCThreads = (ncpus <= 8) ? ncpus : 3 + ((ncpus * 5) / 8)

For instance:

  • When number of cpus=4, ParallelGCThreads=4
  • When number of cpus=8, ParallelGCThreads=8
  • When number of cpus=16, ParallelGCThreads=13

ConcGCThreads, formerly known as ParallelCMSThreads, is given by:

ConcGCThreads = (ParallelGCThreads + 3) / 4

For instance:

  • When number of cpus=4, ConcGCThreads=1
  • When number of cpus=8, ConcGCThreads=2
  • When number of cpus=16, ConcGCThreads=4

The formula above shows that ConcGCThreads and ParallelGCThreads are not decoupled: when ConcGCThreads is left at its default, it is derived from ParallelGCThreads, so changing ParallelGCThreads also changes ConcGCThreads.
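The two formulas above can be checked with a few lines of Java. This is a minimal sketch that only transcribes the formulas as written in this article; the class and method names are hypothetical and are not part of the JDK.

```java
public class CmsGcThreadDefaults {

    // ParallelGCThreads default, transcribed from the formula above.
    // Java int division truncates, matching the values listed in the article.
    public static int parallelGcThreads(int ncpus) {
        return (ncpus <= 8) ? ncpus : 3 + ((ncpus * 5) / 8);
    }

    // ConcGCThreads default, derived from ParallelGCThreads (integer division).
    public static int concGcThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;
    }

    public static void main(String[] args) {
        // CPU counts used as examples in this article.
        for (int ncpus : new int[] {4, 8, 16}) {
            int parallel = parallelGcThreads(ncpus);
            System.out.printf("cpus=%d ParallelGCThreads=%d ConcGCThreads=%d%n",
                    ncpus, parallel, concGcThreads(parallel));
        }
    }
}
```

Running it reproduces the values listed above (4/1, 8/2 and 13/4 for 4, 8 and 16 CPUs).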

Tuning

A higher number of threads may well speed up the concurrent CMS phases, but it also causes additional synchronization overhead. Consequently, measure for the particular application at hand whether increasing the number of CMS threads really brings an improvement.

In other words, when several JVM processes run on the same server, the combined number of GC threads can exceed the number of cores available, which can decrease performance. This has to be taken into account because at some point all JBoss instances may be under the same load and perform garbage collection at the same time.

There isn't a formula to solve this situation. However, as a rule of thumb, assuming EAP runs on a dedicated server, it's always good to leave at least one core available.

Select a target metric

Choose the garbage collection goals: throughput, low GC pause times, or number of collections. If you're looking for low GC pause times, the number of GC threads directly affects them. In the tests for this article, the best results for a 16-core host running 3 JBoss instances were obtained using 5 threads for each process.
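To illustrate why the defaults can oversubscribe a shared host, the sketch below compares the combined default thread count with a simple per-instance budget. The budgeting rule (divide the cores left after reserving one by the number of instances) is an assumption made for illustration; the article does not state how its value of 5 was derived, and the class and method names are hypothetical.

```java
public class GcThreadBudget {

    // ParallelGCThreads default, transcribed from the formula in this article.
    public static int defaultParallelGcThreads(int ncpus) {
        return (ncpus <= 8) ? ncpus : 3 + ((ncpus * 5) / 8);
    }

    // Hypothetical helper: GC threads per JVM so that the combined GC threads
    // of all instances leave `reservedCores` cores free. Never returns less than 1.
    public static int threadsPerInstance(int cores, int instances, int reservedCores) {
        if (instances <= 0 || reservedCores < 0 || cores <= reservedCores) {
            throw new IllegalArgumentException("invalid cores/instances/reservedCores");
        }
        return Math.max(1, (cores - reservedCores) / instances);
    }

    public static void main(String[] args) {
        int cores = 16;
        int instances = 3;

        // With defaults, every JVM sizes its pool as if it owned the whole host.
        int combinedDefault = instances * defaultParallelGcThreads(cores);
        System.out.printf("defaults: %d parallel GC threads on %d cores%n",
                combinedDefault, cores);

        // Budgeted value, keeping one core free as in the rule of thumb.
        int perInstance = threadsPerInstance(cores, instances, 1);
        System.out.printf("budgeted: -XX:ParallelGCThreads=%d per instance (%d in total)%n",
                perInstance, perInstance * instances);
    }
}
```

For 16 cores and 3 instances, the defaults add up to 39 parallel GC threads, while the budget yields 5 per instance, which happens to match the value that performed best in the tests for this article. Treat the result as a starting point for benchmarking, not as a recommendation.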

The metrics obtained with the application tested for this article could differ from yours, so remember that any changes to your settings have to be based on benchmarks and monitoring indicators.

Alternatives for ParallelGCThreads | ActiveProcessorCount overriding number of CPUs

The ActiveProcessorCount flag overrides the number of CPUs that the JVM uses to calculate the size of the thread pools it creates for various operations, such as garbage collection. Therefore, an alternative to setting ParallelGCThreads directly is to use the -XX:ActiveProcessorCount=N flag, introduced in JDK 8u191, to override the number of CPUs the JVM detects and let the JVM size its thread pools from the override value (e.g. -XX:ActiveProcessorCount=4).
Example:

$ ./java -XX:ActiveProcessorCount=16 -XX:+PrintFlagsFinal -version | grep ParallelGCThreads
    uintx ParallelGCThreads                         = 13                                  {product}
$ ./java -XX:ActiveProcessorCount=10 -XX:+PrintFlagsFinal -version | grep ParallelGCThreads
    uintx ParallelGCThreads                         = 9                                   {product}

Root Cause

The calculation of the concurrent and parallel thread counts is GC-specific; the CMS calculation is discussed above.
The flags ParallelGCThreads and ConcGCThreads are calculated from the number of CPUs, and their relation is explained in the OpenJDK mailing list thread "Better default for ParallelGCThreads and ConcGCThreads by using number of physical cores and CPU mask".

The ConcGCThreads default comes from concurrentMarkSweepGeneration.cpp:

  // Support for multi-threaded concurrent phases
  if (CMSConcurrentMTEnabled) {
    if (FLAG_IS_DEFAULT(ConcGCThreads)) {
      // just for now
      FLAG_SET_DEFAULT(ConcGCThreads, (ParallelGCThreads + 3)/4);
    }
    if (ConcGCThreads > 1) {
      _conc_workers = new YieldingFlexibleWorkGang("Parallel CMS Threads",
                                 ConcGCThreads, true);
      if (_conc_workers == NULL) {
        warning("GC/CMS: _conc_workers allocation failure: "
              "forcing -CMSConcurrentMTEnabled");
        CMSConcurrentMTEnabled = false;
      } else {
        _conc_workers->initialize_workers();
      }
    } else {
      CMSConcurrentMTEnabled = false;
    }
  }
  if (!CMSConcurrentMTEnabled) {
    ConcGCThreads = 0;
  } else {
    // Turn off CMSCleanOnEnter optimization temporarily for
    // the MT case where it's not fixed yet; see 6178663.
    CMSCleanOnEnter = false;
  }
  assert((_conc_workers != NULL) == (ConcGCThreads > 1),
         "Inconsistency");
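
The control flow of the HotSpot snippet above can be paraphrased in Java to make one detail easier to see: read literally, a work gang is only created when ConcGCThreads is greater than 1, and otherwise multi-threaded concurrent phases are switched off and ConcGCThreads is reset to 0. The sketch below is a reading of that snippet only, not a measured result; it ignores the worker allocation failure path, and the class, method and parameter names are hypothetical.

```java
public class CmsConcWorkersSketch {

    // Returns the ConcGCThreads value left in place after the snippet above runs.
    // explicitConcGcThreads is the value passed with -XX:ConcGCThreads, or null
    // when the flag is left at its default (FLAG_IS_DEFAULT in the C++ code).
    public static int effectiveConcGcThreads(boolean concurrentMtEnabled,
                                             Integer explicitConcGcThreads,
                                             int parallelGcThreads) {
        if (!concurrentMtEnabled) {
            return 0; // CMSConcurrentMTEnabled is off: ConcGCThreads = 0
        }
        int conc = (explicitConcGcThreads != null)
                ? explicitConcGcThreads
                : (parallelGcThreads + 3) / 4; // default formula
        // Workers are only created for more than one thread; otherwise the
        // snippet disables CMSConcurrentMTEnabled and sets ConcGCThreads to 0.
        return (conc > 1) ? conc : 0;
    }

    public static void main(String[] args) {
        for (int parallel : new int[] {4, 8, 13}) {
            System.out.printf("ParallelGCThreads=%d -> effective ConcGCThreads=%d%n",
                    parallel, effectiveConcGcThreads(true, null, parallel));
        }
    }
}
```

Under this reading, a formula result of 1 (for example with ParallelGCThreads=4) ends up as an effective value of 0, so it is worth confirming the reported value on the actual JVM build rather than relying on the formula alone.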

Diagnostic Steps

Display the current values:

To see the actual values, start the application with the -XX:+PrintFlagsFinal flag, which prints all options and the values used by the JVM at startup:

    uintx ConcGCThreads                             = 2                                   {product}
    uintx ParallelGCThreads                         = 8                                   {product}
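
The same values can also be read from inside a running JVM. The following is a minimal sketch that assumes a HotSpot-based JVM (OpenJDK or Oracle JDK), where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name and the vmOption helper are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class PrintGcThreadFlags {

    // Returns the current value of a HotSpot -XX option as a string.
    // getVMOption throws IllegalArgumentException if the option does not exist.
    public static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // Number of processors the JVM reports to the application.
        System.out.println("availableProcessors = "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("ParallelGCThreads   = " + vmOption("ParallelGCThreads"));
        System.out.println("ConcGCThreads       = " + vmOption("ConcGCThreads"));
    }
}
```

Run it with the same JVM options as the application (for example -XX:+UseConcMarkSweepGC) so that the reported values reflect the collector actually in use.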

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.