Java garbage collection long pause times

Solution Verified - Updated

Environment

  • Java

Issue

  • Very long full garbage collections (GC).
  • Some Full GCs take more than 5 minutes.
  • Minor (young) GCs are taking over 30 seconds.
  • Thread dumps show threads in a state "BLOCKED (on object monitor)" in code that isn't even synchronized and shouldn't be blocking for any objects.

Resolution

  • See jvmconfig, a Red Hat Customer Portal Labs app, for an interactive way to generate an optimized configuration for your environment.

  • See G1 Collector Tuning .

  • See the tech brief Improving OpenJDK and Oracle JDK Garbage Collection Performance

  • Change from the parallel throughput collector to the CMS low pause collector (-XX:+UseConcMarkSweepGC).

  • The concurrent collector can be used in an incremental mode (-XX:+CMSIncrementalMode) that divides the work done concurrently by the collector into small chunks of time which are scheduled between young generation collections to lessen the impact of long concurrent phases. This feature is useful when applications that need the low pause times provided by the concurrent collector are run on machines with small numbers of processors (e.g., 1 or 2).

  • Decrease the number of parallel garbage collectors with the -XX:ParallelGCThreads JVM option.

  • In some cases, -XX:+ParallelRefProcEnabled can improve parallel young gen processing times

  • Enable large page support. See How do I enable large page support for Java?.

  • For JBoss Enterprise Application Platform (EAP) 5.0.0, test adding the following JVM options which internal testing has shown to be very effective at improving performance:

      -XX:+AggressiveOpts and -XX:+DoEscapeAnalysis   
    
  • Add the -XX:+UseCompressedOops JVM option (starting Oracle/Sun JDK 1.6 update 14 and JDK 7) to enable compressed object pointers. This helps prevent 64-bit object bloat by encoding object references as 32-bit values. Use -XX:+UseCompressedOops with heap sizes less than 32GB.

  • Add the -XX:+ExplicitGCInvokesConcurrent JVM option so explicit GC calls from System.gc() will be done concurrently.

  • Set -Xms equal to -Xmx and -XX:PermSize equal to -XX:MaxPermSize to avoid uneeded resizing collections.

  • If using the Concurrent Mark Sweep (CMS) low pause collector, enable class unloading so the perm/metaspace can be collected concurrently instead of using the serial collector with the following JVM option(s): (1) -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled (JDK 1.5), (2) -XX:+CMSClassUnloadingEnabled (JDK 1.6+).

  • See the resolutions for the issues listed in the Root Cause section.

Linux

For a file-based workload such as a database, file server, or web-server, zone_reclaim should be disabled:

echo 0 > /proc/sys/vm/zone_reclaim_mode

Or add the following to /etc/sysctl.conf to make the value persist between reboots:

zone_reclaim_mode = 0

Root Cause

Linux

Diagnostic Steps

  • Enable the standard garbage collection logging options and analyze the logging.

  • For really long pauses much greater than the average, check to see if a tool such as pstack was run. pstack causes all threads to stop until it completes, so if it takes a long time for pstack to run, this will pause the JVM.

  • Check how much physical memory is available and verify there is enough for the JVM and any other services running on the box to avoid paging/swapping (Linux: run sar command or look in /var/log/sa for previous metrics)

  • Ask if the JVM is running in a virtual or physical environment. If a virtual environment, check to see if memory reservation is restricting the allocated RAM.

  • Compare the number of threads for GC in the thread dump with the gc logging parallelism to see if they are equal.

OpenJDK / Sun JDK

  • Analyze the GC logging to see what collectors are being used.

  • See G1 Collector Tuning .

  • If the PARALLEL_SERIAL_OLD collector is being used and there is more than one CPU, add the -XX:+UseParallelOldGC JVM option so full collections are not done single threaded but instead with the newer parallel throughput collector (PARALLEL_OLD_COMPACTING). This is typically best on multiprocessor hardware.

  • The default collector for OpenJDK and Sun JDK 1.5 and 1.6 is the throughput collector, which will optimize throughput at the expense of long pause times. Test the low pause collector (-XX:+UseConcMarkSweepGC).

  • If the heap is large (>2GB), test enabling large page support, as it typically has a big impact on performance.

  • Check that the number of CPUs available to the JVM does not conflict with the default -XX:ParallelGCThreads setting. The default value for -XX:+UseParallelGC is to set the number of threads to the total number of hardware threads (cpus). For -XX:+UseConcMarkSweepGC/-XX:+UseParNewGC the default is computed as: (ncpus <= 8) ? ncpus : 3 + ((ncpus * 5) / 8) (where ncpus is the # cores on the platform or in the pset of the JVM set if bound to a pset). If -XX:ParallelGCThreads is set too high, the threads will compete with one another and affect performance.

  • Check to see if the JVM is running in a virtual or physical environment. If running in a virtual environment, explicitly set -XX:ParallelGCThreads based on the number of CPUs assigned to the guest OS.

  • Check to see if the JVM is running in a dedicated environment. If not, then -XX:ParallelGCThreads may have to be adjusted to account for the cpu used by the collocated services.

  • Check to see if the CMS remark collection is unloading classes, allowing perm/metaspace to be collected concurrently. For example, note class unloading in the following JDK8 logging:

    4.578: [Rescan (parallel) , 0.0185521 secs]4.597: [weak refs processing, 0.0008993 secs]4.598: [class unloading, 0.0046742 secs]4.603: [scrub symbol table, 0.0044444 secs]4.607: [scrub string table, 0.0005670 secs][1 CMS-remark: 6569K(4023936K)] 16685K(4177280K), 0.1025102 secs] [Times: user=0.17 sys=0.01, real=0.10 secs]
    
  • Check to see if the low pause collector is specified (-XX:+UseConcMarkSweepGC) and the long pauses are of the following format:

      80361.932: [Full GC (System) 80361.940: [CMS: 5612504K->231062K(7626048K), 24.5546875 secs] 
      5710661K->231062K(8312384K), [CMS Perm : 94248K->91371K(131072K)], 24.5555401 secs] 
      [Times: user=24.54 sys=0.04, real=24.56 secs]    
    

    This is a serial collector and was not designed with multiprocessor systems in mind; therefore, it is typically quite slow. It can be caused by explicit GC calls, which by default invoke a serial collector. Test adding the following JVM option in combination with -XX:+UseConcMarkSweepGC so explicit GC calls from System.gc() will be done concurrently:
    -XX:+ExplicitGCInvokesConcurrent

  • Check for evidence of poor or inverted parallism with the collections taking the longest time. For example:

      [GC 239727.670: [ParNew: 2713920K->82240K(2713920K), 24.0534189 secs] 13734454K->11225032K(25083584K), 24.0541689 secs] 
      [Times: user=3.27 sys=1.79, real=24.05 secs]
    

    Note that the real time for the collection is much larger than the user + sys time: 24.05 secs vs. 3.27 + 1.79 = 5.06 secs. This means that GC threads used 5.06 seconds of total cpu time over a real time of 24.05 seconds. It should be the opposite. For example, with -XX:ParallelGCThreads=6, with ideal parallelism the the user + sys time would be 6 times the real time. So not only is the parallelism not good, it is very bad in the opposite direction. This could be an indication high io (e.g. disk or network access). For example, it could mean swapping is happening. Sometimes in virtual environments there can be plenty of underlying physical memory and even memory provisioned to the virtual OS, yet the memory that can be allocated is limited. It can also indicate there is not enough cpu (e.g. GC threads competing with each other or other processes).

Linux

  • Check to see if zone_reclaim_mode is set to "1" (enabled):

      cat /proc/sys/vm/zone_reclaim_mode
    
  • Determine the collector and phase for the long pause.

Components

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.