Java garbage collection long pause times
Environment
- Java
Issue
- Very long full garbage collections (GC).
- Some Full GCs take more than 5 minutes.
- Minor (young) GCs are taking over 30 seconds.
- Thread dumps show threads in a state "BLOCKED (on object monitor)" in code that isn't even synchronized and shouldn't be blocking for any objects.
Resolution
-
See jvmconfig, a Red Hat Customer Portal Labs app, for an interactive way to generate an optimized configuration for your environment.
-
See G1 Collector Tuning .
-
See the tech brief Improving OpenJDK and Oracle JDK Garbage Collection Performance
-
Change from the parallel throughput collector to the CMS low pause collector (
-XX:+UseConcMarkSweepGC). -
The concurrent collector can be used in an incremental mode (-XX:+CMSIncrementalMode) that divides the work done concurrently by the collector into small chunks of time which are scheduled between young generation collections to lessen the impact of long concurrent phases. This feature is useful when applications that need the low pause times provided by the concurrent collector are run on machines with small numbers of processors (e.g., 1 or 2).
-
Decrease the number of parallel garbage collectors with the
-XX:ParallelGCThreadsJVM option. -
In some cases,
-XX:+ParallelRefProcEnabledcan improve parallel young gen processing times -
Enable large page support. See How do I enable large page support for Java?.
-
For JBoss Enterprise Application Platform (EAP) 5.0.0, test adding the following JVM options which internal testing has shown to be very effective at improving performance:
-XX:+AggressiveOpts and -XX:+DoEscapeAnalysis -
Add the
-XX:+UseCompressedOopsJVM option (starting Oracle/Sun JDK 1.6 update 14 and JDK 7) to enable compressed object pointers. This helps prevent 64-bit object bloat by encoding object references as 32-bit values. Use-XX:+UseCompressedOopswith heap sizes less than 32GB.- It is enabled by default in JDK 1.6 update 23 on Linux. Use the JVM option
-XX:+PrintFlagsFinalto check whether it is enabled by default in the JVM you are running. - Using this option prior to JDK 1.6 update 21 is known to cause the JVM to crash in some cases. See Java crash on Sun JDK with both survivor spaces occupied or empty.
- It is enabled by default in JDK 1.6 update 23 on Linux. Use the JVM option
-
Add the
-XX:+ExplicitGCInvokesConcurrentJVM option so explicit GC calls from System.gc() will be done concurrently. -
Set
-Xmsequal to-Xmxand-XX:PermSizeequal to-XX:MaxPermSizeto avoid uneeded resizing collections. -
If using the Concurrent Mark Sweep (CMS) low pause collector, enable class unloading so the perm/metaspace can be collected concurrently instead of using the serial collector with the following JVM option(s): (1)
-XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled(JDK 1.5), (2)-XX:+CMSClassUnloadingEnabled(JDK 1.6+). -
See the resolutions for the issues listed in the Root Cause section.
Linux
For a file-based workload such as a database, file server, or web-server, zone_reclaim should be disabled:
echo 0 > /proc/sys/vm/zone_reclaim_mode
Or add the following to /etc/sysctl.conf to make the value persist between reboots:
zone_reclaim_mode = 0
Root Cause
- GC will pause the JVM and its other threads, sometimes putting them in a BLOCKED state.
- Memory paging/swapping. There is not enough physical memory for the JVM and other services.
- If a virtual environment, consider Unaccounted memory usage when running Red Hat Enterprise Linux as a VMware guest
- The collector being used is optimized for throughput, not low pauses.
- The number of parallel garbage collection threads is too high for the hardware and any virtual OS or services running on the shared hardware.
- Large heap (>2GB) without large page support.
- A large heap. Pause time is directly related to heap size, so all things equal, smaller heaps will have smaller pauses.
- Explicit GC calls when the low pause collector (
-XX:+UseConcMarkSweepGC) is specified are done with a serial collector which was not designed with multi cpu systems in mind so is typically quite slow. - A tool such as pstack was run, which causes all threads to stop until it completes. If it takes a long time for pstack to run, this will pause the JVM.
- When using the Java low pause Concurrent Mark Sweep (CMS) garbage collector "concurrent mode failure" appears in GC logs
- When using the Java low pause Concurrent Mark Sweep (CMS) garbage collector "concurrent mode interrupted" appears in GC logs
- Java CMS remark collection "weak refs processing" phase takes a long time
- Java application periodic high latency / processing times due to NUMA page reclaim on RHEL
- If seeing long pause time outside of GC, see A significant amount of JVM stopped time is not GC related
- See G1 Collector Tuning
- G1 GC shows large time in Other phase
- The CMS collector is not configured to collect the perm/metaspace concurrently so the serial (slow) collector is being used.
Linux
- The kernel enables vm.zone_reclaim_mode by setting it to one when a multiple node NUMA is detected on boot, which leads to processes going into page reclaim on the local NUMA node instead of accessing memory on the other node.
- Java garbage collection long pause with high 'sys' time
- Long garbage collection pause due to single-threaded CMS initial mark
- Long G1 young pause collection due to external root scanning
Diagnostic Steps
-
Enable the standard garbage collection logging options and analyze the logging.
-
For really long pauses much greater than the average, check to see if a tool such as
pstackwas run.pstackcauses all threads to stop until it completes, so if it takes a long time for pstack to run, this will pause the JVM. -
Check how much physical memory is available and verify there is enough for the JVM and any other services running on the box to avoid paging/swapping (Linux: run
sarcommand or look in/var/log/safor previous metrics) -
Ask if the JVM is running in a virtual or physical environment. If a virtual environment, check to see if memory reservation is restricting the allocated RAM.
-
Compare the number of threads for GC in the thread dump with the gc logging parallelism to see if they are equal.
OpenJDK / Sun JDK
-
Analyze the GC logging to see what collectors are being used.
-
See G1 Collector Tuning .
-
If the PARALLEL_SERIAL_OLD collector is being used and there is more than one CPU, add the
-XX:+UseParallelOldGCJVM option so full collections are not done single threaded but instead with the newer parallel throughput collector (PARALLEL_OLD_COMPACTING). This is typically best on multiprocessor hardware. -
The default collector for OpenJDK and Sun JDK 1.5 and 1.6 is the throughput collector, which will optimize throughput at the expense of long pause times. Test the low pause collector (-XX:+UseConcMarkSweepGC).
-
If the heap is large (>2GB), test enabling large page support, as it typically has a big impact on performance.
-
Check that the number of CPUs available to the JVM does not conflict with the default
-XX:ParallelGCThreadssetting. The default value for-XX:+UseParallelGCis to set the number of threads to the total number of hardware threads (cpus). For-XX:+UseConcMarkSweepGC/-XX:+UseParNewGCthe default is computed as: (ncpus <= 8) ? ncpus : 3 + ((ncpus * 5) / 8) (where ncpus is the # cores on the platform or in the pset of the JVM set if bound to a pset). If-XX:ParallelGCThreadsis set too high, the threads will compete with one another and affect performance. -
Check to see if the JVM is running in a virtual or physical environment. If running in a virtual environment, explicitly set
-XX:ParallelGCThreadsbased on the number of CPUs assigned to the guest OS. -
Check to see if the JVM is running in a dedicated environment. If not, then
-XX:ParallelGCThreadsmay have to be adjusted to account for the cpu used by the collocated services. -
Check to see if the CMS remark collection is unloading classes, allowing perm/metaspace to be collected concurrently. For example, note
class unloadingin the following JDK8 logging:4.578: [Rescan (parallel) , 0.0185521 secs]4.597: [weak refs processing, 0.0008993 secs]4.598: [class unloading, 0.0046742 secs]4.603: [scrub symbol table, 0.0044444 secs]4.607: [scrub string table, 0.0005670 secs][1 CMS-remark: 6569K(4023936K)] 16685K(4177280K), 0.1025102 secs] [Times: user=0.17 sys=0.01, real=0.10 secs] -
Check to see if the low pause collector is specified (
-XX:+UseConcMarkSweepGC) and the long pauses are of the following format:80361.932: [Full GC (System) 80361.940: [CMS: 5612504K->231062K(7626048K), 24.5546875 secs] 5710661K->231062K(8312384K), [CMS Perm : 94248K->91371K(131072K)], 24.5555401 secs] [Times: user=24.54 sys=0.04, real=24.56 secs]This is a serial collector and was not designed with multiprocessor systems in mind; therefore, it is typically quite slow. It can be caused by explicit GC calls, which by default invoke a serial collector. Test adding the following JVM option in combination with
-XX:+UseConcMarkSweepGCso explicit GC calls from System.gc() will be done concurrently:
-XX:+ExplicitGCInvokesConcurrent -
Check for evidence of poor or inverted parallism with the collections taking the longest time. For example:
[GC 239727.670: [ParNew: 2713920K->82240K(2713920K), 24.0534189 secs] 13734454K->11225032K(25083584K), 24.0541689 secs] [Times: user=3.27 sys=1.79, real=24.05 secs]Note that the real time for the collection is much larger than the user + sys time: 24.05 secs vs. 3.27 + 1.79 = 5.06 secs. This means that GC threads used 5.06 seconds of total cpu time over a real time of 24.05 seconds. It should be the opposite. For example, with -XX:ParallelGCThreads=6, with ideal parallelism the the user + sys time would be 6 times the real time. So not only is the parallelism not good, it is very bad in the opposite direction. This could be an indication high io (e.g. disk or network access). For example, it could mean swapping is happening. Sometimes in virtual environments there can be plenty of underlying physical memory and even memory provisioned to the virtual OS, yet the memory that can be allocated is limited. It can also indicate there is not enough cpu (e.g. GC threads competing with each other or other processes).
Linux
-
Check to see if zone_reclaim_mode is set to "1" (enabled):
cat /proc/sys/vm/zone_reclaim_mode -
Determine the collector and phase for the long pause.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.