A significant amount of JVM stopped time is not GC related
Environment
- OpenJDK
- Oracle JDK
Issue
- GC analysis shows a significant amount of JVM stopped time is not GC related. For example, garbagecat summary output shows a high total stopped time and a relatively low GC total pause time or a low GC/stopped ratio, as in the example below:
========================================
SUMMARY:
----------------------------------------
# GC Events: 1032
Event Types: PARALLEL_SCAVENGE, PARALLEL_COMPACTING_OLD
# Parallel Events: 1032
# Inverted Parallelism: 0
NewRatio: 2
Max Heap Occupancy: 5235541K
Max Heap Space: 8339456K
Max Perm/Metaspace Occupancy: 182728K
Max Perm/Metaspace Space: 1251328K
GC Throughput: 99%
GC Max Pause: 2.934 secs
GC Total Pause: 71.572 secs
Stopped Time Throughput: 70%
Stopped Time Max Pause: 20.711 secs
Stopped Time Total: 2238.507 secs
GC/Stopped Ratio: 3%
First Timestamp: 0.143 secs
Last Timestamp: 7541.137 secs
========================================
- Application is evidencing unexplained latency and freezes when heap or permgen / metaspace are not near configured limits
- Process monitoring tools are showing high CPU usage by the application - particularly by the VM Thread with relatively low GC overhead
Resolution
- Check for monitoring tools that may rely on thread sampling, code injection, heap dumps or similar JVM operations. Minimize use of these tools when not actively troubleshooting an existing issue
- Check for physical resource contention, particularly heavy memory swapping on the application host
- Enable biased locking (remove -XX:-UseBiasedLocking).
- Disable biased locking (add -XX:-UseBiasedLocking).
Root Cause
High stopped time percentages can be caused by inverted parallelism - meaning that parallel operations are yielding little or no improvement over serial operations due to resource contention (Examples: CPU is saturated, physical RAM is depleted).
Debugging or monitoring tools can rely on JVM operations that require a safepoint (meaning threads have to be paused). Examples include:
- PrintThreads
- PrintJNI, FindDeadlock
- ThreadDump
- HeapDumper
- GetAllStackTrace
Individually, these operations may require little time, but cumulatively, these operations can add up to a large overall stopped time percentage.
Other JVM operations, such as Code Deoptimization, EnableBiasLocking, and RevokeBias events can cause higher-than-normal application stopped times.
In particular, disabling biased locking ( -XX:-UseBiasedLocking) has been known to cause significant stopped time in some applications.
Diagnostic Steps
Inspect the JVM options to see if biased locking (enabled by default) is disabled with the -XX:-UseBiasedLocking option. Some applications will have improved performance with biased locking, and other applications will have better performance without biased locking. Whether biased locking improves or hurts application performance can only be determined by testing.
Add the following JVM options to collect safepoint statistic information to see what may be causing the relatively large application stopped time not accounted for by GC activities:
-XX:+UnlockDiagnosticVMOptions -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+LogVMOutput -XX:LogFile=<path>/vm.log
These options will produce a vm.log file that can be analyzed for causes of stopped time. Each entry in the log indicates the reason for the safepoint, and the time, in milliseconds each part of the operation required:
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
3643.387: RevokeBias [ 3423 1 5 ] [ 0 0 2 17 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
3643.621: ThreadDump [ 3424 1 6 ] [ 0 0 1 17 10717 ] 0
In the example above, for example, we can see that the thread dump operation took nearly 11 seconds to complete.
Use the Content from github.com is not included.vmcat tool) to parse the vm log and get a breakdown of stopped time and analysis.
If there is a large amount of time spent in RevokeBias and BulkRevokeBias, test disabling biased locking with -XX:-UseBiasedLocking to see if it improves application performance.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.