OpenJDK latencyDetector for tracking long pauses on gc logging
Environment
- OpenJDK 11
(not valid for JDK 17 given writing is async)
Issue
- How to track JVM gc log pauses on writing?
- How to track a latency on the writing operation of the JVM gc logs?
Resolution
Scenarios
In some cases the gc logs can be an issue, given NFS for instance, the gc writing operator itself can be impacted by a latency on the NFS gc logs.
Therefore this might be logged on the gc logs as Others utilization - and the writing process itself is an sync process and cause a JVM stop (see below).
gc.log:[2021-01-09T3:01:..][3792.777s][info ][gc,phases ] GC(9) Other: 9569.7ms <--- that's a 9s pause on Others that is not counted as real pause
real=0.10s
For those scenarios one application can be deployed to test latency by doing a heart beat operation on the gc logs writing.
Writing to gc logs is done in a safepoint
GC log writes are during the safepoint - which is definitely not optimal. **So an I/O issue (latency) writing to `gc.log` will block the entire JVM.**
This is valid up to JDK 17, where the writing process became `async` - so it doesn't cause a pause to write gc logs.
Gc writing Latency Detector
The latency detector does 1 Gc operator per second (heart beat) and writes that to gc logs, and if there is any latency on the gc.log writing, for instance the gc is being written for an another director/network accessed file (NFS) so then the operation will get delay be some amount and the 1s heart beat will have a delay - which will likely be written on the gc logs themselves.
Deployment
That's a standalone process can be done with a bash script calling the Java process. Example of a bash to call the process:
echo "Path:" $1
java -Xmx50m -Xlog:gc*=debug:file=$1/gc.log:time,uptime,level,tags:filecount=4,filesize=10M RunningGcs
latency detector code:
// Java application:
/**
* This method guarantees that garbage collection is done unlike <code>{@link System#gc()}</code>
* https://stackoverflow.com/questions/1481178/how-to-force-garbage-collection-in-java
*/
// Weak reference import:
import java.lang.ref.WeakReference;
// Running Gc main class:
public class RunningGcs {
public static void main(String args[]) throws InterruptedException {
System.out.println("===== Latency Detector =====");
for(;;)
{
gc();
//Pause 1 minute
Thread.sleep(1000);
}
}
public static void gc() {
Object obj = new Object();
WeakReference ref = new WeakReference<Object>(obj);
obj = null;
System.gc();
}
}
gc logs - example:
[2022-10-14T16:28:28.177-0400][5.070s][info ][gc ] GC(5) Pause Full (System.gc()) 1M->0M(10M) 2.167ms
[2022-10-14T16:28:28.177-0400][5.070s][info ][gc,cpu ] GC(5) User=0.01s Sys=0.00s Real=0.00s
[2022-10-14T16:28:29.178-0400][6.070s][debug][gc,task ] GC(6) Requesting 1 active workers for full compaction (waste limited workers: 1, adaptive workers: 2)
[2022-10-14T16:28:29.178-0400][6.070s][info ][gc,task ] GC(6) Using 1 workers of 8 for full compaction
[2022-10-14T16:28:29.178-0400][6.070s][info ][gc,start ] GC(6) Pause Full (System.gc())
Overhead calculation:
Footprint: the overhead is very small - the log rotation is set for 100mb so you have 4 files - total 400MB.
Time to fullfill a gc file: 2441 full gcs == 40 minutes
Time to fullfill the 4 gc files: 160 minutes
Number of threads: Usually one thread for compaction with 1 thread.
Max heap: 50mb <-- this can be adjusted on the script
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.