Alternatives for creating heap dumps and thread dumps in DG 8, even without the JDK
Environment
- Red Hat OpenShift Container Platform (OCP)
- 4.x
- Red Hat Data Grid (RHDG)
- 8.x
Issue
- What are the options for creating a heap dump in OCP 4?
- What are the options for creating a heap dump in OCP 4 without a JDK?
- What are the options for creating a thread dump in OCP 4?
Resolution
The latest DG 8.4.x images have jcmd, which should be used for thread dumps, heap dumps, and VM.info.
Thread dump
The latest DG images (DG 8.4.x and later) include jcmd, a utility installed inside Red Hat's JRE run-time image, so use jcmd for better performance. Note that the full JDK is not present, only jcmd. Previous images did not include it.
If jmap, jcmd, or jps are not available on the image, use kill -3 to collect thread dumps:
oc exec $1 -- bash -c "for x in {1..10}; do kill -3 166; sleep 10; done"
The example above uses 166 as the PID and $1 as the pod name. Verify the PID of the DG JVM via ps aux before starting.
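Rather than hardcoding the PID, it can be derived from the process list. A minimal sketch, assuming the DG server's command line contains org.infinispan (as in the jcmd output later in this article):

```shell
# Helper sketch: extract the DG JVM's PID from a `ps aux` listing so the
# kill -3 loop does not rely on a hardcoded PID such as 166.
dg_jvm_pid() {
  # The bracketed [o] prevents the grep process from matching its own entry
  grep '[o]rg.infinispan' | awk '{print $2}'
}
# Inside the pod: PID=$(ps aux | dg_jvm_pid); kill -3 "$PID"
```

The thread dumps land on the JVM's stdout, so collect them afterwards with oc logs.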
Heap dump
What are the options for creating a heap dump in an image with a JDK:
jcmd, jmap, or jstat - all of these are available from the JDK inside the image.
DG 8.4
A DG 8.4 REST user can trigger a heap dump via "POST /rest/v2/server/memory?action=heap-dump[&live=true|false]".
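As a sketch, the endpoint can be called with curl; the host, port, and credentials below are assumptions to adapt to your deployment (11222 is the default DG REST port, and DIGEST is the server's default authentication mechanism):

```shell
# Hypothetical invocation of the DG 8.4 REST heap-dump endpoint.
# live=true restricts the dump to live (reachable) objects.
DG_URL="http://${DG_HOST:-localhost:11222}/rest/v2/server/memory?action=heap-dump&live=true"
curl -s --max-time 5 -X POST --digest -u "${DG_USER:-admin}:${DG_PASS:-password}" "$DG_URL" \
  || echo "server not reachable"
```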
What are the options for creating a heap dump in a pod without a JDK:
The only option is to bring in a tool such as jcmd or jattach, for instance by building an image that includes one, and debug from there.
Because of permissions inside the pod, putting jcmd in the JRE bin directory itself (the best fix) must be done during image creation.
The simplest workaround to get heap dumps is to inject jcmd inside the pod, as described below:
jcmd
Create two files on the pod: the jcmd executable and a symlink to its shared library, libjli.so. jcmd has a hardcoded relative-path dependency on ./../lib/jli/libjli.so, meaning the lib directory must sit one level above the directory containing jcmd:
- the jcmd executable in a /bin directory (copied from the JDK), and
- /lib, a symlink to /usr/lib/jvm/jre/lib, created via ln -s $JAVA_HOME/lib
- then running ./bin/jcmd works.
Step by step:
- First, set up the directories:

# pod:
$ oc rsh $pod-name        <--- go into the pod
$ cd /opt/infinispan/
$ mkdir diag
$ cd diag; mkdir bin
# local machine:
$ ls
jcmd                      <--- copied from JDK/bin

- Second, copy jcmd to the pod:

# on your local machine you have a JDK with jcmd - go to $JAVA_HOME/bin
# e.g. $JAVA_HOME/java-11-openjdk-11.0.15.0.9-3.portable.jdk.el.x86_64/bin
$ oc cp jcmd $pod-name:/opt/infinispan/diag/bin    <--- /diag/bin created for this purpose

- Third, with the jcmd executable there, create the symlink on the pod:

$ oc rsh $pod-name                 <--- go into the pod
sh-4.4$ cd /opt/infinispan/diag    <--- go to /diag
sh-4.4$ ln -s $JAVA_HOME/lib       <--- create the symlink to lib
# the resulting directory contents:
sh-4.4$ pwd
/opt/infinispan/diag
sh-4.4$ ls -ls
total 0
0 drwxr-xr-x. 2 1000650000 root 18 Jul 15 16:33 bin
0 lrwxrwxrwx. 1 1000650000 root 20 Jul 15 16:38 lib -> /usr/lib/jvm/jre/lib

- Confirm its creation by executing jcmd:

$ ./bin/jcmd    <--- execute
160 org.infinispan.server.loader.Loader org.infinispan.server.Bootstrap --bind-address=0.0.0.0 -l /opt/infinispan/server/conf/operator/log4j.xml -c user/infinispan-config.yaml -c operator/infinispan.xml
724 jdk.jcmd/sun.tools.jcmd.JCmd
Another option is to copy the JRE to $HOME, for example cp -r $JAVA_HOME /home/infinispan/diag; given that jcmd sits inside bin and the copied tree satisfies the dependency on ./../lib/jli/libjli.so, it should work.
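The relative-path dependency can be sanity-checked locally with stand-in files; a minimal sketch (the temporary directory and empty files are illustrative only, not real binaries):

```shell
# Recreate the layout jcmd expects: bin/jcmd plus a sibling lib/ tree that
# resolves the hardcoded ./../lib/jli/libjli.so dependency.
DIAG=$(mktemp -d)
mkdir -p "$DIAG/bin" "$DIAG/lib/jli"
touch "$DIAG/bin/jcmd" "$DIAG/lib/jli/libjli.so"   # stand-ins for the real files
# jcmd resolves the library relative to its own location:
test -e "$DIAG/bin/../lib/jli/libjli.so" && echo "layout OK"
```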
Finally, for more information see JCMD usage in OpenJDK for troubleshooting.
Root Cause
This happens because DG 8.3.x images are built on the ubi8/openjdk-11-runtime base image, which does not provide the required tooling, i.e. jcmd.
Important note: when building a new image, there should be no need for the symlink. One can copy jcmd directly into the JRE bin directory, where the dependency is already satisfied. The symlink is only needed when injecting jcmd after the fact, without write permissions for the JRE directory.
Other alternatives, such as OCP toolbox insertion, might not work because the user executing java differs from the user executing jcmd.
Diagnostic Steps
About creating the jcmd directory:
- When running a pod, one cannot create a directory inside the JRE directory.
- One can create an alias to execute jcmd without the full path: alias jcmd="/opt/infinispan/diag/bin/jcmd".
- Note the behavior of cp -r: if /diag exists, cp -r $JAVA_HOME copies the jre directory into it; if diag does not exist, it is created as a copy of the JRE. In other words, from inside the /opt/infinispan/diag directory, cp -r $JAVA_HOME . copies the JRE directory itself, not the directories inside it:
sh-4.4$ cp -r $JAVA_HOME .
sh-4.4$ ls
jre
- OCP toolbox insertion might not work because the user executing java differs from the user executing jcmd.
Investigating heap/off-heap usage in DG 8 on OCP 4:
If no GC logs are configured, do the following:
Heap usage can be investigated via jcmd GC.heap_info:
./jcmd 1 GC.heap_info
1:
def new generation total 72512K, used 35796K [0x0000000715a00000, 0x000000071a8a0000, 0x0000000763c00000)
eden space 64512K, 53% used [0x0000000715a00000, 0x0000000717bbd6b0, 0x0000000719900000)
from space 8000K, 15% used [0x0000000719900000, 0x0000000719a37960, 0x000000071a0d0000)
to space 8000K, 0% used [0x000000071a0d0000, 0x000000071a0d0000, 0x000000071a8a0000)
tenured generation total 161152K, used 0K [0x0000000763c00000, 0x000000076d960000, 0x0000000800000000)
the space 161152K, 0% used [0x0000000763c00000, 0x0000000763c00000, 0x0000000763c00200, 0x000000076d960000)
Metaspace used 2213K, capacity 5319K, committed 5376K, reserved 1056768K
class space used 182K, capacity 503K, committed 512K, reserved 1048576K
Off-heap usage:
Run ps -aux:
$ ps -aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000650+ 1 0.2 0.7 6530408 127232 ? Ssl 20:01 0:13 java -cp /opt/infinispan/lib/jgroups-4.2.18.Final-redhat-00001.jar org.jgroups.stack.GossipRouter -port 7900 -dump_msgs registration
1000650+ 857 0.0 0.0 19336 3608 pts/0 Ss 21:05 0:00 /bin/sh
1000650+ 1219 0.0 0.0 19220 3612 pts/1 Ss 21:07 0:00 /bin/sh
1000650+ 6249 0.0 0.0 19332 3660 pts/2 Ss 21:20 0:00 /bin/sh
1000650+ 8481 0.0 0.0 54780 3944 pts/2 R+ 21:25 0:00 ps -aux
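One rough way to relate the two outputs is to estimate off-heap usage as RSS minus the committed heap; this is a sketch with placeholder numbers, not figures taken from the sample outputs above:

```shell
# Sketch: off-heap ~= RSS (from ps, in KB) minus committed heap (the sum of
# the generation "total" values from GC.heap_info). Placeholder values only.
RSS_KB=500000              # RSS column of the java process in `ps aux`
HEAP_COMMITTED_KB=233664   # e.g. 72512 (young) + 161152 (tenured)
echo "approx off-heap: $((RSS_KB - HEAP_COMMITTED_KB)) KB"
# prints: approx off-heap: 266336 KB
```

Note this is only an approximation: metaspace, thread stacks, and the code cache also contribute to RSS.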
Quantity of GC operations (full collections):
./jcmd 1 VM.info | grep "Heap after GC invocations="
{Heap after GC invocations=2 (full 0):
{Heap after GC invocations=3 (full 0):
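The last such line carries the running totals, so a small sketch can pull out the counts; the sed pattern assumes the exact "Heap after GC invocations=N (full M)" format shown above:

```shell
# Parse the final "Heap after GC invocations=N (full M)" line from
# `jcmd 1 VM.info` output piped in on stdin.
count_gcs() {
  tail -n 1 | sed -n 's/.*GC invocations=\([0-9]*\) (full \([0-9]*\)).*/total=\1 full=\2/p'
}
# Example with the two sample lines shown above:
printf '{Heap after GC invocations=2 (full 0):\n{Heap after GC invocations=3 (full 0):\n' | count_gcs
# prints: total=3 full=0
```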
Getting the VM.info in one operation:
jcmd accepts a main-class or JAR identifier in place of a PID; in the case below, jboss-modules.jar identifies the Java process, given a valid $PODNAME:
$ oc rsh $PODNAME sh -c "jcmd jboss-modules.jar VM.info > /tmp/VM.info" && oc cp $PODNAME:/tmp/VM.info ./VM.info
$ ls
VM.info <-----
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.