Alternatives for creating heap dumps and thread dumps in DG 8, even without the JDK

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 4.x
  • Red Hat Data Grid (RHDG)
    • 8.x

Issue

  • What are the options for creating a heap dump in OCP 4?
  • What are the options for creating a heap dump in OCP 4 without a JDK?
  • What are the options for creating a thread dump in OCP 4?

Resolution

The latest DG 8.4.x images include jcmd, which should be used for thread dumps, heap dumps, and VM.info.

Thread dump

The latest DG images (8.4.x and later) include jcmd, a utility installed inside Red Hat's JRE run-time image, so prefer jcmd where available. Note that the full JDK is still not present, only jcmd; previous images did not ship it at all.
If neither jmap, jcmd, nor jps is available on the image, use kill -3 to collect thread dumps:

oc exec $1 -- bash -c "for x in {1..10}; do kill -3 166; sleep 10; done" 

The example above assumes 166 is the PID of the DG JVM and $1 is the pod name passed as the first script argument. Verify the actual PID of the DG JVM via ps -aux before running it.
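If the PID is not known in advance, it can be discovered inside the pod instead of being hardcoded. A sketch, assuming the pod name infinispan-0 is a placeholder and that pgrep is available in the image (it is provided by procps, the same package that provides ps); the echo prints the command for review rather than executing it:

```shell
# Hypothetical pod name - substitute your own.
POD=infinispan-0

# Discover the DG JVM PID inside the pod instead of hardcoding it,
# then send SIGQUIT (kill -3) ten times, ten seconds apart.
PID_CMD='pgrep -f org.infinispan.server.loader.Loader | head -n 1'
DUMP_CMD="PID=\$($PID_CMD); for x in {1..10}; do kill -3 \$PID; sleep 10; done"

# Printed for review; drop the echo to execute against a live pod.
echo "oc exec $POD -- bash -c '$DUMP_CMD'"
```

The thread dumps land on the JVM's standard output, so collect them from the pod logs afterwards (oc logs $POD).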

Heap dump

The options for creating a heap dump in an image that includes a JDK are jcmd, jmap, and jstat - all shipped with the JDK inside the image.

DG 8.4
In DG 8.4, a REST user can trigger a heap dump via "POST /rest/v2/server/memory?action=heap-dump[&live=true|false]"
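The endpoint can be invoked with curl. A minimal sketch, where the host, port, and credentials are placeholders and Digest authentication is assumed; the echo prints the command for review rather than sending the request:

```shell
# Hypothetical service host and credentials - replace with your own.
DG_HOST=infinispan.example.svc:11222
DG_USER=admin
DG_PASS=changeme

# live=true limits the dump to live (reachable) objects.
URL="http://$DG_HOST/rest/v2/server/memory?action=heap-dump&live=true"

# Printed for review; drop the echo to trigger the heap dump.
echo curl -X POST --digest -u "$DG_USER:$DG_PASS" "$URL"
```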

The options for creating a heap dump in a pod without a JDK:
The only option is to bring in a suitable tool, for instance jcmd or jattach, and debug from there.
Because of permissions inside the pod, putting jcmd in the JRE bin directory itself (the best fix) can only be done during image creation.

The simplest workaround to get heap dumps is to inject jcmd inside the pod - see below:

jcmd

Create two files on the pod: the jcmd executable itself and a symlink to the shared library it needs, libjli.so. jcmd has a hardcoded relative-path dependency on ./../lib/jli/libjli.so, meaning a lib directory must exist one level above the directory containing jcmd:

  1. The jcmd executable in a bin directory (copied from a JDK)
  2. A lib symlink next to it (pointing at /usr/lib/jvm/jre/lib), created via ln -s $JAVA_HOME/lib
  3. Then running ./bin/jcmd works

Step by step:

  1. First set up the directories:

    # pod:
    $ oc rsh  $pod-name <--- go into the pod
    $ cd /opt/infinispan/
    $ mkdir diag
    $ cd diag; mkdir bin
    # local machine:
    $ ls
    jcmd  <--- copied from JDK/bin
    
  2. Second copy the jcmd to the pod:

    # in your local machine you have a JDK with jcmd - go to $JAVA_HOME/bin
    # $JAVA_HOME/java-11-openjdk-11.0.15.0.9-3.portable.jdk.el.x86_64/bin
    $ oc cp jcmd  $pod-name:/opt/infinispan/diag/bin <-- /diag/bin created for this purpose
    
  3. Third, with the jcmd executable there, create the symlink on the pod:

    $ oc rsh  $pod-name <--- go into the pod
    sh-4.4$ cd /opt/infinispan/diag <---- go to the /diag directory
    sh-4.4$ ln -s $JAVA_HOME/lib <---- create symlink to the lib
    # result in the directory will be:
    sh-4.4$ pwd
    /opt/infinispan/diag
    sh-4.4$ ls -ls 
    total 0
    0 drwxr-xr-x. 2 1000650000 root 18 Jul 15 16:33 bin
    0 lrwxrwxrwx. 1 1000650000 root 20 Jul 15 16:38 lib -> /usr/lib/jvm/jre/lib
    
  4. Confirm its creation by executing jcmd:

    $ ./bin/jcmd                        <---- execute
    160 org.infinispan.server.loader.Loader org.infinispan.server.Bootstrap --bind-address=0.0.0.0 -l /opt/infinispan/server/conf/operator/log4j.xml -c user/infinispan-config.yaml -c operator/infinispan.xml
    724 jdk.jcmd/sun.tools.jcmd.JCmd
    

Another option is to copy the whole JRE to $HOME, for example cp -r $JAVA_HOME /home/infinispan/diag. Since the copied tree keeps bin/jcmd alongside lib/, the dependency on ./../lib/jli/libjli.so is satisfied and jcmd should work.
Finally, for more information see JCMD usage in OpenJDK for troubleshooting.
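The relative-path dependency described above can be verified locally without a pod. A small sketch that recreates the expected layout in a throwaway directory - a bin directory with a lib symlink next to it - and checks that ./../lib/jli/libjli.so resolves:

```shell
# Recreate the layout jcmd expects in a throwaway directory:
# the loader resolves ./../lib/jli/libjli.so relative to the
# executable, so "lib" must sit next to "bin".
WORK=$(mktemp -d)
mkdir -p "$WORK/real-jre/lib/jli" "$WORK/diag/bin"
touch "$WORK/real-jre/lib/jli/libjli.so"

# Same effect as running `ln -s $JAVA_HOME/lib` inside /opt/infinispan/diag.
ln -s "$WORK/real-jre/lib" "$WORK/diag/lib"

# From bin's point of view, ../lib/jli/libjli.so now resolves.
ls "$WORK/diag/bin/../lib/jli/libjli.so"
```

Once this prints the path instead of "No such file or directory", the same layout on the pod will satisfy jcmd.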

Root Cause

This happens because DG 8.3.x is based on the ubi8/openjdk-ubi11-runtime image, which does not provide the required tooling, i.e. jcmd.
Important note: when building a new image, there is no need for the symlink workaround. One can copy jcmd directly into the JRE bin directory, where the relative dependency is already satisfied. The symlink is only needed when injecting jcmd after the fact, without write permissions on the JRE directory.
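During image creation this reduces to a single COPY step. A sketch of a Containerfile, where the base image tag and the final user id are placeholders to adjust for the image actually in use, and jcmd is taken from a local JDK matching the image's JRE major version:

```dockerfile
# Hypothetical base image tag - use the DG image you actually run.
FROM registry.redhat.io/datagrid/datagrid-8-rhel8:latest

USER root
# jcmd copied from a JDK of the same major version as the image's JRE;
# placed next to the JRE's lib/, so ./../lib/jli/libjli.so resolves.
COPY jcmd /usr/lib/jvm/jre/bin/jcmd
RUN chmod 0755 /usr/lib/jvm/jre/bin/jcmd

# Placeholder: restore the image's original non-root user id.
USER 185
```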
Other alternatives, such as OCP toolbox insertion, might not work because java and jcmd would run as different users.

Diagnostic Steps

About creating jcmd dir:

  1. When running a pod, one cannot create a directory inside the JRE directory.
  2. One can create an alias for the executable, via alias jcmd="/opt/infinispan/diag/bin/jcmd", so jcmd can then be run without the full path.
  3. cp -r $JAVA_HOME /diag copies the JRE into /diag if it already exists, or copies the JRE as /diag if it does not. Likewise, from inside the /opt/infinispan/diag directory, cp -r $JAVA_HOME . copies the whole jre directory, not the directories inside of it:

    sh-4.4$ cp -r  $JAVA_HOME .
    sh-4.4$ ls
    jre

  4. OCP toolbox insertion might not work because java and jcmd would run as different users.

Investigating heap/off-heap usage in DG 8 on OCP 4:

If no GC logs are configured, do the following:

Heap usage can be investigated via jcmd GC.heap_info

./jcmd 1 GC.heap_info
1:
 def new generation   total 72512K, used 35796K [0x0000000715a00000, 0x000000071a8a0000, 0x0000000763c00000)
  eden space 64512K,  53% used [0x0000000715a00000, 0x0000000717bbd6b0, 0x0000000719900000)
  from space 8000K,  15% used [0x0000000719900000, 0x0000000719a37960, 0x000000071a0d0000)
  to   space 8000K,   0% used [0x000000071a0d0000, 0x000000071a0d0000, 0x000000071a8a0000)
 tenured generation   total 161152K, used 0K [0x0000000763c00000, 0x000000076d960000, 0x0000000800000000)
   the space 161152K,   0% used [0x0000000763c00000, 0x0000000763c00000, 0x0000000763c00200, 0x000000076d960000)
 Metaspace       used 2213K, capacity 5319K, committed 5376K, reserved 1056768K
  class space    used 182K, capacity 503K, committed 512K, reserved 1048576K

Off heap usage:

Do ps -aux:

$ ps -aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000650+       1  0.2  0.7 6530408 127232 ?      Ssl  20:01   0:13 java -cp /opt/infinispan/lib/jgroups-4.2.18.Final-redhat-00001.jar org.jgroups.stack.GossipRouter -port 7900 -dump_msgs registration
1000650+     857  0.0  0.0  19336  3608 pts/0    Ss   21:05   0:00 /bin/sh
1000650+    1219  0.0  0.0  19220  3612 pts/1    Ss   21:07   0:00 /bin/sh
1000650+    6249  0.0  0.0  19332  3660 pts/2    Ss   21:20   0:00 /bin/sh
1000650+    8481  0.0  0.0  54780  3944 pts/2    R+   21:25   0:00 ps -aux

Number of GC operations (full collections):

./jcmd 1 VM.info | grep "Heap after GC invocations="
{Heap after GC invocations=2 (full 0):
{Heap after GC invocations=3 (full 0):

Getting the VM.info in one operation:

In the example below, jboss-modules.jar is used as the identifier for the process, given a valid $PODNAME:

$ oc rsh $PODNAME sh -c "jcmd jboss-modules.jar VM.info > /tmp/VM.info" && oc cp $PODNAME:/tmp/VM.info ./VM.info
$ ls
VM.info <-----
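The same pattern works for heap dumps: jcmd's GC.heap_dump command writes an .hprof file inside the pod, which oc cp then retrieves. A sketch, where the pod name and dump path are placeholders; the echoes print the commands for review rather than executing them:

```shell
# Hypothetical pod name and dump path - substitute your own.
PODNAME=infinispan-0
DUMP=/tmp/dg.hprof

# GC.heap_dump writes an .hprof file inside the pod; oc cp fetches it.
# PID 1 is assumed here, as in the GC.heap_info example above;
# adjust if the JVM runs under a different PID in your pod.
TRIGGER="oc rsh $PODNAME sh -c 'jcmd 1 GC.heap_dump $DUMP'"
FETCH="oc cp $PODNAME:$DUMP ./dg.hprof"

# Printed for review; run them once PODNAME points at a real pod.
echo "$TRIGGER"
echo "$FETCH"
```

Remember that heap dumps can be large; ensure /tmp inside the pod has enough free space before triggering one.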

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.