Investigating heap and off-heap usage in DG 8 on OCP 4


Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 4.x
  • Red Hat Data Grid (RHDG)
    • 8.x
    • DG Operator 8.3.7.csv (and before)

Issue

  • How to investigate heap and off-heap memory usage in DG 8 on OCP 4 without GC logs?
  • The Gossip Router pod has no GC logging configured; how can its heap/off-heap utilization be investigated?

Resolution

For general guidance on setting up jcmd on a pod, see the solution Alternatives for creating heap dump in a DG 8 even without the JDK.
If GC logging is not configured (the Gossip Router pod has none), proceed as follows:

Heap usage can be investigated via jcmd GC.heap_info:

./jcmd 1 GC.heap_info
1:
 def new generation   total 72512K, used 35796K [0x0000000715a00000, 0x000000071a8a0000, 0x0000000763c00000)
  eden space 64512K,  53% used [0x0000000715a00000, 0x0000000717bbd6b0, 0x0000000719900000)
  from space 8000K,  15% used [0x0000000719900000, 0x0000000719a37960, 0x000000071a0d0000)
  to   space 8000K,   0% used [0x000000071a0d0000, 0x000000071a0d0000, 0x000000071a8a0000)
 tenured generation   total 161152K, used 0K [0x0000000763c00000, 0x000000076d960000, 0x0000000800000000)
   the space 161152K,   0% used [0x0000000763c00000, 0x0000000763c00000, 0x0000000763c00200, 0x000000076d960000)
 Metaspace       used 2213K, capacity 5319K, committed 5376K, reserved 1056768K
  class space    used 182K, capacity 503K, committed 512K, reserved 1048576K
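As a convenience, the top-level "used" figures (young generation, tenured generation, and Metaspace) can be totaled with a small shell helper. This is a sketch, not part of the product: the function name sum_used is made up, and the parsing assumes the serial-GC output layout shown above.

```shell
# sum_used: read `jcmd <pid> GC.heap_info` output on stdin and total the
# top-level "used" figures (young gen + tenured gen + Metaspace), in KB.
# Eden/from/to lines are skipped, as they are subsets of the young generation.
sum_used() {
  awk '
    /generation|^ *Metaspace/ {
      for (i = 1; i < NF; i++)
        if ($i == "used") { v = $(i+1); gsub(/[^0-9]/, "", v); sum += v }
    }
    END { printf "approx used: %d K\n", sum }
  '
}
# Usage inside the pod (PID 1 is the Java process here):
#   jcmd 1 GC.heap_info | sum_used
```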

Off-heap usage:

Run ps -aux; the Gossip Router should be the first (and only) Java process:

$ ps -aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000650+       1  0.2  0.7 6530408 127232 ?      Ssl  20:01   0:13 java -cp /opt/infinispan/lib/jgroups-4.2.18.Final-redhat-00001.jar org.jgroups.stack.GossipRouter -port 7900 -dump_msgs registration
1000650+     857  0.0  0.0  19336  3608 pts/0    Ss   21:05   0:00 /bin/sh
1000650+    1219  0.0  0.0  19220  3612 pts/1    Ss   21:07   0:00 /bin/sh
1000650+    6249  0.0  0.0  19332  3660 pts/2    Ss   21:20   0:00 /bin/sh
1000650+    8481  0.0  0.0  54780  3944 pts/2    R+   21:25   0:00 ps -aux
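The RSS column (in KB) for a given PID can be pulled out of that listing with a short helper. This is a sketch: the function name rss_of is made up, and the field positions assume the standard ps aux column order shown above.

```shell
# rss_of: extract the RSS column (KB) for a given PID from `ps aux` output
# read on stdin. Field 2 is the PID, field 6 is the RSS in KB.
rss_of() {
  awk -v pid="$1" '$2 == pid { print $6 }'
}
# Usage inside the pod:
#   ps aux | rss_of 1
```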

Alternatively, jcmd PID VM.info can be used to show the current and peak RSS and virtual memory allocation:

$ jcmd PID VM.info | grep Resident
Resident Set Size: 540572K (peak: 540572K) (anon: 524688K, file: 15884K, shmem: 0K)
$ jcmd PID VM.info | grep Virtual
...
Virtual Size: 2691180K (peak: 2726072K)

Number of GC operations (full heap collections):

./jcmd 1 VM.info | grep "Heap after GC invocations="
{Heap after GC invocations=2 (full 0):
{Heap after GC invocations=3 (full 0):
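To read off just the latest invocation count rather than the whole list, the matching lines can be filtered down to the last one. This is a sketch: the function name gc_counts is made up, and the pattern assumes the GC-history line format shown above.

```shell
# gc_counts: from VM.info output on stdin, print the most recent GC-history
# entry, i.e. the highest invocation count and its full-GC count.
gc_counts() {
  grep -o 'Heap after GC invocations=[0-9]* (full [0-9]*)' | tail -n 1
}
# Usage inside the pod:
#   jcmd 1 VM.info | gc_counts
```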

For metaspace one can do jcmd PID VM.metaspace or jcmd PID VM.info:

In the VM.info output, see the Process section:

Metaspace:
Usage:
  Non-class:      4.69 MB capacity,     1.97 MB ( 42%) used,     2.71 MB ( 58%) free+waste,     4.19 KB ( <1%) overhead. 
      Class:    492.00 KB capacity,   176.61 KB ( 36%) used,   312.77 KB ( 64%) free+waste,     2.62 KB ( <1%) overhead. 
       Both:      5.17 MB capacity,     2.14 MB ( 42%) used,     3.02 MB ( 58%) free+waste,     6.81 KB ( <1%) overhead. 
Virtual space:
  Non-class space:        8.00 MB reserved,       4.75 MB ( 59%) committed 
      Class space:        1.00 GB reserved,     512.00 KB ( <1%) committed 
             Both:        1.01 GB reserved,       5.25 MB ( <1%) committed 
Chunk freelists:
   Non-Class:  34.00 KB
       Class:  20.00 KB
        Both:  54.00 KB
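If only the combined committed metaspace figure is of interest, it can be extracted from the Virtual space section with a one-liner. This is a sketch: the function name committed_both is made up, and the field positions assume the layout shown above.

```shell
# committed_both: from the Metaspace section on stdin, print the combined
# committed size (value + unit) from the "Both:" line of "Virtual space".
committed_both() {
  awk '/Both:/ && /committed/ { print $5, $6 }'
}
# Usage inside the pod:
#   jcmd 1 VM.metaspace | committed_both
```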

Root Cause

The Gossip Router pod has no GC logging configured (DG Operator 8.3.7 and earlier), nor can JVM flags be set for it.

About VM.info

The VM.info output is divided into sections - System, Process, Bios, GC history - and can be used for heap and off-heap investigations by taking the delta between samples:

Process Memory:
Virtual Size: 6530408K (peak: 14865328K) <------------
Resident Set Size: 129608K (peak: 129660K) (anon: 102604K, file: 27004K, shmem: 0K) <-----
Swapped out: 0K
C-Heap outstanding allocations: 2093K
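Since taking the delta requires successive samples, a minimal sampling sketch (assuming jcmd is available on the pod, as above; the function name mem_lines is made up) is:

```shell
# mem_lines: keep only the RSS and virtual-size lines from VM.info output.
mem_lines() {
  grep -E 'Resident Set Size|Virtual Size'
}
# Sample every 60 s and compare successive samples to see the delta:
#   while true; do date; jcmd 1 VM.info | mem_lines; sleep 60; done
```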

Diagnostic Steps

  1. sh terminal connections might not close on their own; they can be killed via kill -9 PID.
  2. The memory shown in Prometheus is the overall memory utilization of the pod - see the solution Interpreting Data Access statistics in DG 8 for details.
  3. For crash investigations see: Troubleshoot options for Data Grid pod crash. For OCP node crashes see DG 8 operation in case of OCP nodes crashing.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.