Guidelines for customizing JVM flags in Data Grid 8 images
Environment
- OpenJDK 11, 17
- Red Hat OpenShift Container Platform (OCP) 4.x
- Red Hat Data Grid (RHDG) 8.x
Issue
How to customize JVM flags on Data Grid 8 in OCP 4?
Resolution
Calculate the required resources (CPU, memory, storage) accordingly.
Do not deploy DG production pods without adequate values, and when tuning, make sure you understand the implications of the changes for both heap and off-heap.
Java is container aware, so it already sets MaxRAM to the memory size of the container (total size == pod size, for pods with only one container), as explained in Setting MaxRAM on DG 8 pod. Keep in mind that Java is container aware but not elastic: a percentage of the container RAM is what the image currently uses to size the heap and off-heap.
Note that:
- /proc/cgroups tells nothing about the number of CPUs: it lists mounted cgroup controllers, hierarchy IDs, and so on. The user cannot infer the number of CPUs from it.
- lscpu is not container aware.
- For cgroups investigation use jcmd $PID VM.info.
Core points
- First: the JVM is container aware.
- Second: for main DG purposes/behavior the JVM is inelastic, so the JVM uses resources.limits for CPU calculations (GC threads, blocking threads, non-blocking threads).
- Third: allocate reasonable CPU resources for the container. Do not allocate a minimal value for the CPU resources.
- Fourth: prefer runtime configuration over build configuration, i.e. avoid hard-coding values such as heap/off-heap sizes, JVM flags, or JVM properties into the container image.
- Fifth: setting CPU resources requests == limits gives the pod Guaranteed Quality of Service (QoS) and is usually recommended for production-critical deployments - reference.
- Final: select the adequate deployment method in OCP 4: either Operator or Helm charts.
JVM options
To set JVM arguments, set spec.container.extraJvmOpts. Example:
spec:
  container:
    cpu: '2'
    extraJvmOpts: '-Xlog:gc*=info:file=/tmp/gc.log:time,level,tags,uptimemillis:filecount=10,filesize=1m'
Xmx (and Xms)
The default JAVA_MAX_MEM_RATIO is set by the DG 8 image (not the Operator), which sets 50% for heap and 50% for non-heap.
This comes from the OpenJDK ubi7/ubi8 base images, which set 50% as the baseline (ubi8 will have 80% as the baseline).
To override this behavior use Xmx, which sets the max heap. However, do not set this value to the whole pod/container size: off-heap space must be left.
In fact, do not set Xmx to more than 80% of the container memory, so that the remainder is left for off-heap. Maxed-out off-heap usage will cause OOME and the cgroups OOM killer; likewise, setting off-heap too small will result in crashes.
Setting Xms can help performance in some cases: the JVM will ask the kernel for the complete Xmx value at start-up (this is not guaranteed), so some performance improvement can be gained depending on variable factors (GC collector, kernel, number of CPUs, other processes in the pod).
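As a back-of-the-envelope sketch of the 80% guidance above (the container limit value is a hypothetical example):

```shell
# Hypothetical pod with resources.limits.memory = 4Gi
CONTAINER_MEM_MB=4096
# Cap the heap at 80% of the container, leaving >= 20% for off-heap
HEAP_MB=$(( CONTAINER_MEM_MB * 80 / 100 ))
echo "-Xmx${HEAP_MB}m"   # prints -Xmx3276m
```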
Reducing memory utilization (returning memory to the kernel)
The user can set -XX:-ShrinkHeapInSteps, which reduces heap occupancy towards -XX:MaxHeapFreeRatio more aggressively, meaning in one go rather than in stages. Without this flag, the default implementation requires several garbage collection cycles for this reduction (default == in steps).
### Default behavior == return in steps
$ java -XX:+PrintFlagsFinal -version | grep ShrinkHeapInSteps
bool ShrinkHeapInSteps = true {product} {default}
openjdk version "11.0.12" 2021-07-20 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.12+7-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.12+7-LTS, mixed mode, sharing)
### Non-default behavior == doesn't return in steps == returns aggressively
$ java -XX:-ShrinkHeapInSteps -XX:+PrintFlagsFinal -version | grep ShrinkHeapInSteps
bool ShrinkHeapInSteps = false {product} {command line}
openjdk version "11.0.12" 2021-07-20 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.12+7-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.12+7-LTS, mixed mode, sharing)
Off-heap usage
Even though the cache entries might not be configured for off-heap, the JVM will still allocate and use off-heap memory.
Off-heap usage includes, but is not limited to, all kinds of non-heap allocation, for example:
- metaspace
- threads
- GC collector (the collector itself)
- module loading (JDK 11 and JDK 17)
- shared libraries
Some of those are nominal and stable; others are directly proportional to the load.
jcmd Native Memory Tracking can be used to track those, as well as VM.info. Therefore, it is not just off-heap caching that occupies the off-heap space, and this must be accounted for to avoid (native) OOME.
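To use jcmd's Native Memory Tracking inside a pod, NMT must be enabled at JVM start-up first; a minimal sketch via the Operator CR (the summary level shown here is an example, detail gives more granularity at extra cost):

```yaml
spec:
  container:
    # enable NMT so `jcmd $PID VM.native_memory summary` works in the pod
    extraJvmOpts: '-XX:NativeMemoryTracking=summary'
```

Note that NMT itself adds some CPU and memory overhead, so weigh that before enabling it in production.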
Total RAM usage:
RAM usage is directly proportional to the number of owners, as defined below:
| Aspect | Usage |
|---|---|
| Cluster-wide | number of owners * total cache size. |
| Per node | (number of owners / number of nodes) * total cache size |
So the more owners, the more memory is used; on the other hand, with fewer owners data will be lost when a node goes down. Depending on the use case that can be acceptable, if there is a shared store or another way to reload the cache.
If not, the data can be lost completely. Also, when MaxRAM is not set, the printed value will be 137gb - see Diagnostic Steps.
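Plugging hypothetical numbers into the formulas above (2 owners, 3 nodes, 9 GB of unique cache data; all values are assumed examples):

```shell
OWNERS=2      # number of owners per entry (assumed)
NODES=3       # cluster size (assumed)
CACHE_GB=9    # total unique cache data in GB (assumed)

# Cluster-wide: number of owners * total cache size
CLUSTER_WIDE_GB=$(( OWNERS * CACHE_GB ))
# Per node: (number of owners / number of nodes) * total cache size
PER_NODE_GB=$(( OWNERS * CACHE_GB / NODES ))
echo "cluster-wide=${CLUSTER_WIDE_GB}GB per-node=${PER_NODE_GB}GB"
```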
Other suggestions:
- Do not change the default 50% ratio between heap and off-heap unless a very detailed benchmarking process was done, establishing how much heap and off-heap is actually used.
- Complementary to the above, do not set Xmx to more than 80% of the container memory - leave space for the off-heap (even if the cache is not configured for off-heap).
- Do not set off-heap to 80%, even if all the caches are off-heap, because state transfers need a big chunk of heap (in short bursts). After the actual cache data, state transfers are the second biggest memory user in DG.
- Setting less than 1 CPU is not recommended. In fact, do not deploy anything in production with fewer than 2-3 CPUs. Lack of CPU will cause timeouts when executing CLI commands. If you have more resources, this is the time to use them.
- Using 2+ CPUs allows much better usage of multi-threaded collectors such as the ParallelGC collector. For instance, for 1 CPU SerialGC can be the adequate option, given that ParallelGC would add extra overhead to manage the extra threads with the same CPU power.
- Set Guaranteed QoS by setting limits == requests on the pods.
- In some cases generational collectors will perform better than non-generational ones.
- Avoid the deprecated Cache service type.
- Avoid deprecated flags: for example, using JDK 17 with UseParallelOldGC will return Unrecognized VM option 'UseParallelOldGC'. The same applies to --illegal-access=debug.
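In plain Kubernetes terms (this is a generic container resources block, not the Infinispan CR schema, and the values are examples), Guaranteed QoS requires requests to equal limits for every container in the pod:

```yaml
resources:
  requests:
    cpu: '2'
    memory: 4Gi
  limits:
    cpu: '2'      # requests == limits -> Guaranteed QoS class
    memory: 4Gi
```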
Root Cause
To customize the JVM flags on Data Grid 8 Operator use spec.container.extraJvmOpts - example:
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${CLUSTER_NAMESPACE}
  annotations:
    infinispan.org/monitoring: 'false'
  labels:
    type: middleware
    prometheus_domain: ${CLUSTER_NAME}
spec:
  configMapName: "${CLUSTER_NAME}-custom-config"
  container:
    cpu: '2'
    extraJvmOpts: '-Xlog:gc*=info:file=/tmp/gc.log:time,level,tags,uptimemillis:filecount=10,filesize=1m -Dcom.redhat.fips=false -Dio.netty.allocator.type=unpooled'
    routerExtraJvmOpts: '-XX:MaxDirectMemorySize=4M -Xlog:gc*=info:file=/tmp/gc.log:time,level,tags,uptimemillis:filecount=10,filesize=1m'
Default RAM usage
By default the OpenJDK image sets the heap/off-heap ratio to 50/50; this means half of the container memory will be heap and half will be off-heap:
ENV \
ISPN_HOME="/opt/infinispan" \
JAVA_GC_MAX_METASPACE_SIZE="96m" \
JAVA_GC_METASPACE_SIZE="32m" \
JAVA_MAX_MEM_RATIO="50" \ <----------------------------------------- 50/50
JBOSS_IMAGE_NAME="datagrid/datagrid-8" \
JBOSS_IMAGE_VERSION="1.4"
The above is line 73 of the Red Hat Data Grid for OpenShift Dockerfile.
JAVA_MAX_MEM_RATIO sets Xmx, which takes precedence over MaxRAMPercentage, for example.
Depending on the OpenJDK version, JAVA_MAX_MEM_RATIO is implemented either by setting Xmx directly or, in later versions, by setting MaxRAMPercentage.
This means that if the image uses the MaxRAMPercentage implementation, the user can override it with MaxRAMPercentage.
If not, the image sets Xmx and any MaxRAMPercentage setting is futile, because Xmx takes precedence over MaxRAMPercentage.
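A sketch of the arithmetic the image performs with the default ratio (the container size is a hypothetical example):

```shell
JAVA_MAX_MEM_RATIO=50    # image default: 50% heap, 50% off-heap
CONTAINER_MEM_MB=2048    # hypothetical container memory limit
XMX_MB=$(( CONTAINER_MEM_MB * JAVA_MAX_MEM_RATIO / 100 ))
echo "-Xmx${XMX_MB}m"    # prints -Xmx1024m
```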
Notes
- In terms of RSS in containers, using Xmx == Xms and -XX:+AlwaysPreTouch means the container will always run with an RSS >= N GB (where N is the heap size in GB) no matter what; in particular it will not give any memory back to the OS even if it does not use it, irrespective of the actual memory needs of the application.
- MaxRAM should not be used; instead, setting a container limit via the deployment configuration and setting -XX:MaxRAMPercentage=80.0 provides a more elastic configuration, given it couples the deployment configuration (runtime configuration) with the heap usage/size.
- Using MaxDirectMemorySize translates to the sun.nio.MaxDirectMemorySize property; if not specified, the Xmx setting is used as the upper bound. When using this flag, make sure to leave room for metaspace + thread stacks + code cache + direct buffers + other JVM/GC overhead. Otherwise the container will crash when it hits the upper boundary, meaning the pod may die with a wrong configuration.
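A rough budgeting sketch before choosing a MaxDirectMemorySize value; every figure below is an assumed example, not a measured one:

```shell
CONTAINER_MB=4096     # pod memory limit (assumed)
HEAP_MB=2048          # -Xmx (assumed)
METASPACE_MB=96       # metaspace cap (assumed)
THREAD_STACKS_MB=200  # threads * stack size (assumed)
CODE_CACHE_MB=240     # JIT code cache (assumed)
OVERHEAD_MB=256       # GC structures and other JVM overhead (assumed)

# What is left for direct buffers after all other off-heap consumers
DIRECT_BUDGET_MB=$(( CONTAINER_MB - HEAP_MB - METASPACE_MB - THREAD_STACKS_MB - CODE_CACHE_MB - OVERHEAD_MB ))
echo "room for -XX:MaxDirectMemorySize=${DIRECT_BUDGET_MB}m"
```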
Diagnostic Steps
| Issue | Solution |
|---|---|
| Customizing configuration in DG 8 in OCP | Using custom configuration in DG 8 via Operator |
| Differences of JDK 17 for a DG 8.4 deployment (DG 8.4 images bring JDK 17 by default) | Major differences between OpenJDK 17 and OpenJDK 11 |
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.