How to define memory and CPU resources via limits/requests on the DG 8 Operator
Environment
- Red Hat OpenShift Container Platform (OCP) 4.x
- Red Hat Data Grid (RHDG) 8.x
- Operator
Issue
- How to define memory and CPU resources via limits/requests on the DG 8 Operator
- Can a high value, or doubling the value, avoid the OOM killer?
Resolution
In the Infinispan CR, define the memory/CPU via spec.container.cpu and spec.container.memory, using the <limit>:<request> format, as below:
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${CLUSTER_NAMESPACE}
  labels:
    type: middleware
spec:
  container:
    cpu: "2:1"
    memory: "2Gi:1Gi"
$ oc get statefulset dg-cluster-cpu -o json | grep cpu
    "name": "dg-cluster-cpu",
    "name": "dg-cluster-cpu",
    "app.kubernetes.io/created-by": "dg-cluster-cpu",
    "clusterName": "dg-cluster-cpu",
    "infinispan_cr": "dg-cluster-cpu"
    "app.kubernetes.io/created-by": "dg-cluster-cpu",
    "clusterName": "dg-cluster-cpu",
    "infinispan_cr": "dg-cluster-cpu"
    "clusterName": "dg-cluster-cpu",
    "infinispan_cr": "dg-cluster-cpu"
    "cpu": "1",
    "cpu": "2",
Here "cpu": "1" is the request and "cpu": "2" is the limit.
This is reflected in the pod as below (pod YAML):
containers:
  - resources:
      limits:
        cpu: '2'
        memory: 2Gi
      requests:
        cpu: '1'
        memory: 1Gi
These settings directly determine the QoS level of the pod: different request/limit values set the QoS level to Burstable, while equal values set it to Guaranteed. See How to set QoS on DG 8 pods in OCP 4 on this matter.
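For example, a CR sketch for a Guaranteed pod sets equal limit and request values (illustrative sizes only; adjust to your workload, assuming the <limit>:<request> format):

```yaml
spec:
  container:
    cpu: "2:2"        # equal limit and request => Guaranteed QoS
    memory: "2Gi:2Gi"
```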
Java will only take the container limits as reference for thread calculation, as explained in Differences between Data Grid 8.3 vs Data Grid 8.2.
Root Cause
Q1. Why have different values for requests vs. limits?
A1. Depending on the kernel, CPU quotas can impose CPU throttling, so having different values can be useful. Different values also determine the QoS level of the pods.
Q2. Why does the DG 8 process take the spec.container.cpu limit into consideration instead of the request?
A2. The JVM. The JVM is not elastic: some parameters cannot be changed on the fly. It is started with certain values (the number of CPUs is a core one) that cannot change at runtime. The JVM therefore uses the spec.container.cpu limit as its upper boundary, because the CPU limit is fixed for the lifetime of the container. At startup, the JVM detects the memory settings and the number of CPUs; it cannot change the number of CPUs afterwards. The only way to pick up a new value is to restart the JVM process itself.
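As an illustrative sketch (not DG code): a container-aware JVM detects the CPU count once at startup via Runtime.getRuntime().availableProcessors(), which reflects the cgroup CPU limit, and typical code sizes its thread pools from that count:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CpuDetection {
    public static void main(String[] args) {
        // Detected once at JVM startup; on container-aware JVMs this
        // reflects the cgroup CPU limit. A changed limit is only
        // picked up after a JVM restart.
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("Detected CPUs: " + cpus);

        // Typical pattern: thread pools sized from the detected count.
        ExecutorService pool = Executors.newFixedThreadPool(cpus);
        pool.shutdown();
    }
}
```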
Q3. Can a high value, or doubling the value, of spec.container.memory avoid the OOM killer?
A3. Doubling the memory resources for the pod deployment (via the Operator CR, which is applied to the StatefulSet) might not prevent OOM kills if the OCP host node still has the same amount of memory.
That is because such a change can cause the kubelet to forcibly kill other processes in order to ensure it has enough memory to continue operating. One therefore has to investigate the actual OCP node and increase its resources to avoid OOM-killer scenarios.
In other words, the bottleneck likely lies in the OCP node itself; if the node is starved of resources, doubling the memory request/limit may not solve the problem.
The limit/request values define what the kubelet is allowed to allocate to a pod deployment: requests mean the kubelet guarantees that amount of resources is available before the container can be scheduled, and limits define the maximum the pod can use during bursts or peak processing. Note, however, that DG 8 takes the spec.container.cpu limit (not the request) for its calculations - see the Resolution above.
To investigate this, check the node's top output and verify what is defined in the KubeletConfig (via oc get kubeletconfig), since this is where the user specifies the minimum reserved CPU and memory the kubelet keeps for itself before it starts allocating resources to other pods.
Q4. How to set the QoS on the DG 8 pod?
A4. See solution How to set QoS on DG 8 pods in OCP 4.
Q5. Does Java take container limits or requests into consideration? What if they are left unset?
A5. Java will only take the container limits as reference for thread calculation, as explained in Differences between Data Grid 8.3 vs Data Grid 8.2. The recommendation is to set container limits and use the Guaranteed QoS class for critical deployments.
Diagnostic Steps
- top: investigate the OCP node's top output for the actual memory/CPU usage.
- oc get kubeletconfig: shows the minimum reserved CPU and memory values the kubelet allocates for itself.
- To see the memory/CPU requests/limits, inspect the StatefulSet: oc get statefulset dg-cluster-cpu -o json
- Inspect the pod details: the QoS class is shown in the status section.
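For reference, the QoS class appears directly in the pod's status section, for example (illustrative excerpt; the class is Burstable here because the limit and request values differ):

```yaml
status:
  qosClass: Burstable
```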
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.