Cgroups v2 in OpenJDK container in OpenShift 4
Environment
- Red Hat Build of OpenJDK
- 8
- 11
- 17
- 21
- Red Hat OpenShift Container Platform (RHOCP)
- 4.13+
Issue
- Cgroups v2 in OpenJDK container in OpenShift 4
- Cgroups v2 in Red Hat containers
- OOMKill in OCP 4.14, although it worked on OCP 4.13 and earlier
Resolution
Solution:
- OpenJDK 8u372 and later detects cgroups v2.
- OpenJDK 11.0.16 and later detects cgroups v2.
- Any OpenJDK 17 or later release (such as OpenJDK 21) detects cgroups v2.
Failure to detect cgroups v2 results in the container settings/boundaries not being detected: the host's resources are used as boundaries instead, which eventually leads to an OOMKill when the JVM exceeds the container's cgroup memory limit.
The OpenJDK container currently detects cgroups via upstream OpenJDK native (C/C++) code, not via scripts. That differs from the JBoss EAP 7 images, which rely on scripts for detection. This means other products, such as JBoss EAP and AMQ, do not necessarily rely on OpenJDK for cgroups v2 detection and may rely on their own scripts for detection and customization. For JBoss EAP 7 details on cgroups versions, see EAP 7 images cgroups version. For detecting cgroups v2, see Verifying Cgroup v2 Support in OpenJDK Images.
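The detection status can be checked from inside the container by classifying the output of `java -XshowSettings:system -version`. A minimal sketch; `detect_metrics` is a helper name invented for this example, not a real tool:

```shell
# Classify OpenJDK's "Operating System Metrics" output.
# detect_metrics is an illustrative helper, not part of any product.
detect_metrics() {
  case "$1" in
    *"No metrics available"*) echo "not detected" ;;        # old JDK on cgroups v2
    *cgroupv2*)               echo "cgroups v2 detected" ;;
    *cgroupv1*)               echo "cgroups v1 detected" ;;
    *)                        echo "unknown" ;;
  esac
}

# Usage inside the container:
#   detect_metrics "$(java -XshowSettings:system -version 2>&1)"
```

The "No metrics available" branch is checked first because that string appears regardless of which cgroup version the host actually runs.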
Root Cause
The cgroup version is a kernel feature, i.e. it is inherited from the node (host system) the container (or the application) is running on. A containerized application inherits the cgroup version; the container cannot impose or change it, all it does is detect it. If OpenJDK is older than 8u372 and the host system uses cgroups v2, it fails to detect any settings, i.e. the cgroups provider is neither cgroupv2 nor cgroupv1, and the following output is presented:
Operating System Metrics:
No metrics available for this platform
So if you are seeing cgroupv1 for some Java application, it means either:
a- the node the container is running on uses cgroups v1, or
b- the OCP node is set to cgroups v2, but cgroups v2 is not being detected.
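A quick way to tell which case applies is to check the node's cgroup filesystem type directly, as in the diagnostic steps below. A minimal sketch; `classify_cgroup` is a helper name invented here:

```shell
# Map the filesystem type of /sys/fs/cgroup to a cgroup version.
# classify_cgroup is an illustrative helper, not a real tool.
classify_cgroup() {
  case "$1" in
    cgroup2fs) echo "cgroups v2" ;;
    tmpfs)     echo "cgroups v1" ;;
    *)         echo "unknown" ;;
  esac
}

# On the node (or inside the container):
classify_cgroup "$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null || echo unknown)"
```

If this reports cgroups v1 you are in case a; if it reports cgroups v2 while the JVM still shows cgroupv1 (or no metrics), you are in case b.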
Cgroups v1 vs v2
Cgroups v2 changed the hierarchy from cgroups v1. For instance, cpu_cpuset_cpus is removed; in its place is the cpuset family of values. So instead of having a cpu controller group as in cgroups v1, there is now a separate controller group called cpuset that accomplishes the same task. For reference, see the cgroup v2 documentation at www.kernel.org.
This means cgroups v2 has more modular controllers: it now has cgroup, cpu, cpuset, io, irq, memory, misc, and pids controllers, among others. Cgroups v2 has a modular hierarchy like so:
- cgroups v2
  - modules
  - module values
So cpu_cpuset_cpus, which was one file in cgroups v1, is now the cpuset module with the various options inside that module.
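Reading the CPU set therefore depends on the cgroup version. A minimal sketch that tries the v2 unified path first and falls back to the v1 layout; `effective_cpus` is a helper name invented here, and the v1 path assumes the cpuset controller is mounted under /sys/fs/cgroup/cpuset:

```shell
# Print the set of CPUs available to this cgroup, v2 or v1.
# effective_cpus is an illustrative helper, not a real tool.
effective_cpus() {
  if [ -r /sys/fs/cgroup/cpuset.cpus.effective ]; then
    cat /sys/fs/cgroup/cpuset.cpus.effective        # cgroups v2 unified hierarchy
  elif [ -r /sys/fs/cgroup/cpuset/cpuset.cpus ]; then
    cat /sys/fs/cgroup/cpuset/cpuset.cpus           # cgroups v1 cpuset controller
  else
    echo "unknown"
  fi
}

effective_cpus
```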
There are also changes in how the data itself is reported; see the solution Java's memory consumption inside an OpenShift 4 container for more details.
Diagnostic Steps
- For detecting cgroups v2, see Verifying Cgroup v2 Support in OpenJDK Images.
- Run Java with -Xlog:os+container=trace instead of -XshowSettings:system for more details.
- For the OCP node, the following output will help:
| Command | Cgroups V1 Output | Cgroups V2 Output |
|---|---|---|
| grep cgroup /proc/filesystems | nodev cgroup | nodev cgroup, nodev cgroup2 |
| stat -fc %T /sys/fs/cgroup/ | tmpfs | cgroup2fs |
Example:
### CGv2
$ stat -fc %T /sys/fs/cgroup/
cgroup2fs
### CGv1
$ stat -fc %T /sys/fs/cgroup/
tmpfs
### CGv2
$ grep cgroup /proc/filesystems
nodev cgroup
nodev cgroup2
### CGv1
$ grep cgroup /proc/filesystems
nodev cgroup
- The jcmd VM.info output will include the following:
container_type: cgroupv2 <-------------- cgv2
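The container_type line can be pulled from a running JVM's VM.info output. A minimal sketch; `parse_container_type` is a helper name invented here, and the pid discovery in the usage comment assumes pgrep and jcmd are available in the image:

```shell
# Extract the container_type field from `jcmd <pid> VM.info` output.
# parse_container_type is an illustrative helper, not a real tool.
parse_container_type() {
  printf '%s\n' "$1" | sed -n 's/^container_type: *//p'
}

# Usage inside the container:
#   pid=$(pgrep -f java | head -n 1)
#   parse_container_type "$(jcmd "$pid" VM.info)"
```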
Cgroups details:
| Statistics | Meaning |
|---|---|
| /sys/fs/cgroup/memory.current | Current memory usage |
| /sys/fs/cgroup/memory.max | Memory limit (maximum allowed) |
| /sys/fs/cgroup/cgroup.stat | nr_descendants and nr_dying_descendants counts |
| /sys/fs/cgroup/memory.stat | Memory statistics, including allocation breakdown |
| /sys/fs/cgroup/cpuset.cpus.effective | Set of CPUs the task can run on |
| /sys/fs/cgroup/cpu.max | Maximum CPU bandwidth (quota and period) |
Example
$ cat /sys/fs/cgroup/memory.stat
anon 37994496 <-------------- anonymous memory in bytes
…
sh-4.4$ cat /sys/fs/cgroup/cgroup.stat
nr_descendants 0
nr_dying_descendants 0
...
sh-4.4$ cat /sys/fs/cgroup/memory.current
40001536
sh-4.4$ cat /sys/fs/cgroup/memory.max <--- provides the max container size limit in bytes
2147483648
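From the two memory files above, the remaining headroom before an OOMKill can be estimated. A minimal sketch; `headroom_bytes` is a helper name invented here, and note that memory.max contains the literal string max when no limit is set:

```shell
# Compute memory.max - memory.current, handling the unlimited case.
# headroom_bytes is an illustrative helper, not a real tool.
headroom_bytes() {
  # $1 = contents of memory.max, $2 = contents of memory.current
  if [ "$1" = "max" ]; then
    echo "unlimited"
  else
    echo $(( $1 - $2 ))
  fi
}

# With the values from the example above:
headroom_bytes 2147483648 40001536   # → 2107482112

# Live usage inside the container:
#   headroom_bytes "$(cat /sys/fs/cgroup/memory.max)" "$(cat /sys/fs/cgroup/memory.current)"
```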
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.