How do I run top and ps commands when in a container without standard Linux tools
Environment
- Red Hat OpenShift Container Platform (OCP)
- 4.x
- 3.x
- Middleware containers based on ubi8
- "Distroless" containers
Issue
- Container images such as jboss-webserver-5/jws56-openjdk11-openshift-rhel8 and jboss-eap-7/eap74-openjdk11-openshift-rhel8, which are based on ubi8, do not include the procps-ng package, so the top and ps commands are not available:
[lab@master ~]$ oc rsh <mw-pod-name>
sh-4.4$ top -b -n 1 -H -p <java-pid>
sh: top: command not found
sh-4.4$ ps
sh: ps: command not found
- CPU metrics such as container_cpu_user_seconds_total and container_cpu_system_seconds_total are collected by OpenShift Monitoring as cAdvisor metrics, but they are reported per container; cAdvisor provides no per-thread CPU usage metrics.
- The scrape interval for CPU utilization in the OpenShift monitoring stack is 30 seconds, but I need to check CPU utilization at finer intervals to identify burst loads that last only a couple of seconds.
- How do I identify high CPU utilization by Java threads on OCP?
Resolution
The procedure differs depending on the OCP version.
OCP 4.10 or later
Ephemeral Containers were introduced as a beta feature in OCP 4.10, which is based on Kubernetes 1.23. You can attach a debug container without restarting the target pod, and with the --target option the debug container shares the process namespace of the target container, so the target's processes are visible from it. The following example attaches a toolbox container that includes the ps and top commands:
- Get the name of the target pod and the name of the container inside it:
$ oc get pod
NAME READY STATUS RESTARTS AGE
helloworld-1-42xpj 1/1 Running 0 15m
$ oc get pod helloworld-1-42xpj -o jsonpath="{.spec.containers[*].name}"
helloworld
- Attach an ephemeral container with kubectl debug -it <target-pod-name> --image=<debug-container-image> --target=<target-container-name>. The oc command does not support launching ephemeral containers, so this example uses kubectl instead of oc:
$ kubectl debug -it helloworld-1-42xpj --image=registry.redhat.io/rhel8/support-tools:latest --target=helloworld
Targeting container "helloworld". If you don't see processes from this container it may be
because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-bq4kx.
If you don't see a command prompt, try pressing enter.
[root@helloworld-1-42xpj /]#
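The two steps above can be combined into a small helper. The following is a minimal sketch and not part of the original procedure: the function name attach_toolbox is hypothetical, and it assumes that targeting the first container listed in the pod spec is acceptable when no container name is given.

```shell
#!/usr/bin/env bash
# attach_toolbox POD [CONTAINER]   (hypothetical helper name)
# Attaches the support-tools image to POD as an ephemeral container.
# If CONTAINER is omitted, the first container in the pod spec is targeted
# (an assumption of this sketch; name the container explicitly for
# multi-container pods).
attach_toolbox() {
    local pod=$1 container=${2:-}
    local image=registry.redhat.io/rhel8/support-tools:latest

    if [ -z "$container" ]; then
        container=$(oc get pod "$pod" -o jsonpath='{.spec.containers[0].name}') || return 1
    fi
    kubectl debug -it "$pod" --image="$image" --target="$container"
}

# Usage: attach_toolbox helloworld-1-42xpj helloworld
```

Keeping the image name in one variable makes it easy to substitute another debug image that provides ps and top.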
The ps and top commands in the toolbox container can now be run against the processes of the target container as follows:
[root@helloworld-1-42xpj /]# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000710+ 1 0.1 0.0 15436 6540 ? Ss 04:54 0:00 /bin/bash /opt/eap/bin/openshift-launch.sh
1000710+ 316 0.0 0.0 12184 3232 ? S 04:55 0:00 /bin/sh /opt/eap/bin/standalone.sh -c standalone-openshift.xml -bmanagement 0.0.0.0 -Djboss.server.data.dir=/opt/eap/standalon
1000710+ 644 17.2 0.3 2459952 397220 ? Sl 04:55 0:18 /usr/lib/jvm/java-11/bin/java -D[Standalone] -javaagent:/opt/eap/jboss-modules.jar -server -Xlog:gc*:file=/opt/eap/standalone/
root 901 0.0 0.0 19220 3652 pts/0 Ss 04:55 0:00 /usr/bin/bash
root 985 0.0 0.0 54780 3888 pts/0 R+ 04:56 0:00 ps aux
[root@helloworld-1-42xpj /]# top -b -n 1 -H -p 644 | head -n 15
top - 04:57:27 up 4 days, 23:51, 0 users, load average: 1.04, 1.37, 1.71
Threads: 86 total, 0 running, 86 sleeping, 0 stopped, 0 zombie
%Cpu(s): 15.4 us, 6.0 sy, 0.0 ni, 77.5 id, 0.0 wa, 0.0 hi, 1.1 si, 0.0 st
MiB Mem : 128280.3 total, 68429.5 free, 28357.4 used, 31493.4 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 98877.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
644 1000710+ 20 0 2459952 414656 30044 S 0.0 0.3 0:00.00 java
645 1000710+ 20 0 2459952 414656 30044 S 0.0 0.3 0:00.82 java
646 1000710+ 20 0 2459952 414656 30044 S 0.0 0.3 0:00.48 ParGC Thread#0
647 1000710+ 20 0 2459952 414656 30044 S 0.0 0.3 0:00.28 VM Thread
648 1000710+ 20 0 2459952 414656 30044 S 0.0 0.3 0:00.00 Reference Handl
649 1000710+ 20 0 2459952 414656 30044 S 0.0 0.3 0:00.00 Finalizer
650 1000710+ 20 0 2459952 414656 30044 S 0.0 0.3 0:00.00 Signal Dispatch
651 1000710+ 20 0 2459952 414656 30044 S 0.0 0.3 0:00.00 Service Thread
// Trigger a Java thread dump with kill -3 <pid> and record top output for the target java process every 5 seconds
[root@helloworld-1-42xpj /]# for i in {1..60}; do kill -3 644; top -b -n 1 -H -p 644 >> /tmp/top_644.out; sleep 5; done
[root@helloworld-1-42xpj /]#
- Open another console, then copy /tmp/top_644.out from the ephemeral container to the local system:
$ oc cp helloworld-1-42xpj:tmp/top_644.out top_644.out -c debugger-bq4kx
$ ls
top_644.out
- Run the exit command on the console where you ran kubectl debug. The ephemeral container terminates automatically when exit is called:
[root@helloworld-1-42xpj /]# exit
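Once top_644.out is on the local system, the busiest threads can be matched against the thread dumps. The sketch below is an illustration rather than part of the original procedure: top_threads is a hypothetical helper name, and it assumes the default top column layout shown above (%CPU in the ninth column, the thread name starting at the twelfth). It prints each thread ID in decimal and in hexadecimal, because JDK 11 thread dumps identify native threads by a hexadecimal nid= value. The dumps triggered by kill -3 are written to the JVM's standard output, so they normally end up in the pod log (oc logs <pod-name>).

```shell
#!/usr/bin/env bash
# top_threads FILE [N]   (hypothetical helper name)
# Prints the N highest per-thread CPU samples found in `top -b -H` output as:
#   <tid> <tid-in-hex> <%cpu> <thread name>
# Assumes the default top field order (an assumption of this sketch).
top_threads() {
    local file=$1 limit=${2:-10}
    awk '$1 ~ /^[0-9]+$/ && $9 ~ /^[0-9.]+$/ {
             name = $12
             for (i = 13; i <= NF; i++) name = name " " $i
             printf "%s 0x%x %s %s\n", $1, $1, $9, name
         }' "$file" | LC_ALL=C sort -k3,3nr | head -n "$limit"
}
```

For example, top_threads top_644.out 5 lists the five highest samples across all snapshots in the file; the hexadecimal value in the second column can then be searched for as nid= in the thread dump taken at the same time.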
OCP 4.9 or earlier
Run the ps and top commands from a host OS shell opened with oc debug node/<node_name>.
- Check the worker node where the pod is running and the ID of the target container:
$ oc get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
helloworld-1-bqbnc 1/1 Running 0 118m 10.131.0.62 worker-0.nagetsum410.lab.upshift.rdu2.redhat.com <none> <none>
$ oc get pod helloworld-1-bqbnc -o jsonpath="{.status.containerStatuses[*].containerID}"
cri-o://7ee3a487742764681ae3cdbed1ff5b494ad921734b088f468ac7c1bb896e6fbd
- Start a shell on the worker node where the target pod is running with oc debug node/<node-name>:
$ oc debug node/worker-0.nagetsum410.lab.upshift.rdu2.redhat.com
Starting pod/worker-0nagetsum410labupshiftrdu2redhatcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.94.25
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
- Get a list of the PIDs running in the container with runc ps <container-id>. Here the PID of the target java process on the host is 4040570:
sh-4.4# runc ps 7ee3a487742764681ae3cdbed1ff5b494ad921734b088f468ac7c1bb896e6fbd
UID PID PPID C STIME TTY TIME CMD
1000660+ 4039879 4039867 0 08:11 ? 00:00:00 /bin/bash /opt/eap/bin/openshift-launch.sh
1000660+ 4040220 4039879 0 08:11 ? 00:00:00 /bin/sh /opt/eap/bin/standalone.sh -c standalone-openshift.xml ...
1000660+ 4040570 4040220 0 08:11 ? 00:01:08 /usr/lib/jvm/java-11/bin/java -D[Standalone] -javaagent:/opt/eap/jboss-modules.jar -server ...
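Picking the java PID out of this listing can be scripted. A minimal sketch, not part of the original procedure: java_pids is a hypothetical helper name, and it assumes the eight-column layout shown above, where the command starts in the eighth field. That assumption breaks if STIME is printed as two words (for example a month and a day for long-running processes).

```shell
#!/usr/bin/env bash
# java_pids   (hypothetical helper name)
# Reads a `runc ps` / `docker top` style listing on stdin and prints the host
# PID (second column) of every process whose executable is named "java".
# Assumes the command starts in the eighth column, as in the listing above.
java_pids() {
    awk 'NR > 1 && $8 ~ /(^|\/)java$/ { print $2 }'
}

# Usage on the node: runc ps <container-id> | java_pids
```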
The ps command can be run against the target java process in the pod from the host OS shell:
// Check VSZ and RSS of java process launched in the Pod
sh-4.4# ps aux | grep 4040570
1000660+ 4040570 0.8 5.5 2452668 450956 ? Sl 08:11 1:11 /usr/lib/jvm/java-11/bin/java -D[Standalone] -javaagent:/opt/eap/jboss-modules.jar -server
...
The top command and the Java thread dump for the target java process can be run from a shell outside the pod:
// Trigger a Java thread dump with kill -3 <pid> and run the top command every 5 seconds
$ oc debug node/worker-0.nagetsum410.lab.upshift.rdu2.redhat.com -- bash -c 'for i in {1..60}; do kill -3 4040570; top -b -n 1 -H -p 4040570 -w 512; sleep 5; done' >> top_4040570.out
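The node and container ID lookups from the first step of this section can also be scripted. The following is a sketch under stated assumptions, not part of the original procedure: pod_location is a hypothetical helper name, and it only looks at the first container reported in the pod status.

```shell
#!/usr/bin/env bash
# pod_location POD   (hypothetical helper name)
# Prints "<node-name> <container-id>" for the first container of POD.
# Only the first entry of containerStatuses is used (an assumption of this
# sketch; adjust the index for multi-container pods).
pod_location() {
    local pod=$1 node raw_id
    node=$(oc get pod "$pod" -o jsonpath='{.spec.nodeName}') || return 1
    raw_id=$(oc get pod "$pod" -o jsonpath='{.status.containerStatuses[0].containerID}') || return 1
    # containerID is reported as "<runtime>://<id>" (for example cri-o://7ee3...);
    # runc ps expects the bare ID, so strip everything up to "://".
    printf '%s %s\n' "$node" "${raw_id#*://}"
}

# Usage: read -r node cid < <(pod_location helloworld-1-bqbnc)
```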
OCP 3.11 with Docker
This procedure requires ssh login privileges to the worker node where the debug target pod is running:
- Check the worker node where the pod is running:
[lab@master ~]$ oc get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
helloworld-2-fnvmt 1/1 Running 0 50m 10.131.1.14 worker1.openshift.rhev.local <none>
- Log in to the target worker node via ssh:
[root@master ~]# ssh root@worker1.openshift.rhev.local
- Identify the host PID of the java process in the pod with docker ps and docker top <container-id>:
[root@worker1 ~]# docker ps | grep helloworld-2-fnvmt | grep -v POD    # helloworld-2-fnvmt is the name of the pod to debug
763e5cd389b5 docker-registry.default.svc:5000/test-app/helloworld@sha256:fedb4eb02b3c25537baccad1abe6d211a2bd180701fd1c04156acc4ef972e328 "/bin/sh -c $JBOSS..." 56 minutes ago Up 56 minutes k8s_helloworld_helloworld-2-fnvmt_test-app_0e51cfb9-d4be-11ec-a51a-001a4a160779_0
[root@worker1 ~]# docker top 763e5cd389b5    # 763e5cd389b5 is the ID of the container to debug
UID PID PPID C STIME TTY TIME CMD
1000080+ 45771 45751 0 11:15 ? 00:00:00 /bin/bash /opt/eap/bin/openshift-launch.sh
1000080+ 46291 45771 0 11:15 ? 00:00:00 /bin/sh /opt/eap/bin/standalone.sh -c standalone-openshift.xml -bmanagement 0.0.0.0 -Djboss.server.data.dir=/opt/eap/standalone/data -Dwildfly.statistics-enabled=true -b 10.131.1.14 -bprivate 10.131.1.14 -Djboss.node.name=helloworld-2-fnvmt -Djboss.messaging.host=10.131.1.14 -Djboss.messaging.cluster.password=AENFAqCT
1000080+ 46694 46291 1 11:15 ? 00:01:02 /usr/lib/jvm/java-11/bin/java -D[Standalone] -server ...
The ps and top commands can be run against the java process in the pod from the host OS shell:
// Check VSZ and RSS of java process running in the Pod
[root@worker1 ~]# ps aux | grep 46694
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000080+ 46694 1.7 1.6 1634380 415864 ? Sl 11:15 1:06 /usr/lib/jvm/java-11/bin/java -D[Standalone] -server ...
// Check CPU utilization by threads
[root@worker1 ~]# top -b -n 1 -H -p 46694
...
// Trigger a Java thread dump with kill -3 <pid> and run the top command every 5 seconds
[root@worker1 ~]# for i in {1..60}; do kill -3 46694; top -b -n 1 -H -p 46694 >> /tmp/top_46694.out; sleep 5; done
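The sampling loop used in each of the procedures above can be wrapped in a function so that the PID, the number of samples, and the interval are parameters. This is a minimal sketch, not part of the original procedure: sample_threads is a hypothetical helper name, and the defaults simply mirror the values used in the examples (60 samples, 5 seconds apart).

```shell
#!/usr/bin/env bash
# sample_threads PID [COUNT] [INTERVAL] [OUTFILE]   (hypothetical helper name)
# Every INTERVAL seconds, asks the JVM for a thread dump and appends one
# per-thread top snapshot to OUTFILE. Stops early if the process is gone.
sample_threads() {
    local pid=$1 count=${2:-60} interval=${3:-5}
    local out=${4:-/tmp/top_${pid}.out}
    local i

    for ((i = 0; i < count; i++)); do
        # SIGQUIT (signal 3): the JVM prints a thread dump to its stdout
        # and keeps running.
        kill -3 "$pid" || return 1
        # -w 512 widens the output so long thread names are not cut off.
        top -b -n 1 -H -p "$pid" -w 512 >> "$out"
        sleep "$interval"
    done
}

# Usage: sample_threads 46694 60 5 /tmp/top_46694.out
```

A shorter interval, for example sample_threads 46694 30 1, gives the finer resolution needed to catch bursts that last only a couple of seconds, at the cost of more thread dumps in the pod log.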
Root Cause
- ubi8-based middleware containers do not include the procps-ng package, in order to reduce the attack surface.
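For a quick look without attaching any extra tooling, process names can still be read from /proc with shell builtins alone. This is a minimal sketch, assuming a bash shell in the container and a mounted /proc; list_procs is a hypothetical helper name, and its output is far less detailed than that of ps.

```shell
#!/usr/bin/env bash
# list_procs   (hypothetical helper name)
# Prints "<pid> <command name>" for every process visible under /proc,
# using only shell builtins (no ps, no cat).
list_procs() {
    local dir name
    for dir in /proc/[0-9]*; do
        # A process may exit between the glob and the read; skip it then.
        { read -r name < "$dir/comm"; } 2>/dev/null || continue
        printf '%s %s\n' "${dir#/proc/}" "$name"
    done
}
```

This only identifies the PID of the java process; per-thread CPU figures still require one of the procedures in the Resolution section.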
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.