JBossAS process (...) received TERM signal in OpenShift
Environment
- Red Hat OpenShift Container Platform (OCP)
- 3.x
- 4.x
- Red Hat JBoss Enterprise Application Platform (EAP)
- 7.x
Issue
JBoss EAP process receives a TERM signal in OpenShift:
*** JBossAS process (1060) received TERM signal ***
INFO [org.jboss.as.server] (Thread-2) WFLYSRV0236: Suspending server with no timeout.
Resolution
Each scenario has a different solution. For example, if memory usage reaches the container limit and the process is killed for being out of memory (OOMKilled), increasing the memory limit in the `deployment-uat.yaml` file prevents this from happening. See the Root Cause section for details.
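For the memory example above, a minimal sketch of the relevant pod template fragment follows. The container name and the sizes are assumptions for illustration only, and the surrounding structure assumes a standard Deployment or DeploymentConfig pod template; pick the real value from the usage observed in Diagnostic Steps.

```yaml
# Fragment of the pod template (for example in deployment-uat.yaml).
# Container name and sizes are hypothetical, not taken from a real case.
spec:
  template:
    spec:
      containers:
        - name: eap-app            # hypothetical container name
          resources:
            requests:
              memory: 1Gi          # amount the scheduler reserves for the container
            limits:
              memory: 2Gi          # cgroup limit; raised here from the 1Gi shown in Diagnostic Steps
```

Because the JVM sizes its heap from the container size, raising the limit also raises the heap and off-heap space available, without touching the JVM options directly.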
Root Cause
The log message indicates that the JBoss EAP process in the pod is being terminated, for example by the kubelet or by cgroups.
Table: Possible root causes and next steps:
| Scenario | Root Cause | Recommendation |
|---|---|---|
| The pod does not answer the probes, so the kubelet restarts it | One possible cause is a lack of CPU/kernel time | Allocate more CPU to the container |
| The deployment/application takes longer to become ready | Lack of CPU, or the probe timeout is too short. Lack of CPU does not cause OOME/OOMEKills, but it does cause timeouts | Increase the probe timeouts |
| The JVM does not have enough memory (the JVM calculates its heap size from the container size) | Lack of memory, which can cause OOME exceptions | Increase the size of the container, which increases the heap available to the JVM |
| The JVM native (off-heap) space takes more than what is allocated (ubi8 images use a 50% ratio of the container for off-heap) | A cgroups kill could be the cause of the TERM signal | Increase the size of the container, which increases the off-heap space available |
For details on increasing heap memory in an EAP container, see How to change JVM memory options using Openshift images with Java? For details on probes, see What does probe checks of JBoss EAP 7 on OpenShift ?. Finally, note that an OOMEKill by the kernel is different from a cgroups OOMKill, and both are different from an OOM exception. TERM is a polite request to stop; it is the default signal sent by `kill` when no signal is specified.
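The CPU and probe recommendations in the table can be sketched as the following pod template fragment. This is an illustration under assumptions rather than a recommended configuration: the container name, CPU sizes, and timings are hypothetical, and the probe script paths are assumed from the usual EAP image layout and should be checked against the image in use.

```yaml
# Pod template fragment; all names and values are hypothetical and should be
# tuned to the startup time and load observed for the application.
spec:
  template:
    spec:
      containers:
        - name: eap-app                  # hypothetical container name
          resources:
            requests:
              cpu: 500m
            limits:
              cpu: "1"                   # more CPU so probes are answered in time
          livenessProbe:
            exec:
              command: ["/bin/bash", "-c", "/opt/eap/bin/livenessProbe.sh"]   # assumed script path
            initialDelaySeconds: 60      # wait before the first check
            timeoutSeconds: 5            # time allowed per check (Kubernetes default: 1)
            periodSeconds: 10            # interval between checks (Kubernetes default: 10)
            failureThreshold: 3          # consecutive failures before a restart (Kubernetes default: 3)
          readinessProbe:
            exec:
              command: ["/bin/bash", "-c", "/opt/eap/bin/readinessProbe.sh"]  # assumed script path
            initialDelaySeconds: 30
            timeoutSeconds: 5
```

Raising `timeoutSeconds` or `initialDelaySeconds` addresses a slow but healthy server, whereas raising the CPU allocation addresses the underlying starvation; which one applies depends on the scenario in the table.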
SIGTERM and SIGKILL
When the liveness probe fails, a SIGTERM is sent to the process and, after a grace period, a SIGKILL is sent. The default delay between the two signals is 30 seconds. EAP reports the SIGTERM as:
05:35:06,612 INFO [org.jboss.as.server] (Management Triggered Shutdown) WFLYSRV0241: Shutting down in response to management operation 'shutdown'
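The 30-second delay corresponds to the pod's termination grace period. As a hedged sketch, it can be extended when the server needs more time to shut down cleanly; the value of 60 is only an example.

```yaml
# Pod template fragment; 60 is a hypothetical value.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # seconds between SIGTERM and SIGKILL (default: 30)
```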
OOME Vs OOMEKill
- OOME: An exception that is handled inside the JVM. Because it is an exception, it should not necessarily cause the JVM to exit; however, most containers have the `ExitOnOutOfMemoryError` flag enabled.
- OOMEKill: A signal that can come from the OCP node or from the JVM itself via cgroups. The former is caused by a systemic lack of resources on the OCP node, whereas the latter is caused by heap or native allocation above its set boundaries.
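For reference, `ExitOnOutOfMemoryError` is a HotSpot option written as `-XX:+ExitOnOutOfMemoryError`. The fragment below is a sketch of how it could be passed to the JVM, assuming the image honours the `JAVA_OPTS_APPEND` environment variable; the container name is hypothetical, and many images already set the flag, in which case nothing needs to be added.

```yaml
# Pod template fragment; assumes the image reads JAVA_OPTS_APPEND.
spec:
  template:
    spec:
      containers:
        - name: eap-app                          # hypothetical container name
          env:
            - name: JAVA_OPTS_APPEND             # assumed to be appended to the JVM options by the image
              value: "-XX:+ExitOnOutOfMemoryError"   # exit the JVM on the first OutOfMemoryError
```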
Diagnostic Steps
Follow these steps:
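- Verify the amount of memory set for the pod by looking at the limits in `deployment-uat.yaml`:

      limits:
        memory: 1Gi

- Verify the amount of memory in use with the `top` subcommand, as in `oc adm top pods`:

      # oc adm top pods
      NAME                     CPU(cores)   MEMORY(bytes)
      nps-kieserver-55-k9942   10m          1001Mi

  In this example the pod uses 1001Mi of its 1Gi (1024Mi) limit, so it is very close to the limit.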
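To tell a cgroups OOMKill apart from a probe-triggered restart, the container status of the pod can also be inspected, for example with `oc get pod <pod-name> -o yaml`. The fragment below only illustrates the shape of the fields to look for; the container name and restart count are hypothetical and were not captured from a real pod.

```yaml
# Illustrative shape of the pod status after a cgroups OOMKill (values are hypothetical).
status:
  containerStatuses:
    - name: eap-app              # hypothetical container name
      restartCount: 3
      lastState:
        terminated:
          reason: OOMKilled      # set when the container exceeded its memory limit
          exitCode: 137          # 128 + 9, that is, the process ended with SIGKILL
```

When the restart is caused by a failed probe instead, the pod events (shown by `oc describe pod <pod-name>`) typically report the probe failure rather than an `OOMKilled` reason.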