JBossAS process (...) received TERM signal in OpenShift

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 3.x
    • 4.x
  • Red Hat JBoss Enterprise Application Platform (EAP)
    • 7.x

Issue

JBoss EAP process receives a TERM signal in OpenShift:

*** JBossAS process (1060) received TERM signal ***
INFO  [org.jboss.as.server] (Thread-2) WFLYSRV0236: Suspending server with no timeout.

Resolution

Each scenario has a different solution. For example, if memory usage is reaching the limit and the pod is being OOMKilled, increasing the memory limit in the deployment-uat.yaml file will prevent this. See the Root Cause section for details.
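As a sketch of that resolution, assuming the container is defined in a Deployment manifest such as deployment-uat.yaml, raising the memory limit looks like the following (the container name and the 2Gi value are illustrative; size the limit for your workload):

```yaml
spec:
  template:
    spec:
      containers:
        - name: eap-app          # illustrative container name
          resources:
            requests:
              memory: 1Gi
            limits:
              memory: 2Gi        # raised from 1Gi to give the JVM more headroom
```

Because the EAP images derive the heap size from the container memory, raising the limit also raises the heap.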

Root Cause

The pod is being killed, for example by the kubelet or by cgroups.
Table: possible root causes and next steps:

| Scenario | Root Cause | Recommendation |
| --- | --- | --- |
| The pod does not answer the probes, so the kubelet restarts it | One possible cause is lack of CPU/kernel time | Allocate more CPU to the container itself |
| The deployment/application takes more time to become ready | Lack of CPU, or the probe timeout is too short (lack of CPU does not cause OOMEs/OOMKills, but timeouts) | Increase the probe timeouts |
| The JVM does not have enough memory (the JVM calculates the heap size from the container) | Lack of memory, which can cause OOME exceptions | Increase the container size, thereby increasing the container's heap |
| The JVM native space takes more than what is allocated (ubi8 images reserve 50% of the container for off-heap) | A cgroups kill could be the cause of the TERM signal | Increase the container size, thereby increasing the container's off-heap space |
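For the probe-timeout scenario above, a hedged example of relaxing the readiness probe in the deployment (these are standard Kubernetes probe fields; the probe command and the values are illustrative, so adjust them to your image and startup time):

```yaml
readinessProbe:
  exec:
    command: ["/bin/bash", "-c", "/opt/eap/bin/readinessProbe.sh"]  # probe script shipped in EAP images
  initialDelaySeconds: 60   # give the server longer to start
  timeoutSeconds: 10        # allow slower responses before the probe fails
  periodSeconds: 15
  failureThreshold: 3       # tolerate a few slow checks before restarting
```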

For details on increasing heap memory in the EAP container, see How to change JVM memory options using Openshift images with Java?, and for probe details, see What does probe checks of JBoss EAP 7 on OpenShift?. Finally, note that an OOMKill by the kernel is different from a cgroups OOMKill, and both are different from an OutOfMemoryError (OOME) exception. TERM is a polite request to stop; it is the default signal sent by kill when none is specified.

SIGTERM vs SIGKILL

When the probe fails, a SIGTERM is sent, and after a grace period a SIGKILL follows. The default delay between the two signals is 30 seconds. EAP reports the SIGTERM as:

05:35:06,612 INFO  [org.jboss.as.server] (Management Triggered Shutdown) WFLYSRV0241: Shutting down in response to management operation 'shutdown'
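The delay between SIGTERM and SIGKILL is the pod's terminationGracePeriodSeconds. If EAP needs more time to suspend and shut down gracefully, it can be extended in the deployment (a sketch; 60 is an illustrative value):

```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # default is 30; SIGKILL is sent after this many seconds
```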

OOME vs OOMKill

  • OOME: An OutOfMemoryError handled inside the JVM. It does not necessarily cause the JVM to exit; however, most container images enable the ExitOnOutOfMemoryError JVM flag.
  • OOMKill: A kill that can come from the OCP node or from the kernel enforcing the container's cgroups limits. The former is caused by a systemic lack of resources on the OCP node, whereas the latter is caused by heap or native allocation exceeding the container's set boundaries.
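As a hedged sketch of tuning these boundaries, the Red Hat Java and EAP images honor environment variables such as JAVA_MAX_MEM_RATIO and JAVA_OPTS_APPEND (the values below are illustrative; check the documentation for your image version):

```yaml
env:
  - name: JAVA_MAX_MEM_RATIO       # percentage of container memory used for the max heap
    value: "50"
  - name: JAVA_OPTS_APPEND         # extra JVM flags; ExitOnOutOfMemoryError makes an OOME terminate the JVM
    value: "-XX:+ExitOnOutOfMemoryError"
```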

Diagnostic Steps

Follow these steps:

  1. Verify the amount of memory set in the pod by looking at deployment-uat.yaml:

    limits: 
        memory: 1Gi
    
  2. Verify the amount of memory in use with the top command, i.e. oc adm top pods:

    $ oc adm top pods
    NAME                          CPU(cores)          MEMORY(bytes)
    nps-kieserver-55-k9942        10m                 1001Mi
    
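Relating the two numbers above: 1Gi is 1024Mi, so 1001Mi of usage is about 97% of the limit, meaning the pod is close to being OOMKilled. A small shell sketch of that arithmetic (the 1Gi limit and 1001Mi usage come from the steps above):

```shell
# Compare observed usage against the pod's memory limit (values from the steps above)
limit_mi=$((1 * 1024))   # limit: 1Gi expressed in Mi
usage_mi=1001            # usage reported by 'oc adm top pods'
pct=$((usage_mi * 100 / limit_mi))
echo "usage is ${pct}% of the limit"   # prints: usage is 97% of the limit
```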

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.