What do the probes check in JBoss EAP 7 on OpenShift?

Solution Verified - Updated

Environment

  • Red Hat JBoss Enterprise Application Platform (EAP)
    • 7.x
  • Red Hat OpenShift Container Platform (OCP)
    • 4.x

Issue

  • Details of livenessProbe.sh and readinessProbe.sh in JBoss EAP on OpenShift.
  • How to check the livenessProbe and readinessProbe in the JBoss EAP container on OpenShift?
  • How to debug the probes in EAP 7?

Resolution

The basic EAP probe uses the DMR interface, which accepts requests at http://localhost:9990/management (for more details, see Using the HTTP Management API in JBoss EAP 6 and 7), to query server state. It defines tests for server status, boot errors, and deployment status.
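As an illustration (not the probe code itself), such a DMR query can be issued with Python's standard library; the MGMT_URL constant and the function names here are hypothetical, but the request shape matches the curl examples later in this article:

```python
import json
from urllib import request

# Management endpoint reachable from inside the EAP container.
MGMT_URL = "http://localhost:9990/management"

def dmr_payload(operation, **attrs):
    """Build the JSON body for a DMR management request
    (same shape as the curl examples below)."""
    body = {"operation": operation}
    body.update(attrs)
    return json.dumps(body)

def query_management(payload):
    """POST a DMR query to the management interface and return the parsed
    reply. Works only inside the container, where port 9990 is reachable."""
    req = request.Request(MGMT_URL, data=payload.encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)
```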

The livenessProbe.sh and readinessProbe.sh wrapper shell scripts invoke the Python probes via PROBE_IMPL=probe.eap.dmr.EapProbe, which resolves to /opt/eap/bin/probes/probe/eap/dmr.py.
The runner is located at $JBOSS_HOME/bin/probes/runner.py and the probe implementation at /opt/eap/bin/probes/probe/eap/dmr.py.

The related implementation of /opt/eap/bin/probes/probe/eap/dmr.py is as follows.

class EapProbe(DmrProbe):
    """
    Basic EAP probe which uses the DMR interface to query server state.  It
    defines tests for server status, boot errors and deployment status.
    """

    def __init__(self):
        super(EapProbe, self).__init__(
            [
                ServerStatusTest(),
                BootErrorsTest(),
                DeploymentTest()
            ]
        )
...
class ServerStatusTest(Test):
    """
    Checks the status of the server.
    """
...
class BootErrorsTest(Test):
    """
    Checks the server for boot errors.
    """
...
class DeploymentTest(Test):
    """
    Checks the state of the deployments.
    """
  • In the ServerStatusTest class, the script sends a query equivalent to the one below. In livenessProbe.sh, the check succeeds even when a state other than running is returned; in readinessProbe.sh, the check fails for any state other than running.
sh-4.2$ curl http://localhost:9990/management --header "Content-Type: application/json" -d '{"operation":"read-attribute", "name": "server-state"}'
{"outcome" : "success", "result" : "running"}
  • In the BootErrorsTest class, the script sends the query below. The health check fails if a JBoss EAP subsystem failed to start and the query returns an error.
sh-4.2$ curl http://localhost:9990/management --header "Content-Type: application/json" -d '{"operation": "read-boot-errors", "address": {"core-service": "management"}}'
{"outcome" : "success", "result" : []}

  • In the DeploymentTest class, the script sends the query below. The health check fails if a deployment failed to start and the query returns an error.

sh-4.2$ curl http://localhost:9990/management --header "Content-Type: application/json" -d '{"operation": "read-attribute", "address":{"deployment": "*"}, "name": "status"}'
{"outcome" : "success", "result" : [{"address" : [{ "deployment" : "test.war" }], "outcome" : "success", "result" : "OK"}]}
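The decision logic of the three tests can be summarized with a small sketch. This is an illustrative reimplementation, not the code shipped in dmr.py; the function names are hypothetical, and the inputs are the JSON responses shown in the curl examples above:

```python
def server_status_ok(resp, readiness=True):
    """ServerStatusTest: readiness requires "running"; liveness
    tolerates other states (illustrative logic)."""
    return resp.get("result") == "running" if readiness else True

def boot_errors_ok(resp):
    """BootErrorsTest: fails if read-boot-errors returns any errors."""
    return resp.get("outcome") == "success" and not resp.get("result")

def deployments_ok(resp):
    """DeploymentTest: every deployment must report status OK."""
    return all(d.get("result") == "OK" for d in resp.get("result", []))

# Responses as returned by the curl queries above:
assert server_status_ok({"outcome": "success", "result": "running"})
assert not server_status_ok({"outcome": "success", "result": "starting"})
assert server_status_ok({"outcome": "success", "result": "starting"},
                        readiness=False)  # liveness still passes
assert boot_errors_ok({"outcome": "success", "result": []})
assert deployments_ok({"outcome": "success", "result":
                       [{"address": [{"deployment": "test.war"}],
                         "outcome": "success", "result": "OK"}]})
```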

In jboss-eap-7/eap72-openshift or later, in addition to the above checks, the health check script sends the additional query below. This query always returns UP unless the application contains a custom health check based on the MicroProfile Health specification, so this check always succeeds by default.

sh-4.2$ curl http://localhost:9990/management --header "Content-Type: application/json" -d '{"operation": "check", "address": {"subsystem": "microprofile-health-smallrye"}}'
{"outcome" : "success", "result" : {"outcome" : "UP", "checks" : []}}
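The MicroProfile Health response can be evaluated the same way; a minimal sketch (the function name is hypothetical, not part of the shipped probes):

```python
def health_ok(resp):
    """Succeeds when the microprofile-health-smallrye check reports UP.
    Without custom MicroProfile Health checks this is always the case."""
    return (resp.get("outcome") == "success"
            and resp.get("result", {}).get("outcome") == "UP")

# Response from the curl query above:
assert health_ok({"outcome": "success",
                  "result": {"outcome": "UP", "checks": []}})
```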

Root Cause

EAP 7 can be deployed via the Operator or via a template (EAP 8 only supports Helm Charts, as Technology Preview).
The deployment creates a Deployment/DeploymentConfig YAML, whose pod specification includes the probes: scripts that call the Python functions.
A template deployment on EAP 7.1 uses the following template definition.

                                "livenessProbe": {
                                    "exec": {
                                        "command": [
                                            "/bin/bash",
                                            "-c",
                                            "/opt/eap/bin/livenessProbe.sh"
                                        ]
                                    },
                                    "initialDelaySeconds": 60
                                },
                                "readinessProbe": {
                                    "exec": {
                                        "command": [
                                            "/bin/bash",
                                            "-c",
                                            "/opt/eap/bin/readinessProbe.sh"
                                        ]
                                    }
                                },

The template sets the following on the Deployment/DeploymentConfig:

apiVersion: apps.openshift.io/v1
items:
- apiVersion: apps.openshift.io/v1
  kind: DeploymentConfig
  metadata:
    annotations:
      openshift.io/generated-by: OpenShiftNewApp
    creationTimestamp: "2022-12-21T06:56:21Z"
    generation: 1
    labels:
      app: eap74-basic-s2i
      app.kubernetes.io/component: eap74-basic-s2i
      app.kubernetes.io/instance: eap74-basic-s2i
      application: eap-app
      template: eap74-basic-s2i
      xpaas: 7.4.0
...
      spec:
        containers:
        ...
          livenessProbe:
            exec:
              command:
              - /bin/bash
              - -c
              - /opt/eap/bin/livenessProbe.sh
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: eap-app
          ports:
          - containerPort: 8778
            name: jolokia
            protocol: TCP
          - containerPort: 8080
            name: http
            protocol: TCP
          - containerPort: 8888
            name: ping
            protocol: TCP
          readinessProbe:
            exec:
              command:
              - /bin/bash
              - -c
              - /opt/eap/bin/readinessProbe.sh
            failureThreshold: 3
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              memory: 1Gi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 75

Example readiness and liveness output files

Output type   Liveness location/file   Readiness location/file
OUTPUT        /tmp/liveness-output     /tmp/readiness-output
ERROR         /tmp/liveness-error      /tmp/readiness-error
LOG           /tmp/liveness-log        /tmp/readiness-log

To see more logs on the livenessProbe.sh:

sh-4.4$ cat /opt/eap/bin/livenessProbe.sh     
...
OUTPUT=/tmp/liveness-output 
ERROR=/tmp/liveness-error
LOG=/tmp/liveness-log

Liveness Probe and Readiness Probe expected behavior vs actual behavior

Failure of the probes vs expected results:

  • Failure of the liveness probe forces the kubelet to send a SIGTERM and then a SIGKILL.
  • Failure of the readiness probe causes the pod to be isolated, so it stops receiving network traffic.

However, if one or both probes fail during EAP 7 start-up, the pod state never changes from Failure (the default) to Success, which prevents the pod from coming up. So although, by definition, a readiness probe failure does not cause a restart, a failure on its first execution leaves the pod in the default Failure state, the pod cannot come up, and a SIGTERM follows. This explains situations where the readiness probe appears to cause pod restarts without violating its responsibility: the readiness probe controls network traffic to the pod, while the liveness probe is responsible for whether the pod is alive.

Diagnostic Steps

  1. The probe scripts rely on Python commands that ping the Health Check subsystem (if installed).
    To see more logs from livenessProbe.sh and readinessProbe.sh, set the SCRIPT_DEBUG environment variable on the deployment/deploymentconfig; the logs will be located in /tmp/readiness-log and /tmp/liveness-log.
    Example:
    Example:
2023-10-27 23:28:19,533 DEBUG [__main__] Starting probe runner with args: Namespace(check=[<Status.READY: 8>], debug=True, logfile='/tmp/readiness-log', loglevel='DEBUG', probes=['probe.eap.dmr.EapProbe', 'probe.eap.dmr.HealthCheckProbe'])
...
2023-10-27 23:28:19,594 INFO [probe.eap.dmr.EapProbe] Executing the following tests: [probe.eap.dmr.ServerStatusTest, probe.eap.dmr.ServerRunningModeTest, probe.eap.dmr.BootErrorsTest, probe.eap.dmr.DeploymentTest]
...
2023-10-27 23:40:31,574 INFO [urllib3.connectionpool] Starting new HTTP connection (1): localhost
2023-10-27 23:40:31,575 DEBUG [urllib3.connectionpool] "POST /management HTTP/1.1" 200 None
2023-10-27 23:40:31,576 DEBUG [probe.eap.dmr.EapProbe] Probe response: <Response [200]>
2023-10-27 23:40:31,576 INFO [probe.eap.dmr.EapProbe] Executing test probe.eap.dmr.ServerStatusTest
2023-10-27 23:40:31,576 DEBUG [probe.eap.dmr.EapProbe] Test input = {
    "outcome": "success",
    "result": "running"
}
  2. Verify the events: oc get events -n <namespace>
  3. Also verify the kubelet probe failures on the OCP node: journalctl -u kubelet:
$ journalctl -u kubelet | grep probe
Apr 05 21:58:52 ip-10-0-1-7 kubenswrapper[2100]: I0405 21:58:52.854520    2100 prober.go:107] "Probe failed" probeType="Readiness" pod="eap-demo/eap-app-4-p9kmq" podUID=d5419cf7-2eaa-4984-a695-3c25be5c5d14 containerName="eap-app" probeResult=failure output=<
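The SCRIPT_DEBUG variable mentioned in step 1 can be set on the container in the Deployment/DeploymentConfig; a minimal excerpt (container name eap-app taken from the example above):

```yaml
spec:
  template:
    spec:
      containers:
      - name: eap-app
        env:
        - name: SCRIPT_DEBUG
          value: "true"   # probe scripts then log to /tmp/liveness-log and /tmp/readiness-log
```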
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.