What do the probes check for JBoss EAP 7 on OpenShift?
Environment
- Red Hat JBoss Enterprise Application Platform (EAP)
- 7.x
- Red Hat OpenShift Container Platform (OCP)
- 4.x
Issue
- Details of livenessProbe.sh and readinessProbe.sh in JBoss EAP on OpenShift.
- How to check probes using livenessProbe and readinessProbe in the JBoss EAP container on OpenShift?
- How to debug the probes in EAP 7?
Resolution
The basic EAP probe uses the DMR interface, which accepts requests at http://localhost:9990/management (for more details, see Using the HTTP Management API in JBoss EAP 6 and 7), to query server state. It defines tests for server status, boot errors, and deployment status.
The livenessProbe.sh and readinessProbe.sh wrapper scripts invoke the Python probe runner with PROBE_IMPL=probe.eap.dmr.EapProbe, which refers to /opt/eap/bin/probes/probe/eap/dmr.py.
The relevant files are $JBOSS_HOME/bin/probes/runner.py and /opt/eap/bin/probes/probe/eap/dmr.py.
The related implementation of /opt/eap/bin/probes/probe/eap/dmr.py is as follows.
class EapProbe(DmrProbe):
"""
Basic EAP probe which uses the DMR interface to query server state. It
defines tests for server status, boot errors and deployment status.
"""
def __init__(self):
super(EapProbe, self).__init__(
[
ServerStatusTest(),
BootErrorsTest(),
DeploymentTest()
]
)
...
class ServerStatusTest(Test):
"""
Checks the status of the server.
"""
...
class BootErrorsTest(Test):
"""
Checks the server for boot errors.
"""
...
class DeploymentTest(Test):
"""
Checks the state of the deployments.
"""
- In the ServerStatusTest class, the script sends a query equivalent to the one below. In livenessProbe.sh, the check succeeds even if a result other than running is returned. In readinessProbe.sh, the check fails if a result other than running is returned.
sh-4.2$ curl http://localhost:9990/management --header "Content-Type: application/json" -d '{"operation":"read-attribute", "name": "server-state"}'
{"outcome" : "success", "result" : "running"}
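The asymmetry between the two probes described above can be sketched as follows. This is a minimal illustration of the logic, not the actual probe code; the function name is hypothetical:

```python
import json

def server_status_ok(response_body, readiness):
    """Evaluate the server-state query result.

    Liveness tolerates any server-state as long as the management query
    itself succeeds; readiness additionally requires "running".
    """
    data = json.loads(response_body)
    if data.get("outcome") != "success":
        return False  # the DMR query itself failed
    if readiness:
        return data.get("result") == "running"
    return True  # liveness passes even for states such as "starting"

# A server that is still booting:
body = '{"outcome": "success", "result": "starting"}'
print(server_status_ok(body, readiness=False))  # True  (liveness passes)
print(server_status_ok(body, readiness=True))   # False (readiness fails)
```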
- In the BootErrorsTest class, the script sends a query equivalent to the one below. The health check fails if a JBoss EAP subsystem failed to start and the query returns an error.
sh-4.2$ curl http://localhost:9990/management --header "Content-Type: application/json" -d '{"operation": "read-boot-errors", "address": {"core-service": "management"}}'
{"outcome" : "success", "result" : []}
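The pass/fail logic for the boot-errors query amounts to requiring a successful query with an empty error list. A hypothetical sketch, not the actual probe code:

```python
import json

def boot_errors_ok(response_body):
    """Pass only if the query succeeded and no boot errors were reported."""
    data = json.loads(response_body)
    return data.get("outcome") == "success" and data.get("result") == []

print(boot_errors_ok('{"outcome": "success", "result": []}'))  # True
# A failed subsystem shows up as a non-empty result list:
print(boot_errors_ok('{"outcome": "success", "result": [{"failed-operation": {}}]}'))  # False
```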
- In the DeploymentTest class, the script sends a query equivalent to the one below. The health check fails if a deployment failed to start and the query returns an error.
sh-4.2$ curl http://localhost:9990/management --header "Content-Type: application/json" -d '{"operation": "read-attribute", "address":{"deployment": "*"}, "name": "status"}'
{"outcome" : "success", "result" : [{"address" : [{ "deployment" : "test.war" }], "outcome" : "success", "result" : "OK"}]}
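Because the query uses the wildcard address {"deployment": "*"}, the result is a list with one entry per deployment. The check can be sketched as requiring every entry to report success with status OK (an illustrative simplification, not the actual probe code):

```python
import json

def deployments_ok(response_body):
    """Pass only if every deployment reports outcome "success" and status OK."""
    data = json.loads(response_body)
    if data.get("outcome") != "success":
        return False
    return all(
        entry.get("outcome") == "success" and entry.get("result") == "OK"
        for entry in data.get("result", [])
    )

body = ('{"outcome": "success", "result": '
        '[{"address": [{"deployment": "test.war"}], '
        '"outcome": "success", "result": "OK"}]}')
print(deployments_ok(body))  # True
```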
In jboss-eap-7/eap72-openshift or later, in addition to the above checks, the health check script sends the additional query below. This query always returns UP unless the application contains a custom health check based on the MicroProfile Health specification, so by default this check always succeeds.
sh-4.2$ curl http://localhost:9990/management --header "Content-Type: application/json" -d '{"operation": "check", "address": {"subsystem": "microprofile-health-smallrye"}}'
{"outcome" : "success", "result" : {"outcome" : "UP", "checks" : []}}
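This additional check simply inspects the aggregated MicroProfile Health outcome; with no custom checks present it is "UP" and the probe passes. A hypothetical sketch of that evaluation:

```python
import json

def health_ok(response_body):
    """Pass if the aggregated MicroProfile Health outcome is "UP"."""
    data = json.loads(response_body)
    return (data.get("outcome") == "success"
            and data.get("result", {}).get("outcome") == "UP")

print(health_ok('{"outcome": "success", '
                '"result": {"outcome": "UP", "checks": []}}'))  # True
```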
Root Cause
An EAP 7 deployment can be done via an Operator or a template (EAP 8 only supports Helm Charts, as Technology Preview).
The deployment creates a Deployment/DeploymentConfig YAML definition, whose pod template includes the probes: scripts that call the Python probe functions.
A template deployment on EAP 7.1 uses the following template definition.
"livenessProbe": {
"exec": {
"command": [
"/bin/bash",
"-c",
"/opt/eap/bin/livenessProbe.sh"
]
},
"initialDelaySeconds": 60
},
"readinessProbe": {
"exec": {
"command": [
"/bin/bash",
"-c",
"/opt/eap/bin/readinessProbe.sh"
]
}
},
The template results in the following Deployment/DeploymentConfig:
apiVersion: apps.openshift.io/v1
items:
- apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
annotations:
openshift.io/generated-by: OpenShiftNewApp
creationTimestamp: "2022-12-21T06:56:21Z"
generation: 1
labels:
app: eap74-basic-s2i
app.kubernetes.io/component: eap74-basic-s2i
app.kubernetes.io/instance: eap74-basic-s2i
application: eap-app
template: eap74-basic-s2i
xpaas: 7.4.0
...
spec:
containers:
...
livenessProbe:
exec:
command:
- /bin/bash
- -c
- /opt/eap/bin/livenessProbe.sh
failureThreshold: 3
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: eap-app
ports:
- containerPort: 8778
name: jolokia
protocol: TCP
- containerPort: 8080
name: http
protocol: TCP
- containerPort: 8888
name: ping
protocol: TCP
readinessProbe:
exec:
command:
- /bin/bash
- -c
- /opt/eap/bin/readinessProbe.sh
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
memory: 1Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 75
Example readiness and liveness probe output files
| Output type | liveness location/file | readiness location/file |
|---|---|---|
| OUTPUT | /tmp/liveness-output | /tmp/readiness-output |
| ERROR | /tmp/liveness-error | /tmp/readiness-error |
| LOG | /tmp/liveness-log | /tmp/readiness-log |
These locations are defined in livenessProbe.sh:
sh-4.4$ cat /opt/eap/bin/livenessProbe.sh
...
OUTPUT=/tmp/liveness-output
ERROR=/tmp/liveness-error
LOG=/tmp/liveness-log
Liveness Probe and Readiness Probe expected behavior vs actual behavior
Failure of the probes vs expected results:
- Failure of the liveness probe forces the kubelet to send a SIGTERM and then a SIGKILL.
- Failure of the readiness probe causes the pod to be isolated, so it does not receive any network connections.
However, if one or both probes fail during EAP 7 start-up, the pod's state never changes from Failure (the default) to Success, which prevents the pod from coming up. So although, by definition, a readiness probe failure does not cause a restart, a failure the first time the readiness probe fires leaves the pod in its default failure state, the pod cannot come up, and a SIGTERM follows. This explains cases where the readiness probe appears to cause pod restarts without violating its defined responsibility: the readiness probe controls network traffic to the pod, while keeping the pod alive is the responsibility of the liveness probe.
Diagnostic Steps
- The probe scripts rely on Python commands that query the Health Check subsystem (if installed). To see more logs from livenessProbe.sh and readinessProbe.sh, enable the SCRIPT_DEBUG environment variable on the Deployment/DeploymentConfig; the debug log will be located in /tmp/readiness-log.
Example:
2023-10-27 23:28:19,533 DEBUG [__main__] Starting probe runner with args: Namespace(check=[<Status.READY: 8>], debug=True, logfile='/tmp/readiness-log', loglevel='DEBUG', probes=['probe.eap.dmr.EapProbe', 'probe.eap.dmr.HealthCheckProbe'])
...
2023-10-27 23:28:19,594 INFO [probe.eap.dmr.EapProbe] Executing the following tests: [probe.eap.dmr.ServerStatusTest, probe.eap.dmr.ServerRunningModeTest, probe.eap.dmr.BootErrorsTest, probe.eap.dmr.DeploymentTest]
...
2023-10-27 23:40:31,574 INFO [urllib3.connectionpool] Starting new HTTP connection (1): localhost
2023-10-27 23:40:31,575 DEBUG [urllib3.connectionpool] "POST /management HTTP/1.1" 200 None
2023-10-27 23:40:31,576 DEBUG [probe.eap.dmr.EapProbe] Probe response: <Response [200]>
2023-10-27 23:40:31,576 INFO [probe.eap.dmr.EapProbe] Executing test probe.eap.dmr.ServerStatusTest
2023-10-27 23:40:31,576 DEBUG [probe.eap.dmr.EapProbe] Test input = {
"outcome": "success",
"result": "running"
}
- Verify the events:
oc get events -n namespace
- Also verify the kubelet probe failures on the OCP node with journalctl -u kubelet:
$ journalctl -u kubelet | grep probe
Apr 05 21:58:52 ip-10-0-1-7 kubenswrapper[2100]: I0405 21:58:52.854520 2100 prober.go:107] "Probe failed" probeType="Readiness" pod="eap-demo/eap-app-4-p9kmq" podUID=d5419cf7-2eaa-4984-a695-3c25be5c5d14 containerName="eap-app" probeResult=failure output=<
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.