Troubleshoot HTTP readinessProbe in OpenShift web applications

Solution Verified - Updated 16 Mar 2017

Environment

OpenShift Container Platform 3.x

Issue

A web application is configured with an HTTP readinessProbe like the following:

"readinessProbe": {
    "httpGet": {
        "path": "/healthz",
        "port": 8080,
        "scheme": "HTTP"
    },
    "initialDelaySeconds": 10,
    "timeoutSeconds": 1,
    "periodSeconds": 3,
    "successThreshold": 1,
    "failureThreshold": 3
},

After the configured number of retries, the application still does not become ready, even though its logs show that it started successfully and should be ready.
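
To reproduce by hand what the probe requests, it can help to derive the target URL from the probe definition. The following is a minimal Python sketch and not part of OpenShift: the helper name `probe_url` is hypothetical, the pod IP is a sample value, and a numeric port is assumed.

```python
import json

# Illustrative copy of an httpGet readiness probe definition.
PROBE_JSON = """
{
    "httpGet": {"path": "/healthz", "port": 8080, "scheme": "HTTP"},
    "initialDelaySeconds": 10,
    "timeoutSeconds": 1,
    "periodSeconds": 3,
    "successThreshold": 1,
    "failureThreshold": 3
}
"""


def probe_url(probe: dict, pod_ip: str) -> str:
    """Build the URL an HTTP probe targets for a pod with the given IP.

    Hypothetical helper: it only mirrors the httpGet fields and does not
    query the cluster. It assumes "port" is a number, not a named port.
    """
    http_get = probe["httpGet"]
    scheme = http_get.get("scheme", "HTTP").lower()
    return f"{scheme}://{pod_ip}:{http_get['port']}{http_get['path']}"


if __name__ == "__main__":
    probe = json.loads(PROBE_JSON)
    print(probe_url(probe, "10.129.0.167"))
```

The resulting URL is the one to pass to curl in the diagnostic steps.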

Root Cause

Possible causes include:

The HTTP probe points to an endpoint that is not implemented (returns 404)
The container does not expose the port, or the wrong port is exposed

Diagnostic Steps

It is important to understand that a probe is the way the platform decides whether the application is ready. For some applications it is enough that a specific port is open; for others, an HTTP request must return a non-error status. This is why web applications often implement a "healthz" endpoint, although the path can have any name (e.g. /status/version).

There are different ways to test the availability of an application:

TCP socket check
Container execution check
HTTP check

This article focuses on troubleshooting HTTP checks, as they are the most common case for web applications. To confirm that the pod does not become ready because of a probe problem, check two indicators, the pod status and the project events:

$ oc get pods
NAME              READY     STATUS      RESTARTS   AGE
cakephp-1-build   0/1       Completed   0          3d
cakephp-2-4e6zw   0/1       Running     0          20m

The 0/1 in the READY column of the Running pod means that its container is running but, for some reason, the pod is not considered ready. The project events show why the pod is not healthy:

$ oc get events
2017-02-13 16:13:50 +0530 IST   2017-02-13 16:13:50 +0530 IST   1         cakephp-2-4e6zw   Pod       spec.containers{registry}   Normal    Created   {kubelet shift32-ha-n1.gsslab.brq.redhat.com}   Created container with docker id 7de4a08a8393; Security:[seccomp=unconfined]
2017-02-13 16:13:51 +0530 IST   2017-02-13 16:13:51 +0530 IST   1         cakephp-2-4e6zw   Pod       spec.containers{registry}   Normal    Started   {kubelet shift32-ha-n1.gsslab.brq.redhat.com}   Started container with docker id 7de4a08a8393
2017-02-13 16:13:55 +0530 IST   2017-02-13 16:13:55 +0530 IST   1         cakephp-2-4e6zw   Pod       spec.containers{registry}   Warning   Unhealthy   {kubelet shift32-ha-n1.gsslab.brq.redhat.com}   Readiness probe failed: HTTP probe failed with statuscode: 404
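
When a project produces many events, the failing status code can be pulled out of the event text automatically. The sketch below is an illustration only: the helper name `failed_probe_status` is hypothetical, and the message format is assumed to match the "Unhealthy" event shown above.

```python
import re
from typing import Optional

# Message format copied from the "Unhealthy" event in the output above.
_STATUS_RE = re.compile(
    r"Readiness probe failed: HTTP probe failed with statuscode: (\d+)"
)


def failed_probe_status(event_line: str) -> Optional[int]:
    """Return the HTTP status code reported by a readiness probe failure.

    Returns None when the line is not an HTTP readiness probe failure.
    """
    match = _STATUS_RE.search(event_line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "Warning   Unhealthy   "
        "Readiness probe failed: HTTP probe failed with statuscode: 404"
    )
    print(failed_probe_status(sample))
```

Each line of the `oc get events` output can be passed to the function; lines that describe other events yield `None`.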

It is advisable to start from inside the container, which can be accessed like this:

$ oc rsh <pod-name>
$ curl -v http://localhost:8080/healthz
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /healthz HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8080
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< Date: Mon, 13 Feb 2017 09:57:30 GMT
< Server: Apache/2.4.18 (Red Hat)
< Content-Length: 66859
< Content-Type: text/html; charset=UTF-8

This output shows that the requested path does not exist. The probe only accepts HTTP status codes from 200 to 399, so it considers the application not ready.
To solve this, provide a path in your web application that returns a successful status code, regardless of the response body.
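
As an illustration of such a path, the following sketch serves /healthz with the Python standard library. It is not the application from this article (which runs on Apache); the names `HealthHandler`, `make_server`, `status_of` and `demo` are hypothetical, and a real application would add the equivalent route in its own framework.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer 200 on /healthz and 404 on every other path."""

    def do_GET(self):
        if self.path == "/healthz":
            status, body = 200, b"ok\n"
        else:
            status, body = 404, b"not found\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep frequent probe requests out of the console output.
        pass


def make_server(port: int = 8080) -> HTTPServer:
    """Create (but do not start) a server listening on all interfaces."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)


def status_of(port: int, path: str) -> int:
    """Request a path on the local server and return the HTTP status."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()


def demo() -> tuple:
    """Start the server on a free port and query a good and a bad path."""
    server = make_server(0)  # port 0 lets the OS pick a free port
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return status_of(port, "/healthz"), status_of(port, "/missing")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `make_server().serve_forever()` runs the endpoint on port 8080, while `demo()` returns the statuses for an existing and a missing path. The response body is irrelevant to the probe; only the status code matters.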

If the HTTP status was successful instead, the next step is to try from outside the pod. The IP of the pod can be obtained with this command:

$ oc get pods <pod-name> -o wide
NAME              READY     STATUS      RESTARTS   AGE       IP             NODE
cakephp-1-18e0o   1/1       Running     2          3d        10.129.0.167   node.example.com

Or with this command:

$ oc get pods <pod-name> -o jsonpath='{.status.podIP}'
10.129.0.167

After that, try a curl request against the pod IP:

$ curl -v http://10.129.0.167:8080/healthz
* About to connect() to 10.129.0.167 port 8080 (#0)
*   Trying 10.129.0.167...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connection refused
* Failed connect to 10.129.0.167:8080; Connection refused
* Closing connection 0
curl: (7) Failed connect to 10.129.0.167:8080; Connection refused

If the connection is refused, the port may not be accessible. Verify that it is exposed:

$ oc get pod <pod-name> -o yaml
spec:
  containers:
  - image: 172.30.65.140:5000/test-php/cakephp@sha256:c4bc748b07a29cefb10ab15d68167680f531d1acc6963411ee53527ae0a1ddee
    imagePullPolicy: Always
    name: cakephp
    ports:
    - containerPort: 9090
      protocol: TCP

In the previous output the exposed port is 9090 instead of 8080, which can be the reason for the "connection refused" error. Fix the exposed port or, if both are needed, expose a second port. Make the change in the deployment config, which triggers a new deployment with the new configuration; editing the pod directly would only affect the existing pod.

$ oc edit dc <dc-name>
    spec:
      containers:
      - image: 172.30.65.140:5000/test-php/cakephp@sha256:c4bc748b07a29cefb10ab15d68167680f531d1acc6963411ee53527ae0a1ddee
        imagePullPolicy: Always
        name: cakephp
        ports:
        - containerPort: 8080
          protocol: TCP
        - containerPort: 9090
          protocol: TCP
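
A mismatch between the probe port and the declared container ports can also be detected from the pod definition itself. The sketch below assumes the input is the parsed output of `oc get pod <pod-name> -o json`; the function names and the trimmed sample pod are hypothetical, and named (non-numeric) probe ports are ignored.

```python
def declared_ports(pod: dict) -> set:
    """Collect every containerPort declared across the pod's containers."""
    return {
        port["containerPort"]
        for container in pod["spec"]["containers"]
        for port in container.get("ports", [])
    }


def probe_ports(pod: dict) -> set:
    """Collect the numeric httpGet ports of all readiness probes."""
    ports = set()
    for container in pod["spec"]["containers"]:
        http_get = container.get("readinessProbe", {}).get("httpGet")
        if http_get and isinstance(http_get.get("port"), int):
            ports.add(http_get["port"])
    return ports


def undeclared_probe_ports(pod: dict) -> set:
    """Return probe ports that no container declares as a containerPort."""
    return probe_ports(pod) - declared_ports(pod)


# Trimmed, illustrative pod definition: probe on 8080, only 9090 declared.
SAMPLE_POD = {
    "spec": {
        "containers": [
            {
                "name": "cakephp",
                "ports": [{"containerPort": 9090, "protocol": "TCP"}],
                "readinessProbe": {
                    "httpGet": {"path": "/healthz", "port": 8080}
                },
            }
        ]
    }
}

if __name__ == "__main__":
    print(undeclared_probe_ports(SAMPLE_POD))
```

An empty result means every numeric probe port is declared by some container in the pod.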

Once the deployment config is saved, a new deployment takes place and the port is exposed. Another possible problem is that nothing is listening on this port, which can be identified by a "no route to host" error after a few seconds of waiting for a response from the server:

* About to connect() to 10.129.0.167 port 9090 (#0)
*   Trying 10.129.0.167...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:18 --:--:--     0* No route to host
* Failed connect to 10.129.0.167:9090; No route to host
* Closing connection 0
curl: (7) Failed connect to 10.129.0.167:9090; No route to host

Probes are executed by Kubernetes from the same node the pod is running on, so if all the previous checks are successful, the pod should become ready.
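
The manual curl checks in this article can be approximated in a small script that sorts the result into the cases discussed: a status between 200 and 399, any other HTTP status, "connection refused" and "no route to host". This is a sketch under assumptions, not the probe implementation used by the platform; the function names are hypothetical, and the one second default timeout only mirrors the timeoutSeconds value of the example probe.

```python
import errno
import http.client


def classify_status(status: int) -> str:
    """Apply the rule from this article: 200 to 399 counts as success."""
    return "success" if 200 <= status < 400 else "failure"


def check_http(host: str, port: int, path: str, timeout: float = 1.0) -> str:
    """Send one GET request and describe the outcome as a short string.

    Redirects are not followed; the first response status is reported.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except ConnectionRefusedError:
        return "connection refused"
    except (OSError, http.client.HTTPException) as exc:
        if getattr(exc, "errno", None) == errno.EHOSTUNREACH:
            return "no route to host"
        return f"error: {exc}"
    finally:
        conn.close()
    return f"{classify_status(status)} (HTTP {status})"


if __name__ == "__main__":
    # Sample pod IP, port and path taken from the curl examples above.
    print(check_http("10.129.0.167", 8080, "/healthz"))
```

Running the script from the node that hosts the pod gives the closest approximation to what the probe sees.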

SBR

Shift

Product(s)

Red Hat OpenShift Container Platform

Category

Troubleshoot
