How to set up a temporary monitoring system to check the health of the load balancer provisioned outside the cluster for external applications, as well as the ingress routers in RHOCP 4?


Environment

  • OpenShift Container Platform 4

Issue

  • Requests to the application load balancer sometimes time out. How can you check whether the problem is caused by the external load balancer or by the OpenShift router pods?

Resolution

  • Create a new project called load-balancer-routers-monitoring:

    oc new-project load-balancer-routers-monitoring
    
  • Create a pod named curl-load-balancer that uses curl to monitor the *.apps load balancer:

    OAUTH_ENDPOINT=$(oc get infrastructure cluster -o json | jq '.status.apiServerURL | gsub("https://api.";"oauth-openshift.apps.") | gsub(":6443";"")' -r)
    
    oc run -n load-balancer-routers-monitoring curl-load-balancer -q --command=true --image=$(oc get po -n openshift-dns -l dns.operator.openshift.io/daemonset-dns=default -o jsonpath="{.items[0].spec.containers[?(@.name=='dns')].image}") -- /bin/sh -c "while true; do curl -sk --noproxy '*' --connect-timeout 1  -w \"local_ip: %{local_ip} remote_ip: %{remote_ip} response_code: %{response_code} time_connect: %{time_connect} time_total: %{time_total}\n\" -o /dev/null \"https://${OAUTH_ENDPOINT}/healthz\" || true; sleep 5; done"
    
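  • Create a pod named curl-bypass-load-balancer that uses curl to query each router pod directly on its node IP, bypassing the load balancer (this assumes the routers use the HostNetwork endpoint publishing strategy, so that they listen on port 443 of their node):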

    OAUTH_ENDPOINT=$(oc get infrastructure cluster -o json | jq '.status.apiServerURL | gsub("https://api.";"oauth-openshift.apps.") | gsub(":6443";"")' -r)
    
    ROUTER_ADDRESSES=$(oc get po -n openshift-ingress -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default -o json | jq '.items[].status.hostIP+"\\n"' -jr)
    
    oc run -n load-balancer-routers-monitoring curl-bypass-load-balancer -q --command=true --image=$(oc get po -n openshift-dns -l dns.operator.openshift.io/daemonset-dns=default -o jsonpath="{.items[0].spec.containers[?(@.name=='dns')].image}") -- /bin/sh -c "while true; do printf \"${ROUTER_ADDRESSES}\" | while read ROUTER_IP; do curl -sk --noproxy '*' --connect-timeout 1 -w \"local_ip: %{local_ip} remote_ip: %{remote_ip} response_code: %{response_code} time_connect: %{time_connect} time_total: %{time_total}\n\" -o /dev/null --resolve \"${OAUTH_ENDPOINT}:443:\$ROUTER_IP\" \"https://${OAUTH_ENDPOINT}/healthz\" || true; done; sleep 5; done"
    
  • Check the logs of both pods:

    oc logs -n load-balancer-routers-monitoring --timestamps=true curl-load-balancer
    
    oc logs -n load-balancer-routers-monitoring --timestamps=true curl-bypass-load-balancer
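Both pods print one line every 5 seconds, so the logs grow quickly. The following is a minimal sketch for summarizing them, assuming the lines use exactly the curl -w format shown above (optionally prefixed by the timestamp that --timestamps=true adds). summarize_probe_log is a hypothetical helper name, not part of OpenShift. It counts probes per remote_ip, treats response_code 000 as a failed probe (curl reports 000 when no HTTP response was received, for example when the 1-second connect timeout expires), and groups lines with an empty remote_ip, which may happen when no connection was established, under "unknown".

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenShift): summarizes probe lines written by
# the curl -w format used above, optionally prefixed by the timestamp that
# oc logs --timestamps=true adds. Reads stdin, prints one line per remote_ip.
summarize_probe_log() {
  awk '
  {
    ip = ""; code = ""; total = ""
    for (i = 1; i <= NF; i++) {
      # A key such as "remote_ip:" is followed by its value, unless the value
      # was empty and the next field is already the following key.
      val = (i < NF && $(i + 1) !~ /:$/) ? $(i + 1) : ""
      if ($i == "remote_ip:") ip = val
      else if ($i == "response_code:") code = val
      else if ($i == "time_total:") total = val
    }
    if (code == "") next              # skip lines that are not probe results
    if (ip == "") ip = "unknown"      # remote_ip may be empty if no connection was made
    n[ip]++
    if (code == "000") failed[ip]++   # no HTTP response (timeout, refused, and so on)
    else if (code != "200") non200[ip]++
    t = total + 0
    if (!(ip in maxt) || t > maxt[ip]) maxt[ip] = t
  }
  END {
    for (ip in n)
      printf "%s probes=%d failed=%d non200=%d max_time_total=%.6f\n", ip, n[ip], failed[ip] + 0, non200[ip] + 0, maxt[ip]
  }' | LC_ALL=C sort
}
```

For example, pipe the logs of each pod into the function:

    oc logs -n load-balancer-routers-monitoring --timestamps=true curl-load-balancer | summarize_probe_log

    oc logs -n load-balancer-routers-monitoring --timestamps=true curl-bypass-load-balancer | summarize_probe_log

As a rough guide: if, over the same time window, curl-load-balancer records failures while curl-bypass-load-balancer reaches every router IP successfully, the external load balancer is the more likely suspect; failures in both pods point toward the routers or the nodes hosting them.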
    

Diagnostic Steps

To run the same test only once, without creating pods in the cluster, use the following steps:

  • Save the required variables:

    OAUTH_ENDPOINT=$(oc get infrastructure cluster -o json | jq '.status.apiServerURL | gsub("https://api.";"oauth-openshift.apps.") | gsub(":6443";"")' -r)
    
    ROUTER_ADDRESSES=$(oc get po -n openshift-ingress -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default -o json | jq '.items[].status.hostIP+"\\n"' -jr)
    
  • Curl the authentication route through the load balancer:

    oc exec -n openshift-authentication-operator $(oc get po -n openshift-authentication-operator -l app=authentication-operator -o jsonpath='{.items[0].metadata.name}') -- sh -c "curl -sk --noproxy '*' --connect-timeout 1  -w \"local_ip: %{local_ip} remote_ip: %{remote_ip} response_code: %{response_code} time_connect: %{time_connect} time_total: %{time_total}\n\" -o /dev/null \"https://${OAUTH_ENDPOINT}/healthz\""
    
  • Curl the authentication route directly on each router, bypassing the load balancer:

    oc exec -n openshift-authentication-operator $(oc get po -n openshift-authentication-operator -l app=authentication-operator -o jsonpath='{.items[0].metadata.name}') -- sh -c "printf \"${ROUTER_ADDRESSES}\" | while read ROUTER_IP; do curl -sk --noproxy '*' --connect-timeout 1 -w \"local_ip: %{local_ip} remote_ip: %{remote_ip} response_code: %{response_code} time_connect: %{time_connect} time_total: %{time_total}\n\" -o /dev/null --resolve \"${OAUTH_ENDPOINT}:443:\$ROUTER_IP\" \"https://${OAUTH_ENDPOINT}/healthz\" || true; done"
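The commands above rely on two pieces of string handling: the jq gsub() calls that turn the API server URL into the oauth-openshift route host, and the ROUTER_ADDRESSES list, whose literal \n separators printf expands so that each router IP gets its own curl --resolve. The following offline sketch reproduces both in plain POSIX shell, which can help confirm which host and router IPs would be probed before running anything against the cluster. oauth_host_from_api_url and preview_bypass_commands are hypothetical helper names; the first assumes the API URL has the usual https://api.<cluster>.<base-domain>:6443 form, and the printed curl commands are simplified (no -w format string).

```shell
#!/bin/sh
# Hypothetical helper: derives the OAuth route host from an API server URL,
# mirroring the two jq gsub() calls above for the usual
# https://api.<cluster>.<base-domain>:6443 form.
oauth_host_from_api_url() {
  host=${1#https://api.}
  host=${host%:6443}
  printf 'oauth-openshift.apps.%s\n' "$host"
}

# Hypothetical helper: prints (does not run) one simplified curl command per
# router address. The second argument has the same format as ROUTER_ADDRESSES:
# IPs separated by a literal backslash-n, which printf %b turns into newlines.
preview_bypass_commands() {
  endpoint=$1
  printf '%b' "$2" | while read -r router_ip || [ -n "$router_ip" ]; do
    [ -n "$router_ip" ] || continue   # ignore empty lines
    # --resolve pins the route host to this router IP, so the load balancer is skipped
    printf 'curl -sk --noproxy "*" --connect-timeout 1 -o /dev/null --resolve %s:443:%s https://%s/healthz\n' \
      "$endpoint" "$router_ip" "$endpoint"
  done
}
```

With the variables saved in the first step, the preview can be produced with:

    preview_bypass_commands "$OAUTH_ENDPOINT" "$ROUTER_ADDRESSES"

printf '%b' is used here instead of passing the list as the printf format string; it expands \n the same way but does not interpret any % characters in the data as format directives.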
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.