How to set up a temporary monitoring system to check the health of the load balancer provisioned outside the cluster for external applications, as well as the ingress routers in RHOCP 4?
Environment
- OpenShift Container Platform 4
Issue
- Requests to the application load balancer sometimes time out. How can you check whether the problem is in the external load balancer or in the OpenShift router pods?
Resolution
- Create a new project called load-balancer-routers-monitoring:

    oc new-project load-balancer-routers-monitoring

- Create a pod named curl-load-balancer that uses curl to monitor the *.apps load balancer:

    OAUTH_ENDPOINT=$(oc get infrastructure cluster -o json | jq '.status.apiServerURL | gsub("https://api.";"oauth-openshift.apps.") | gsub(":6443";"")' -r)
    oc run -n load-balancer-routers-monitoring curl-load-balancer -q --command=true --image=$(oc get po -n openshift-dns -l dns.operator.openshift.io/daemonset-dns=default -o jsonpath="{.items[0].spec.containers[?(@.name=='dns')].image}") -- /bin/sh -c "while true; do curl -sk --noproxy '*' --connect-timeout 1 -w \"local_ip: %{local_ip} remote_ip: %{remote_ip} response_code: %{response_code} time_connect: %{time_connect} time_total: %{time_total}\n\" -o /dev/null \"https://${OAUTH_ENDPOINT}/healthz\" || true; sleep 5; done"

- Create a pod named curl-bypass-load-balancer that uses curl to monitor the router pods directly, bypassing the load balancer:

    OAUTH_ENDPOINT=$(oc get infrastructure cluster -o json | jq '.status.apiServerURL | gsub("https://api.";"oauth-openshift.apps.") | gsub(":6443";"")' -r)
    ROUTER_ADDRESSES=$(oc get po -n openshift-ingress -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default -o json | jq '.items[].status.hostIP+"\\n"' -jr)
    oc run -n load-balancer-routers-monitoring curl-bypass-load-balancer -q --command=true --image=$(oc get po -n openshift-dns -l dns.operator.openshift.io/daemonset-dns=default -o jsonpath="{.items[0].spec.containers[?(@.name=='dns')].image}") -- /bin/sh -c "while true; do printf \"${ROUTER_ADDRESSES}\" | while read ROUTER_IP; do curl -sk --noproxy '*' --connect-timeout 1 -w \"local_ip: %{local_ip} remote_ip: %{remote_ip} response_code: %{response_code} time_connect: %{time_connect} time_total: %{time_total}\n\" -o /dev/null --resolve \"${OAUTH_ENDPOINT}:443:\$ROUTER_IP\" \"https://${OAUTH_ENDPOINT}/healthz\" || true; done; sleep 5; done"

- Check the logs of both pods:

    oc logs -n load-balancer-routers-monitoring --timestamps=true curl-load-balancer
    oc logs -n load-balancer-routers-monitoring --timestamps=true curl-bypass-load-balancer
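The two pods log one line per probe, so the logs grow quickly. The following is a minimal sketch of a filter that keeps only the suspect probes. It assumes a healthy /healthz reply has response_code 200 and that a probe slower than a chosen threshold is worth a look. The function name flag_suspect_probes and the 1.0 second default are illustrative and not part of the original solution.

```shell
#!/bin/sh
# Hypothetical helper: reads probe lines on stdin, prints only the suspect ones.
# $1: threshold in seconds for time_total (assumed default: 1.0).
flag_suspect_probes() {
    awk -v max="${1:-1.0}" '
    {
        code = ""; total = ""
        # Look values up by label rather than by position: remote_ip can be
        # empty when the connection fails, which shifts positional fields.
        for (i = 1; i < NF; i++) {
            if ($i == "response_code:") code = $(i + 1)
            if ($i == "time_total:")    total = $(i + 1)
        }
        if (code == "") next    # not a probe line, skip it
        # Assumption: anything other than 200, or a slow reply, is suspect.
        if (code != "200" || total + 0 > max + 0) print
    }'
}
```

For example, pipe the output of either oc logs command above into flag_suspect_probes 1.0. As a rough reading, suspect lines that appear only in curl-load-balancer point towards the external load balancer, while suspect lines in curl-bypass-load-balancer with one particular remote_ip point towards the router on that node.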
Diagnostic Steps
To run the same test once, without creating pods in the cluster:
- Save the needed variables:

    OAUTH_ENDPOINT=$(oc get infrastructure cluster -o json | jq '.status.apiServerURL | gsub("https://api.";"oauth-openshift.apps.") | gsub(":6443";"")' -r)
    ROUTER_ADDRESSES=$(oc get po -n openshift-ingress -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default -o json | jq '.items[].status.hostIP+"\\n"' -jr)

- Curl the authentication route through the load balancer:

    oc exec -n openshift-authentication-operator $(oc get po -n openshift-authentication-operator -l app=authentication-operator -o jsonpath='{.items[0].metadata.name}') -- sh -c "curl -sk --noproxy '*' --connect-timeout 1 -w \"local_ip: %{local_ip} remote_ip: %{remote_ip} response_code: %{response_code} time_connect: %{time_connect} time_total: %{time_total}\n\" -o /dev/null \"https://${OAUTH_ENDPOINT}/healthz\""

- Curl the authentication route bypassing the load balancer:

    oc exec -n openshift-authentication-operator $(oc get po -n openshift-authentication-operator -l app=authentication-operator -o jsonpath='{.items[0].metadata.name}') -- sh -c "printf \"${ROUTER_ADDRESSES}\" | while read ROUTER_IP; do curl -sk --noproxy '*' --connect-timeout 1 -w \"local_ip: %{local_ip} remote_ip: %{remote_ip} response_code: %{response_code} time_connect: %{time_connect} time_total: %{time_total}\n\" -o /dev/null --resolve \"${OAUTH_ENDPOINT}:443:\$ROUTER_IP\" \"https://${OAUTH_ENDPOINT}/healthz\" || true; done"
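The two variables used in the steps above are easy to misread, so here is a small offline sketch of what they hold. It approximates the jq gsub() rewrite with sed (a swapped-in technique, so that no cluster or jq is needed) and shows how a list separated by a literal backslash-n is split into one router IP per line. The function names, the example.com domain, and the sample IPs are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: approximates the jq gsub() calls with sed.
# $1: API server URL in the form https://api.<cluster-domain>:6443
derive_oauth_endpoint() {
    printf '%s\n' "$1" |
        sed -e 's#^https://api\.#oauth-openshift.apps.#' -e 's#:6443$##'
}

# Hypothetical helper: expands a string shaped like ROUTER_ADDRESSES
# (IPs separated by a literal backslash-n) and loops over it line by line.
# %b makes printf interpret the \n escapes inside the argument.
list_router_ips() {
    printf '%b' "$1" | while read -r ROUTER_IP; do
        printf 'router: %s\n' "$ROUTER_IP"
    done
}

# Placeholder inputs, for illustration only.
derive_oauth_endpoint "https://api.cluster.example.com:6443"
list_router_ips '10.0.0.1\n10.0.0.2\n'
```

The literal backslash-n separators are what allow the whole address list to be embedded in the single-line sh -c command string and expanded by printf on the other side.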