After upgrade access to ExternalIP with Openshift OVN-Kubernetes stopped working

Solution Verified - Updated

Environment

Red Hat OpenShift Container Platform 4.8.34+, 4.9.23+ and 4.10.3+

Issue

After upgrading Red Hat OpenShift Container Platform (RHOCP) with OVN-Kubernetes, ingress access to services via ExternalIP stopped working, resulting in "No Route to Host" errors.

Resolution

If "native" access to ExternalIP services is used in RHOCP with OVN-Kubernetes and, after upgrading to 4.8.34 and above, 4.9.23 and above, or 4.10.3 and above, access stops working with "No Route to Host" errors or connection timeouts, these services' ExternalIPs need to be migrated to be managed by ipfailover or MetalLB (on 4.9 and 4.10), or the necessary routes need to be created on the infrastructure so that traffic can reach the ExternalIPs defined on the services.

If an upgrade of RHOCP to the releases mentioned above is planned, migrate these services before the cluster upgrade to avoid any disruption to users' services.
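Before upgrading, the affected services can be inventoried with a command along these lines (a sketch; requires cluster-admin access and `jq`):

```shell
# List every service that defines spec.externalIPs, with namespace and IPs
oc get svc --all-namespaces -o json \
  | jq -r '.items[]
      | select(.spec.externalIPs != null)
      | "\(.metadata.namespace)/\(.metadata.name)\t\(.spec.externalIPs | join(","))"'
```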

For example, using a scenario where multiple services in one project have ExternalIPs configured, a group of ipfailover replicas can be configured to expose those IPs:

  $ cat ipfailover-deploy-example.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ipfailover-group1
  labels:
    ipfailover: group1
spec:
  strategy:
    type: Recreate
  replicas: 2
  selector:
    matchLabels:
      ipfailover: group1
  template:
    metadata:
      labels:
        ipfailover: group1
    spec:
      serviceAccountName: ipfailover
      hostNetwork: true
      containers:
      - name: openshift-ipfailover
        image: quay.io/openshift/origin-keepalived-ipfailover:<ocp-release>
        ports:
        - containerPort: 63000  # When using multiple ipfailover deployments, set different container and host ports to avoid conflicts.
          hostPort: 63000
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        volumeMounts:
        - name: lib-modules
          mountPath: /lib/modules
          readOnly: true
        - name: host-slash
          mountPath: /host
          readOnly: true
          mountPropagation: HostToContainer
        - name: etc-sysconfig
          mountPath: /etc/sysconfig
          readOnly: true
        - name: config-volume
          mountPath: /etc/keepalive
        env:
        - name: OPENSHIFT_HA_CONFIG_NAME
          value: "ipfailover-group1"
        - name: OPENSHIFT_HA_VIRTUAL_IPS
          value: "<ExternalIP_1>,<ExternalIP_2>,<ExternalIP_3>"  # The ExternalIPs set on the services.
        - name: OPENSHIFT_HA_VIP_GROUPS
          value: "1"
        - name: OPENSHIFT_HA_NETWORK_INTERFACE
          value: "br-ex"  # On OVN-Kubernetes this is the external interface of the node.
        - name: OPENSHIFT_HA_MONITOR_PORT
          value: "0"  # Set to 0 when multiple services are managed by the same ipfailover group; otherwise the service port can be used.
        - name: OPENSHIFT_HA_VRRP_ID_OFFSET
          value: "1"
        - name: OPENSHIFT_HA_REPLICA_COUNT
          value: "2"
        - name: OPENSHIFT_HA_USE_UNICAST
          value: "false"
        - name: OPENSHIFT_HA_IPTABLES_CHAIN
          value: "INPUT"
        - name: OPENSHIFT_HA_CHECK_SCRIPT
          value: "/etc/keepalive/mycheckscript.sh"
        - name: OPENSHIFT_HA_PREEMPTION
          value: "preempt_delay 300"
        - name: OPENSHIFT_HA_CHECK_INTERVAL
          value: "2"
      volumes:
      - name: lib-modules
        hostPath:
          path: /lib/modules
      - name: host-slash
        hostPath:
          path: /
      - name: etc-sysconfig
        hostPath:
          path: /etc/sysconfig
      - name: config-volume
        configMap:
          name: mycustomcheck  # assumed name of a ConfigMap containing mycheckscript.sh

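The Deployment above references `serviceAccountName: ipfailover`, a privileged security context, and a `config-volume` for the check script. Assuming these do not exist yet in the project, they can be prepared along these lines (the ConfigMap name and script file are illustrative):

```shell
# Create the service account used by the ipfailover pods and allow it
# to run privileged, host-networked containers
oc create serviceaccount ipfailover
oc adm policy add-scc-to-user privileged -z ipfailover

# Create the ConfigMap backing config-volume; mycheckscript.sh should
# exit 0 while the service is considered healthy
oc create configmap mycustomcheck --from-file=mycheckscript.sh
```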
Once the ipfailover pods start, the ExternalIPs appear in the node network configuration and access to the services resumes. To confirm this, check on the nodes where the ipfailover pods have been scheduled:

  # ip -d -c addr show
or
  # nmcli -p dev show br-ex
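The keepalived logs also show which replica currently owns each VIP; a quick check could look like this (deployment name taken from the example above):

```shell
# "Entering MASTER STATE" indicates the replica that currently holds a VIP
oc logs deployment/ipfailover-group1 | grep -i 'MASTER STATE'
```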

Root Cause

The OVN-Kubernetes code was changed so that OVN no longer answers ARP requests for LoadBalancer/ExternalIP service addresses, avoiding conflicts with other components that perform the same task, such as ipfailover and MetalLB speaker pods.

An update to the official documentation has been requested in Bugzilla ticket 2076662 to add a warning about this behavior before an upgrade is performed.

Diagnostic Steps

If issues with access to ExternalIPs are noticed after the upgrade, create a new project and deploy a simple example application from the RHOCP catalog, such as django+postgresql, which creates two services that can be tested for HTTP and TCP access. Once the template is deployed, patch the services to add an ExternalIP and test connectivity:
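The patch step could look like this (the ExternalIPs below match the output that follows and must be addresses routable in your environment):

```shell
# Add an ExternalIP to each service created by the django-psql-example template
oc patch service/django-psql-example -p '{"spec":{"externalIPs":["172.23.188.12"]}}'
oc patch service/postgresql -p '{"spec":{"externalIPs":["172.23.188.18"]}}'
```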

  $ oc get pods,services
NAME                              READY   STATUS    RESTARTS   AGE
pod/django-psql-example-1-tm9wc   1/1     Running   0          3m55s
pod/postgresql-1-6ptd9            1/1     Running   0          6m44s

NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP     PORT(S)    AGE
service/django-psql-example   ClusterIP   172.46.166.200   172.23.188.12   8080/TCP   6m59s
service/postgresql            ClusterIP   172.46.176.75    172.23.188.18   5432/TCP   6m58s

  $ curl -v -D - http://172.23.188.12:8080/
*   Trying 172.23.188.12:8080...
* TCP_NODELAY set
* connect to 172.23.188.12 port 8080 failed: No route to host
* Failed to connect to 172.23.188.12 port 8080: No route to host
* Closing connection 0
curl: (7) Failed to connect to 172.23.188.12 port 8080: No route to host
  $ ping 172.23.188.12
PING 172.23.188.12 (172.23.188.12) 56(84) bytes of data.
From 172.23.188.1 icmp_seq=1 Destination Host Unreachable
From 172.23.188.1 icmp_seq=2 Destination Host Unreachable
From 172.23.188.1 icmp_seq=3 Destination Host Unreachable

 $ psql -h 172.23.188.18 -p 5432 -U django -d default
psql: error: could not connect to server: No route to host
        Is the server running on host "172.23.188.18" and accepting
        TCP/IP connections on port 5432?

 $ ping 172.23.188.18
PING 172.23.188.18 (172.23.188.18) 56(84) bytes of data.
From 172.23.188.1 icmp_seq=1 Destination Host Unreachable
From 172.23.188.1 icmp_seq=2 Destination Host Unreachable
From 172.23.188.1 icmp_seq=3 Destination Host Unreachable

However, the VIPs are still configured in the OVN northbound database as expected:

  $ oc project openshift-ovn-kubernetes
  $ oc exec -c northd <some-ovnkube-master-pod> -- ovn-nbctl --no-leader-only lb-list

                                                            tcp        172.23.188.12:8080      10.220.4.30:8080
                                                            tcp        172.23.188.18:5432      10.223.0.32:5432
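Since the root cause is OVN no longer answering ARP for these addresses, the behavior can also be confirmed from a machine on the same layer-2 segment (the interface name is an example):

```shell
# Without ipfailover/MetalLB, no node replies to ARP for the ExternalIP,
# so arping reports zero received responses
arping -c 3 -I ens192 172.23.188.12
```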

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.