[OVN] Multi-homed node - No Route From POD network to second NIC network

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.8.z
    • 4.9.z
  • OVN-Kubernetes
  • RHOCP nodes with multiple NIC configuration (Multi-homed node)

Issue

  • After upgrading from 4.7.z to 4.8.z, the RHOCP cluster no longer provides the expected routing path from the POD internal network to external networks through the secondary NICs on RHOCP nodes.

Resolution

To prevent the issue during an upgrade from RHOCP 4.7.z to 4.8.z, and later from 4.8.z to 4.9.z, create a configmap that forces the OVN gateway mode to local before upgrading:

  • Create the gateway-mode-config configmap in the namespace openshift-network-operator:

      $ cat sharedGW.yml 
      apiVersion: v1
      kind: ConfigMap
      metadata:
          name: gateway-mode-config
          namespace: openshift-network-operator
      data:
          mode: "local"
      immutable: true
    
      $ oc apply -f sharedGW.yml 
      configmap/gateway-mode-config created
    
  • Upgrade to RHOCP 4.8.z.

  • Check the gateway mode config used by the ovn master PODs:

    $ oc logs ovnkube-master-<ID> -c ovnkube-master |grep -i mode
    + gateway_mode_flags='--gateway-mode local --gateway-interface br-ex'
    + exec /usr/bin/ovnkube --init-master ip-10-0-138-37.ec2.internal --config-file=/run/ovnkube-config/ovnkube.conf 
    --ovn-empty-lb-events --loglevel 4 --metrics-bind-address 127.0.0.1:29102 --metrics-enable-pprof 
    --gateway-mode local --gateway-interface br-ex 
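
The log check above can be repeated for every ovnkube-master POD in one pass. The following is a minimal sketch, not a supported tool: the function names are hypothetical, and it assumes that the PODs run in the openshift-ovn-kubernetes namespace and carry the label app=ovnkube-master, which should be confirmed on the cluster before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of oc or OVN-Kubernetes.

# Read ovnkube log text on stdin and print each distinct value passed to --gateway-mode.
extract_gateway_mode() {
  grep -o -- '--gateway-mode [a-z]*' | awk '{print $2}' | sort -u
}

# Print the gateway mode reported in the logs of every ovnkube-master POD.
# Assumption: namespace openshift-ovn-kubernetes and label app=ovnkube-master.
report_gateway_modes() {
  for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-master -o name); do
    mode=$(oc logs -n openshift-ovn-kubernetes "$pod" -c ovnkube-master | extract_gateway_mode)
    echo "$pod: ${mode:-unknown}"
  done
}
```

After sourcing the file, running report_gateway_modes should print one line per POD. Every line is expected to show local once the configmap has taken effect; a value of shared or unknown suggests that the POD should be inspected manually.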
    

Root Cause

In RHOCP 4.7.z, the OVN gateway mode defaulted to local, which means that all egress traffic went through the host for routing. Because of this behavior, PODs could reach external services through a non-default gateway by using the routes configured on the host.

Starting in RHOCP 4.8.z, the OVN gateway mode defaults to shared, and this behavior no longer works, because traffic egresses the node without going through the host for routing and uses only the routes in OVN.

Starting in RHOCP 4.10.z, this has been fixed by a new API called "routingViaHost", which, if enabled, forces all egress traffic to go through the host network namespace first.
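
As an illustration of the "routingViaHost" setting, the sketch below builds a merge patch for the cluster Network operator configuration. It assumes a cluster on RHOCP 4.10 or later, a user allowed to edit that configuration, and that the field is located at spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig.routingViaHost; the function names are hypothetical. It does not apply to the 4.8.z and 4.9.z releases covered by this solution.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of oc.

# Print a JSON merge patch that sets routingViaHost to the given value ("true" or "false").
routing_via_host_patch() {
  printf '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":%s}}}}}' "$1"
}

# Apply the patch to the cluster-wide Network operator configuration.
# Assumption: RHOCP 4.10 or later; earlier releases do not expose this field.
enable_routing_via_host() {
  oc patch network.operator.openshift.io cluster --type=merge -p "$(routing_via_host_patch true)"
}
```

Keeping the patch text in its own function makes it possible to review the JSON before anything is sent to the cluster.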

Diagnostic Steps

  • Using the following Three-Node OpenShift Compact Cluster configuration, a second NIC was added to each node in the subnet 192.168.200.0/24:

    • master1:
      • primary ip (MachineNetworkCIDR): 192.168.100.10/24
      • secondary ip: 192.168.200.10/24
    • master2:
      • primary ip (MachineNetworkCIDR): 192.168.100.11/24
      • secondary ip: 192.168.200.11/24
    • master3:
      • primary ip (MachineNetworkCIDR): 192.168.100.12/24
      • secondary ip: 192.168.200.12/24
  • POD IP 10.128.0.45 on master1:

    $ oc get pods -o wide
    NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
    multitool-openshift-58d96959c4-6txzk   1/1     Running   0          16m   10.128.0.45   ip-10-0-128-253.ec2.internal   <none>           <none>
    
  • The cluster has been recently upgraded to RHOCP 4.8.39:

      $ oc get nodes -o wide
      NAME                           STATUS   ROLES           AGE   VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
      ip-10-0-128-253.ec2.internal   Ready    master,worker   7h    v1.21.8+ed4d8fd   192.168.100.10   <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8
      ip-10-0-142-9.ec2.internal     Ready    master,worker   7h    v1.21.8+ed4d8fd   192.168.100.11     <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8
      ip-10-0-143-199.ec2.internal   Ready    master,worker   7h    v1.21.8+ed4d8fd   192.168.100.12   <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8
    
      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.8.39    True        False         3h35m   Cluster version is 4.8.39
    
  • Steps to Reproduce:

    • Test 1 - Not reachable - POD IP 10.128.0.45 on master1 can't ping master3's second NIC 192.168.200.12:
      $ ping -c 3 192.168.200.12
      PING 192.168.200.12 (192.168.200.12) 56(84) bytes of data.
    
      --- 192.168.200.12 ping statistics ---
      3 packets transmitted, 0 received, 100% packet loss, time 2035ms
    
      $ tracepath -n 192.168.200.12
       1?: [LOCALHOST]                      pmtu 8901
       1:  10.128.0.1                                            1.383ms asymm  2 
       1:  10.128.0.1                                            1.222ms asymm  2 
       2:  100.64.0.3                                            1.458ms asymm  3 
       3:  no reply
      (..)
    
      28:  no reply
      29:  no reply
      30:  no reply
           Too many hops: pmtu 8901
           Resume: pmtu 8901 
      $
    
    • Test 2 - OK - POD IP 10.128.0.45 on master1 can ping master1's second NIC 192.168.200.10:
      $ ping -c 3 192.168.200.10
      PING 192.168.200.10 (192.168.200.10) 56(84) bytes of data.
      64 bytes from 192.168.200.10: icmp_seq=1 ttl=64 time=0.108 ms
      64 bytes from 192.168.200.10: icmp_seq=2 ttl=64 time=0.077 ms
      64 bytes from 192.168.200.10: icmp_seq=3 ttl=64 time=0.093 ms
    
      --- 192.168.200.10 ping statistics ---
      3 packets transmitted, 3 received, 0% packet loss, time 2042ms
      rtt min/avg/max/mdev = 0.077/0.092/0.108/0.012 ms
    
      $ tracepath -n 192.168.200.10
       1?: [LOCALHOST]                      pmtu 8901
       1:  10.128.0.1                                            1.636ms asymm  2 
       1:  10.128.0.1                                            1.059ms asymm  2 
       2:  192.168.200.10                                         1.125ms reached
           Resume: pmtu 8901 hops 2 back 1 
      $ 
    
  • Actual results: POD IP 10.128.0.45 on master1 can't ping master3's second NIC 192.168.200.12.
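
The two tests above can be repeated against every secondary NIC from the same POD. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the POD image must provide ping (as the multitool POD above does), and the POD name and IP addresses are passed in by the caller.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of oc.

# Read ping output on stdin and print the packet-loss percentage without the "%" sign.
packet_loss() {
  sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Ping each given IP address from inside the POD and print the loss per address.
# Usage: check_secondary_nics <pod-name> <ip> [<ip> ...]
check_secondary_nics() {
  pod=$1
  shift
  for ip in "$@"; do
    loss=$(oc exec "$pod" -- ping -c 3 "$ip" | packet_loss)
    echo "$ip: ${loss:-?}% packet loss"
  done
}
```

For the environment described here, the call would be check_secondary_nics multitool-openshift-58d96959c4-6txzk 192.168.200.10 192.168.200.11 192.168.200.12.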

