[OVN] Multi-homed node - No Route From POD network to second NIC network
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4.8.z
- 4.9.z
- OVN-Kubernetes
- RHOCP nodes with multiple NIC configuration (Multi-homed node)
Issue
- After upgrading from 4.7.z to 4.8.z, the RHOCP cluster no longer provides the expected routing path from the POD internal network to external networks through secondary NICs on the RHOCP nodes.
Resolution
To prevent this issue from occurring during an upgrade from RHOCP 4.7.z to 4.8.z, and later from 4.8.z to 4.9.z, create a ConfigMap that forces the OVN gateway mode to local:
- Create the gateway-mode-config ConfigMap in the namespace openshift-network-operator:

  $ cat sharedGW.yml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: gateway-mode-config
    namespace: openshift-network-operator
  data:
    mode: "local"
  immutable: true

  $ oc apply -f sharedGW.yml
  configmap/gateway-mode-config created

- Upgrade to RHOCP 4.8.z.

- Check the gateway mode configuration used by the ovnkube-master PODs:

  $ oc logs ovnkube-master-<ID> -c ovnkube-master | grep -i mode
  + gateway_mode_flags='--gateway-mode local --gateway-interface br-ex'
  + exec /usr/bin/ovnkube --init-master ip-10-0-138-37.ec2.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 --metrics-bind-address 127.0.0.1:29102 --metrics-enable-pprof --gateway-mode local --gateway-interface br-ex
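When checking many masters, the mode can be extracted from the log line programmatically. The following is an illustrative sketch only, assuming the `gateway_mode_flags` log format shown above (the sample line is hard-coded here; in practice it would come from `oc logs`):

```python
import re

# Sample log line in the format emitted by the ovnkube-master container
log_line = "+ gateway_mode_flags='--gateway-mode local --gateway-interface br-ex'"

# Extract the value that follows the --gateway-mode flag
match = re.search(r"--gateway-mode\s+(\w+)", log_line)
if match:
    print(f"OVN gateway mode: {match.group(1)}")  # "local" after applying the ConfigMap
```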
Root Cause
Previously, with RHOCP 4.7.z, the OVN gateway mode was configured as local by default, which means that all traffic went via the host for routing. Thanks to this behavior, customers could route traffic from PODs on the internal network to external services through a non-default gateway.
Starting in RHOCP 4.8.z, the OVN gateway mode is configured as shared by default, and this behavior no longer works: traffic egresses the node without going to the host for routing and only uses the routes known to OVN.
Starting in RHOCP 4.10.z, this has been addressed by a new API field called "routingViaHost", which, when enabled, forces all egress traffic to go through the host network namespace first.
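The difference between the two modes comes down to which routing table egress traffic consults. The following is a minimal longest-prefix-match sketch (the route tables are hypothetical, simplified illustrations, not OVN's actual implementation): in local mode the host routing table carries a connected route for the second NIC's subnet, while in shared mode only OVN's logical routes apply, so traffic for 192.168.200.0/24 falls through to the default route.

```python
import ipaddress

def lookup(routes, dst):
    """Return the next hop for dst via longest-prefix match."""
    dst = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(cidr), nh) for cidr, nh in routes
               if dst in ipaddress.ip_network(cidr)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Hypothetical host routing table (local gateway mode: egress goes via the host)
host_routes = [
    ("0.0.0.0/0", "192.168.100.1"),    # default via the primary NIC gateway
    ("192.168.200.0/24", "dev eth1"),  # connected route on the second NIC
]

# Hypothetical OVN-only view (shared gateway mode: host table is bypassed)
ovn_routes = [
    ("0.0.0.0/0", "192.168.100.1"),    # only the default route is known
]

print(lookup(host_routes, "192.168.200.12"))  # local mode: second NIC is used
print(lookup(ovn_routes, "192.168.200.12"))   # shared mode: default gateway only
```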
Diagnostic Steps
- Using the following Three-Node OpenShift Compact Cluster configuration, a second NIC was added to each node in the subnet 192.168.200.0/24:
- master1:
- primary ip (MachineNetworkCIDR): 192.168.100.10/24
- secondary ip: 192.168.200.10/24
- master2:
- primary ip(MachineNetworkCIDR): 192.168.100.11/24
- secondary ip: 192.168.200.11/24
- master3:
- primary ip(MachineNetworkCIDR): 192.168.100.12/24
- secondary ip: 192.168.200.12/24
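The addressing above can be sanity-checked with a short Python sketch (illustrative only; the subnets are taken from the layout above). The secondary addresses are outside the MachineNetwork, so OVN has no logical route for them and reaching them depends on the host routing table:

```python
import ipaddress

machine_net = ipaddress.ip_network("192.168.100.0/24")    # MachineNetworkCIDR
secondary_net = ipaddress.ip_network("192.168.200.0/24")  # second-NIC subnet

secondary_ips = ["192.168.200.10", "192.168.200.11", "192.168.200.12"]

for ip in secondary_ips:
    addr = ipaddress.ip_address(ip)
    # Each secondary IP is in the second-NIC subnet but not in the MachineNetwork
    print(ip, "in MachineNetwork:", addr in machine_net,
          "- in secondary subnet:", addr in secondary_net)
```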
- POD IP 10.128.0.45 on master1:

  $ oc get pods -o wide
  NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
  multitool-openshift-58d96959c4-6txzk   1/1     Running   0          16m   10.128.0.45   ip-10-0-128-253.ec2.internal   <none>           <none>

- The cluster has been recently upgraded to RHOCP 4.8.39:

  $ oc get nodes -o wide
  NAME                           STATUS   ROLES           AGE   VERSION           INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                                                    KERNEL-VERSION                 CONTAINER-RUNTIME
  ip-10-0-128-253.ec2.internal   Ready    master,worker   7h    v1.21.8+ed4d8fd   192.168.100.10   <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8
  ip-10-0-142-9.ec2.internal     Ready    master,worker   7h    v1.21.8+ed4d8fd   192.168.100.11   <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8
  ip-10-0-143-199.ec2.internal   Ready    master,worker   7h    v1.21.8+ed4d8fd   192.168.100.12   <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8

  $ oc get clusterversion
  NAME      VERSION   AVAILABLE   PROGRESSING   SINCE    STATUS
  version   4.8.39    True        False         3h35m    Cluster version is 4.8.39
Steps to Reproduce:
- Test 1 - Not reachable - POD IP 10.130.0.57 on master1 can't ping master3's second NIC 192.168.200.12:

  $ ping -c 3 192.168.200.12
  PING 192.168.200.12 (192.168.200.12) 56(84) bytes of data.

  --- 192.168.200.12 ping statistics ---
  3 packets transmitted, 0 received, 100% packet loss, time 2035ms

  $ tracepath -n 192.168.200.12
   1?: [LOCALHOST]                      pmtu 8901
   1:  10.128.0.1                       1.383ms asymm  2
   1:  10.128.0.1                       1.222ms asymm  2
   2:  100.64.0.3                       1.458ms asymm  3
   3:  no reply
  (..)
  28:  no reply
  29:  no reply
  30:  no reply
       Too many hops: pmtu 8901
       Resume: pmtu 8901
  $

- Test 2 - OK - POD IP 10.128.0.45 on master1 can ping master1's second NIC 192.168.200.10:

  $ ping -c 3 192.168.200.10
  PING 192.168.200.10 (192.168.200.10) 56(84) bytes of data.
  64 bytes from 192.168.200.10: icmp_seq=1 ttl=64 time=0.108 ms
  64 bytes from 192.168.200.10: icmp_seq=2 ttl=64 time=0.077 ms
  64 bytes from 192.168.200.10: icmp_seq=3 ttl=64 time=0.093 ms

  --- 192.168.200.10 ping statistics ---
  3 packets transmitted, 3 received, 0% packet loss, time 2042ms
  rtt min/avg/max/mdev = 0.077/0.092/0.108/0.012 ms

  $ tracepath -n 192.168.200.10
   1?: [LOCALHOST]                      pmtu 8901
   1:  10.128.0.1                       1.636ms asymm  2
   1:  10.128.0.1                       1.059ms asymm  2
   2:  192.168.200.10                   1.125ms reached
       Resume: pmtu 8901 hops 2 back 1
  $
Actual results: POD IP 10.130.0.57 on master1 can't ping to master3's second NIC 192.168.200.12.
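The second hop (100.64.0.3) seen in the failing tracepath is itself a hint: assuming the OVN-Kubernetes default join subnet 100.64.0.0/16 (internal gateway-router link addresses), the packet was routed by the OVN gateway router rather than the host network stack, which is consistent with shared gateway mode. A quick illustrative check:

```python
import ipaddress

# Assumed OVN-Kubernetes default join subnet (internal gateway-router links)
join_subnet = ipaddress.ip_network("100.64.0.0/16")

# Second hop observed in the failing tracepath of Test 1
hop = ipaddress.ip_address("100.64.0.3")

# True means the hop belongs to OVN's internal join subnet, i.e. the packet
# was handled by the OVN gateway router instead of the host routing table.
print(hop in join_subnet)  # True
```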
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.